Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WAVE: Fix parse error in Associated Data List #309

Merged
merged 5 commits into from
Mar 28, 2018
Merged

WAVE: Fix parse error in Associated Data List #309

merged 5 commits into from
Mar 28, 2018

Conversation

marhop
Copy link
Member

@marhop marhop commented Feb 7, 2018

In JHOVE 1.18.1, validating a WAVE file that contains an Associated Data List chunk results in the status "Not well-formed" with error message "Unexpected end of file".

Consider this minimal example (binary version is attached) which constitutes a WAVE file without any audio data (empty Data chunk) but with an Associated Data List:

# File header
52494646 # "RIFF"
64000000 # file size
57415645 # "WAVE"

# Format chunk
666d7420 # chunkID "fmt "
10000000 # chunkSize
0100     # wFormatTag
0100     # wChannels
a0e9a002 # dwSamplesPerSec 44100k
40d34105 # dwAvgBytesPerSec
0200     # wBlockAlign
1000     # wBitsPerSample

# Data chunk (empty)
64617461 # chunkID "data"
00000000 # chunkSize

# Cue chunk
63756520 # chunkID "cue "
1c000000 # chunkSize
01000000 # dwCuePoints

# CuePoint
01000000 # dwIdentifier
00000000 # dwPosition
64617461 # fccChunk "data"
00000000 # dwChunkStart
00000000 # dwBlockStart
00000000 # dwSampleOffset

# Associated Data List
6c697374 # chunkId "list"
14000000 # chunkSize
6164746c # typeID "adtl"

# Label Chunk
6c61626c # chunkID "labl"
08000000 # chunkSize
01000000 # dwIdentifier
666f6f00 # dwText "foo\0"

The Associated Data List chunk (the chunk with ID "list") in this example has size 0x14 (little endian), meaning it holds 20 bytes of data. These 20 bytes include the list's type ID "adtl" and the nested sub-chunk (the chunk with ID "labl"): 4 bytes for the "adtl" string plus 16 bytes for the "labl" chunk.

When reading an Associated Data List's sub-chunks JHOVE eats bytes according to the value of the bytesLeft variable by calling the getNextChunkHeader() method. The bytesLeft variable is initialized with the Associated Data List's size. So in this example JHOVE tries to read 20 bytes worth of sub-chunks. But alas, JHOVE ignores the fact that the "adtl" type ID has a size of its own! It should not read 20 bytes of sub-chunks, but only 16 bytes, allowing for the 4 bytes that are consumed by the type ID. So JHOVE should subtract 4 bytes from the bytesLeft variable before it starts reading sub-chunks, just like it does in the ListInfoChunk class.

This pull request adjusts this behaviour.

adtl.zip

When reading an Associated Data List's sub-chunks JHOVE eats bytes
according to the value of the `bytesLeft` variable which is initialized
with the Associated Data List's size. Since the size encompasses not
only the sub-chunks but also the Associated Data List's 4 bytes sized
type ID field JHOVE should subtract 4 bytes from the `bytesLeft`
variable before it starts reading sub-chunks.
@codecov
Copy link

codecov bot commented Feb 7, 2018

Codecov Report

Merging #309 into integration will decrease coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@                Coverage Diff                @@
##             integration     #309      +/-   ##
=================================================
- Coverage          43.47%   43.47%   -0.01%     
  Complexity          3308     3308              
=================================================
  Files                366      366              
  Lines              30122    30123       +1     
  Branches            5959     5959              
=================================================
  Hits               13096    13096              
- Misses             14629    14630       +1     
  Partials            2397     2397
Impacted Files Coverage Δ Complexity Δ
...ard/hul/ois/jhove/module/wave/SimpleTextChunk.java 0% <0%> (ø) 0 <0> (ø) ⬇️
.../hul/ois/jhove/module/wave/AssocDataListChunk.java 0% <0%> (ø) 0 <0> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c2341a...9327e3b. Read the comment docs.

Those strings are NULL terminated, calling trim() removes the NULL byte.

It also removes all trailing whitespace, which may not be desired.
Something like `str.replaceFirst("\0$", "")` would be more precise.
Anyway, trim() is used in this commit to be consistent with the rest of
the WAVE module where trim() is used as well.
@marhop
Copy link
Member Author

marhop commented Feb 8, 2018

I added one tiny commit (11c9a0e) that treats NULL terminated strings like "foo\0" in the last line of the example above.

@david-russo david-russo added this to the Release v1.20 milestone Mar 9, 2018
@carlwilson carlwilson merged commit 959edd8 into openpreserve:integration Mar 28, 2018
@marhop marhop deleted the fix-wave-associated-data-list branch March 29, 2018 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants