JHOVE Incorrectly reading beyond RIFF 'data' Chunk ID and calling it invalid... #9

Closed
ross-spencer opened this issue Aug 21, 2014 · 4 comments

Comments

@ross-spencer

I have received a 2GB wav file that I'm having difficulty validating in JHOVE. The tool tells me that I have an invalid character within a CHUNK ID.

Analyzing the file, however, it seems that JHOVE is reading beyond the CHUNK ID and returning an invalid result.

52 49 46 46 
Chunk ID: 'RIFF'

F8 DE A0 84 
Chunk Size: ~2GB

57 41 56 45 
Format: 'WAVE'

66 6D 74 20 
Sub Chunk 1 ID: 'fmt ' (the fourth byte, 0x20, is a trailing space)

10 00 00 00 
Sub Chunk 1 Size: 16

01 00 
Audio Format: WAVE_FORMAT_PCM

01 00 
Number of Channels: 1

00 77 01 00 
Sample Rate: 96000

00 65 04 00 
Byte Rate: 288,000

03 00 
Block Align: 3

18 00 
Bits per Sample: 24

64 61 74 61 
Sub Chunk 2 ID: 'data'

80 C6 A0 84 
Sub Chunk 2 Size: ~2GB

A7 *05 00* 70 04 00 6E F6 FF E9 FC FF F7 F4 FF B5 24 00
... data / payload ...
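For anyone following along, the breakdown above can be reproduced with a few lines of Java. This is only a minimal sketch: the class and method names (`RiffHeaderSketch`, `fourCc`, `unsignedInt`) are made up for illustration and are not part of JHOVE. It copies the first 44 bytes from the listing, reads them as little-endian, and widens the 32-bit size fields to `long` so they are not misread as negative numbers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class RiffHeaderSketch {
    // First 44 bytes of the file, copied from the breakdown in this issue.
    static final int[] HEADER = {
        0x52, 0x49, 0x46, 0x46, 0xF8, 0xDE, 0xA0, 0x84, 0x57, 0x41, 0x56, 0x45,
        0x66, 0x6D, 0x74, 0x20, 0x10, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
        0x00, 0x77, 0x01, 0x00, 0x00, 0x65, 0x04, 0x00, 0x03, 0x00, 0x18, 0x00,
        0x64, 0x61, 0x74, 0x61, 0x80, 0xC6, 0xA0, 0x84
    };

    // Wraps the header bytes in a little-endian buffer (RIFF is little-endian).
    static ByteBuffer buffer() {
        byte[] raw = new byte[HEADER.length];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) HEADER[i];
        }
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Reads a four-character code (chunk ID) at the given offset.
    static String fourCc(ByteBuffer buf, int offset) {
        byte[] id = new byte[4];
        for (int i = 0; i < 4; i++) {
            id[i] = buf.get(offset + i);
        }
        return new String(id, StandardCharsets.US_ASCII);
    }

    // Reads a 32-bit field and widens it to long so it stays non-negative.
    static long unsignedInt(ByteBuffer buf, int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        ByteBuffer buf = buffer();
        System.out.println(fourCc(buf, 0) + " size: " + unsignedInt(buf, 4));
        System.out.println("format: " + fourCc(buf, 8));
        System.out.println("'" + fourCc(buf, 12) + "' size: " + unsignedInt(buf, 16));
        System.out.println("sample rate: " + unsignedInt(buf, 24));
        System.out.println("byte rate: " + unsignedInt(buf, 28));
        System.out.println("bits per sample: " + buf.getShort(34));
        System.out.println(fourCc(buf, 36) + " size: " + unsignedInt(buf, 40));
        // The same field read as a plain signed int comes out negative.
        System.out.println("data size as signed int: " + buf.getInt(40));
    }
}
```

The offsets used here (4, 24, 28, 34, 36, 40) are 0-based positions counted from the start of the file.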

The error message seems to be returned from this part of the code:

https://github.com/gmcgath/jhove/blob/0dc774d98efa8c7581fe1602c3f6e713f499201d/src/main/java/edu/harvard/hul/ois/jhove/module/iff/ChunkHeader.java#L53

The byte causing the first issue is 0x05 at offset 46; I've starred offsets 46 and 47. See also the screenshot.

The screenshot has been generated by looking at the following snippet from the 2GB file:

52 49 46 46 F8 DE A0 84 57 41 56 45 66 6D 74 20 10 00 00 00 01 00 01 00 00
77 01 00 00 65 04 00 03 00 18 00 64 61 74 61 80 C6 A0 84 A7 05 00 70 04 00 
6E F6 FF E9 FC FF F7 F4 FF B5 24 00 F2 FC FF 88 FC FF 2C E8 FF 1B 08 00 74 
03 00 26 EE FF 20 F6 FF 86 F6 FF 33 01 00 5F F3 FF C0 FC FF 47

The analysis shows that 0x05 is no longer in the CHUNK ID, and neither is the following byte 0x00, which will also show up as an error if one artificially turns 0x05 into a byte greater than 0x32.
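To make that concrete, here is a small sketch of what happens if a parser mistakenly treats the first four payload bytes (A7 05 00 70) as a chunk ID. The validity rule is an assumption for illustration only: the sketch rejects any byte below 0x20 (ASCII space), which may not match the exact check in the linked ChunkHeader.java, and the names `ChunkIdCheckSketch` and `firstInvalid` are hypothetical.

```java
final class ChunkIdCheckSketch {
    // Assumed threshold: bytes below 0x20 (ASCII space) count as invalid.
    static final int MIN_VALID = 0x20;

    /** Returns the index (0-3) of the first invalid byte in a 4-byte ID, or -1 if all pass. */
    static int firstInvalid(byte[] data, int offset) {
        for (int i = 0; i < 4; i++) {
            int ch = data[offset + i] & 0xFF; // treat the byte as unsigned
            if (ch < MIN_VALID) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // First four payload bytes from the snippet, misread as a chunk ID.
        byte[] payload = {(byte) 0xA7, 0x05, 0x00, 0x70};
        System.out.println("first invalid index: " + firstInvalid(payload, 0));

        // Patch 0x05 to a printable byte: the 0x00 after it is flagged next.
        payload[1] = 0x41;
        System.out.println("after patching 0x05: " + firstInvalid(payload, 0));

        // A genuine chunk ID such as 'data' passes the same check.
        byte[] dataId = {0x64, 0x61, 0x74, 0x61};
        System.out.println("'data' chunk ID: " + firstInvalid(dataId, 0));
    }
}
```

Under that assumed rule, 0xA7 passes, 0x05 is the first byte rejected, and 0x00 takes its place once 0x05 is patched.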

Screenshot:

invalid_character

JHOVE Version: 1.11
Java: 1.7
Platform: Windows XP SP3
Creating Application (WAV): Adobe Audition CS6 (Macintosh)

@ross-spencer changed the title from "JHOVE Incorrectly reading beyond RIFF Chunk ID and calling it invalid..." to "JHOVE Incorrectly reading beyond RIFF 'data' Chunk ID and calling it invalid..." on Aug 21, 2014
anjackson added a commit to anjackson/jhove that referenced this issue Aug 21, 2014
@anjackson
Member

Okay, so the problem seems to be that JHOVE assumes that an 'int' is a big enough variable to hold offsets etc. Your data chunk is so long that its size won't fit in a signed Java int, and so 'skipBytes' fails during an attempt to pass over the data chunk. I'm attempting to set up a pull request that includes this test case.
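A rough sketch of that failure mode, for illustration only: this is not JHOVE's code and not the fix in the pull request, and the names `LongSkipSketch`, `skippedWithIntSize`, `skipFully` and `readAfterSkip` are hypothetical. Narrowing the chunk size to `int` gives a negative number, and `DataInputStream.skipBytes` with a negative argument skips nothing and returns 0, so the next read lands on the payload. One possible remedy is to carry the size as a `long` and skip in a loop.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class LongSkipSketch {
    /** Returns what skipBytes reports when the chunk size is narrowed to int. */
    static int skippedWithIntSize(long chunkSize) {
        byte[] payload = {(byte) 0xA7, 0x05, 0x00, 0x70};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
        try {
            // (int) 2225129088L wraps to a negative value; nothing is skipped.
            return in.skipBytes((int) chunkSize);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Skips exactly count bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                remaining--; // skip() made no progress, but one byte was readable
            } else {
                throw new EOFException(remaining + " bytes left to skip at end of stream");
            }
        }
    }

    /** Skips count bytes of data and returns the next byte, or -1 if the data ran out. */
    static int readAfterSkip(byte[] data, long count) {
        InputStream in = new ByteArrayInputStream(data);
        try {
            skipFully(in, count);
            return in.read();
        } catch (EOFException e) {
            return -1;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("skipped with int size: " + skippedWithIntSize(0x84A0C680L));
        System.out.println("next byte after skipping 3: "
                + readAfterSkip(new byte[] {1, 2, 3, 4, 5}, 3));
    }
}
```

Because `skipFully` takes a `long`, a size above 2^31 - 1 stays positive and the loop keeps going until the whole chunk has been passed over or the stream ends.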

@ross-spencer
Author

Hi Andy,

Just to clarify: JHOVE isn't storing offset values correctly, because we're stuffing unsigned 32-bit integers (up to 2^32 - 1) into signed 32-bit integers (up to 2^31 - 1)...

That is, both the offsets/chunk sizes we see in this breakdown extend beyond 2^31...

2^31 = 2,147,483,648 (a signed 32-bit integer can only represent positive values up to 2^31 - 1 = 2,147,483,647)

  • 0x84A0DEF8 = 2,225,135,352
  • 0x84A0C680 = 2,225,129,088

Thus causing JHOVE to get confused...
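As a quick check of the arithmetic, the sketch below prints each of the two sizes both ways: as a Java `int` (signed) and widened to an unsigned value. The class and method names are hypothetical and only there for illustration.

```java
final class SignedVersusUnsigned {
    // Reinterprets the 32 bits of a Java int as an unsigned value.
    static long asUnsigned(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int riffSize = 0x84A0DEF8; // RIFF chunk size from the breakdown
        int dataSize = 0x84A0C680; // 'data' chunk size from the breakdown

        System.out.println("RIFF size, signed:   " + riffSize);
        System.out.println("RIFF size, unsigned: " + asUnsigned(riffSize));
        System.out.println("data size, signed:   " + dataSize);
        System.out.println("data size, unsigned: " + asUnsigned(dataSize));
        System.out.println("largest signed int:  " + Integer.MAX_VALUE);
    }
}
```

Both values have their top bit set, so Java reports them as negative unless they are widened to `long` first.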

Just trying to get a good simple description for the folks digitizing here (and myself too!)

Great spot. Thank you for your help!

I hope Gary can provide some feedback on the proposed fixes/timescales etc.

Cheers,

Ross

@anjackson
Member

The reality is somewhat messier, but yes, that's essentially it.

@carlwilson
Member

Closed by #37
