Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WAVE: RF64, extended MIME type support, and bug fixes #308

Merged
merged 12 commits into from
Mar 29, 2018

Conversation

david-russo
Copy link
Member

@david-russo david-russo commented Feb 5, 2018

  • Added support for parsing and validating RF64 files
  • Made WAVE parser more resilient to unexpected chunk data
  • Improved reporting of WAVE codecs in WAVEFORMATEXTENSIBLE files
  • Avoids reporting file format and MIME type until signatures have been verified
  • Now reports extended MIME type information, e.g. audio/vnd.wave; codec=1, as per RFC 2361
  • Subformat GUID's are now reported in their standard format, e.g. 00000001-0000-0010-8000-00AA00389B71, instead of as an array of byte values
  • Added check to verify the existence of Data chunks
  • Added check to verify that Format chunks appear before Data chunks
  • Expanded WAVE example corpora to cover more formats and errors
  • Improved truncation detection and reporting
  • Fixed erroneous reporting of Cue Point values
  • Renamed "Cue" report property to "CuePoints"

Closes #184 and fixes #278.

The WAVE module can now recognize, read, and validate RF64 files up to
9.22 exabytes in size, support chunks with extended chunk sizes, and
files with large sample counts.
- The module can now recognize the subformat GUIDs for PCM and other
  WAVEFORMAT codecs, allowing them to be identified and interpreted
  correctly.
- Subformat GUIDs are now shown in their standard form instead of as
  arrays of bytes.
- Profile verification for BWF files has been simplified and some
  code comments clarified.
Setting the format and MIME type information has been moved to after
the format signature has been verified to prevent the module from
characterizing all files as WAVE files. The MIME type has also been
extended to include WAVE codec ID's per RFC 2361.
Fixes an issue where unexpected null data at the end of a chunk could
leave the parser at an invalid position to read any subsequent chunks.
@david-russo david-russo added the feature New functionality to be developed label Feb 5, 2018
@tledoux
Copy link
Contributor

tledoux commented Feb 6, 2018

Hi, is the requirement of Java 8 mandatory ?
Jhove is a fundation piece of code for preservation ; so asking for Java 8 forces everything on top of it to be in Java 8.

Maybe a requirement for java 7 would be enough. In any case, a quick survey is needed before going to java 8...

@david-russo
Copy link
Member Author

david-russo commented Feb 6, 2018

The requirement comes from the use of Java 8's new unsigned-operation methods, such as Long.compareUnsigned(). It could be removed if we re-implemented those functions locally, though I'd still encourage a move to Java 8: support from Oracle for 7 ended in 2015, and is ending in June this year for the OpenJDK. Java 8 has had almost 4 years worth of maintenance releases at this point, so I'd think it should be fairly well supported by now, but I certainly see the prudence in a survey given the infrastructure JHOVE might be embedded in, as you mentioned.

We now verify the amount of bytes skipped using InputStream's
`available` method to see how many bytes it's possible to skip
beforehand. To support these changes, the `available` method has
been implemented for JHOVE's InputStream classes where required,
namely RAFInputStream.
- Expanded WAVE example corpora to cover more tests
- New check to verify existence of Data chunks
- New check to verify Format chunks appear before Data chunks
- Improved truncation detection and reporting
ChunkStart values were being reported as BlockStart values,
BlockStart values were being reported as SampleOffset values,
and SampleOffset values were missing from the report entirely.
All values are now reported correctly.
The required Java 8 method `Long.compareUnsigned` has been copied into
the WAVE module where it was needed, with a caveat that it should be
replaced once Java 8 support has been approved.
@david-russo
Copy link
Member Author

david-russo commented Mar 14, 2018

The requirement for Java 8 has now been removed, reverting compatibility to Java 6.

@tledoux
Copy link
Contributor

tledoux commented Mar 14, 2018

Thanks. Very much appreciated

Copy link
Member

@carlwilson carlwilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should chat briefly about JHOVE formatting.

}

private void init ()
private void init()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super pernickety and I have no good answer but the constructor above is corrected with the open bracket alongside the declaration (correct IMO) but for init just the spacing is corrected. Sorry take that back, mis-read the above and spent too much time recently working through JHOVE's inconsistent spacing / formatting issues.

("tcf:frames", Integer.toString (duration.getFrames ())));
Long.toString (duration.getSeconds ())));
_writer.println (margn2 + element ("tcf:frames",
Long.toString (duration.getFrames ())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed spacing by the looks

Copy link
Member Author

@david-russo david-russo Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, the *Handler files are a mess with white-space inconsistencies, even with tabs and spaces on the same line at times, or switching every handful of lines. I originally attempted to fix it, but there was just too much.

The half tabs and half spaces is actually consistent with the code's surrounding lines, which you can see by slowly highlighting line 4370 above.

_writer.println (margn3 + "NumberOfSamples: " +
Integer.toString (start.getSamples ()));
_writer.println (margn3 + "NumberOfSamples: " +
Long.toString (start.getSamples ()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed spacing on this and the next four changes from Integer to Long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I probably should have matched the spacing with the line directly above instead of the line beneath, but this is one of those areas that is already in a sea of inconsistencies.

@@ -4378,7 +4378,7 @@ private void writeAESTimeRange (String baseIndent,
_writer.println (margn2 + elementStart ("tcf:samples",
sampleAttrs));
_writer.println (margn3 + element ("tcf:numberOfSamples",
Integer.toString (duration.getSamples ()) ));
Long.toString (duration.getSamples ()) ));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed spacing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pre-existing issue.

mapLoc[0] = "LEFT";
mapLoc[1] = "RIGHT";
break;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed spacing.

Copy link
Member Author

@david-russo david-russo Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks the same to me... Should we not indent switch cases?

}
for (int i = 0; i < numChannels; i++) {
mapLoc[i] = "UNKNOWN";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed spacing.

@codecov
Copy link

codecov bot commented Mar 29, 2018

Codecov Report

Merging #308 into integration will increase coverage by 0.87%.
The diff coverage is 66.35%.

Impacted file tree graph

@@                Coverage Diff                @@
##             integration     #308      +/-   ##
=================================================
+ Coverage          43.57%   44.45%   +0.87%     
- Complexity          3333     3415      +82     
=================================================
  Files                367      368       +1     
  Lines              30193    30318     +125     
  Branches            5976     5994      +18     
=================================================
+ Hits               13158    13478     +320     
+ Misses             14631    14430     -201     
- Partials            2404     2410       +6
Impacted Files Coverage Δ Complexity Δ
...main/java/edu/harvard/hul/ois/jhove/JhoveBase.java 41.01% <ø> (ø) 42 <0> (ø) ⬇️
...rd/hul/ois/jhove/module/wave/MessageConstants.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...edu/harvard/hul/ois/jhove/handler/TextHandler.java 0.54% <0%> (-0.01%) 2 <0> (ø)
...u/harvard/hul/ois/jhove/module/wave/FactChunk.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...du/harvard/hul/ois/jhove/module/wave/CueChunk.java 0% <0%> (ø) 0 <0> (ø) ⬇️
.../edu/harvard/hul/ois/jhove/handler/XmlHandler.java 59.72% <100%> (+0.88%) 241 <0> (+5) ⬆️
...va/edu/harvard/hul/ois/jhove/AESAudioMetadata.java 86.48% <100%> (+9.72%) 36 <0> (+2) ⬆️
.../harvard/hul/ois/jhove/module/iff/ChunkHeader.java 88% <100%> (+31.47%) 7 <1> (+2) ⬆️
...u/harvard/hul/ois/jhove/module/wave/DataChunk.java 100% <100%> (+10%) 3 <0> (+1) ⬆️
...java/edu/harvard/hul/ois/jhove/RAFInputStream.java 47.61% <48.27%> (+5.45%) 15 <4> (+4) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d383b37...e346e43. Read the comment docs.

@carlwilson carlwilson merged commit b46ffe7 into openpreserve:integration Mar 29, 2018
@david-russo david-russo deleted the rf64-support branch April 19, 2018 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality to be developed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid character in Chunk ID (WAVE Module) Support needed for RF64-formatted WAVs
3 participants