Adds EPUB validation extension module based on W3C's EPUBCheck #460
Added an EPUB module to jhove-ext-modules. It uses the EPUBCheck library, with a new MasterReport implementation for EPUBCheck that focuses on collecting and exploring the data relevant to JHOVE's RepInfo. Tests included.
The signature check defaulted to valid=true, even though it does nothing to check the validity of a file. This overrides the `checkSignature()` method to set `valid=UNDETERMINED` by default in all cases.
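As a rough illustration of the idea in this commit message, here is a minimal, stand-alone sketch of a signature check that reports a match but never claims validity. Everything in it is hypothetical: `EpubSignatureSketch`, `Result` and `Validity` are stand-ins rather than JHOVE classes, and the byte offsets assume the usual OCF container layout (ZIP local file header at offset 0, the `mimetype` entry name at offset 30, its content at offset 38), which may differ from what the module actually inspects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, stand-alone sketch; not the module's actual code. */
final class EpubSignatureSketch {

    /** Three-state outcome, standing in for a true / false / undetermined flag. */
    enum Validity { TRUE, FALSE, UNDETERMINED }

    /** Hypothetical result holder (JHOVE itself uses RepInfo for this). */
    static final class Result {
        final boolean wellFormed;
        final Validity valid;

        Result(boolean wellFormed, Validity valid) {
            this.wellFormed = wellFormed;
            this.valid = valid;
        }
    }

    // Assumed layout: "PK\3\4", then "mimetype" at 30 and the media type at 38.
    static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};
    static final byte[] MIMETYPE_NAME = "mimetype".getBytes(StandardCharsets.US_ASCII);
    static final byte[] EPUB_MIME =
            "application/epub+zip".getBytes(StandardCharsets.US_ASCII);

    /** True if {@code expected} occurs in {@code data} starting at {@code offset}. */
    static boolean matchesAt(byte[] data, int offset, byte[] expected) {
        if (data.length < offset + expected.length) {
            return false;
        }
        byte[] slice = Arrays.copyOfRange(data, offset, offset + expected.length);
        return Arrays.equals(slice, expected);
    }

    /** A signature match says nothing about validity, so valid stays UNDETERMINED. */
    static Result checkSignature(byte[] header) {
        boolean matches = matchesAt(header, 0, ZIP_MAGIC)
                && matchesAt(header, 30, MIMETYPE_NAME)
                && matchesAt(header, 38, EPUB_MIME);
        return new Result(matches, Validity.UNDETERMINED);
    }

    public static void main(String[] args) {
        // Build a synthetic 58-byte header that carries the assumed signature.
        byte[] header = new byte[58];
        System.arraycopy(ZIP_MAGIC, 0, header, 0, ZIP_MAGIC.length);
        System.arraycopy(MIMETYPE_NAME, 0, header, 30, MIMETYPE_NAME.length);
        System.arraycopy(EPUB_MIME, 0, header, 38, EPUB_MIME.length);
        Result r = checkSignature(header);
        System.out.println("wellFormed=" + r.wellFormed + ", valid=" + r.valid);
    }
}
```

The point of the sketch is only the last line of `checkSignature`: whatever the bytes look like, the validity flag is left undetermined.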
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<argLine>-Xss1024k</argLine>
So this is the additional argument for 32-bit JVMs. I'm a little curious whether this would affect Travis builds, as it appears that both of the JVMs on Travis are 64-bit. I'm also curious whether the failure is deterministic, i.e. whether it fails consistently without this arg.
The behavior was quite strange. It seemed that one Travis build or the other would fail, but never both or neither; I'm not sure if that was just a coincidence. Each build seemed consistent in its failure or success when I repeated the same Travis build. On a small test Linux instance I had, they were much less consistent: if I ran them a few times, eventually they would build. I wonder if using a 32-bit JVM simply increases the chance of failure, while the failure remains possible in the 64-bit environment for some reason.
}

/**
These generate property methods appear to be useful, but they're also quite generic and might be useful to other module devs. While I don't propose moving them yet it looks like a useful maintenance / refactoring task for a later date.
There seems to be an issue with the ordering of elements within the XML report. I'm not sure how serious it is, or whether it is the test framework or the module that needs fixing. In essence, the XML reports are identical but the elements are in a different order. My guess is that this has its roots in the "multi-threadedness" of the underlying EPUB library.
I dug into the property ordering. I think it might be because …
- E-PUB module test baseline update added to `jhove-bbt/scripts/create-1.23-target.sh`;
- simplified test process and added more explicit logging;
- fixed small `shellcheck` lint warnings, particularly error-prone directory cycling;
- added comments to disable `shellcheck` inclusion warning; and
- removed `bin/.gitignore` generated by Eclipse.
- replaced `ArrayList` with `TreeSet` for elements where testing suggested order mattered;
- added a helper method to avoid null Property values being added to sets;
- dedicated `Comparator` implementations for `Message` and `Property`; and
- added example and error files for E-PUB module testing.
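To make the ordering fix concrete, here is a small sketch of the general technique: collecting items in a `TreeSet` with an explicit `Comparator`, behind a null-guarding helper, so that iteration order no longer depends on arrival order. The `Msg` class, its fields, and the helper names are hypothetical stand-ins, not the `Message` and `Property` comparators added in this commit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of deterministic ordering; not the module's actual code. */
final class OrderedMessagesSketch {

    /** Hypothetical stand-in for a report message. */
    static final class Msg {
        final String id;
        final String text;

        Msg(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /** Orders by id, then by text, so the result is independent of arrival order. */
    static final Comparator<Msg> BY_ID_THEN_TEXT =
            Comparator.comparing((Msg m) -> m.id).thenComparing(m -> m.text);

    /** Guard helper: a TreeSet with this comparator would fail on a null element. */
    static void addIfNotNull(Set<Msg> target, Msg m) {
        if (m != null) {
            target.add(m);
        }
    }

    /** Returns the message ids in comparator order, whatever order they arrived in. */
    static List<String> idsInOrder(Msg... arrival) {
        Set<Msg> sorted = new TreeSet<>(BY_ID_THEN_TEXT);
        for (Msg m : arrival) {
            addIfNotNull(sorted, m);
        }
        List<String> ids = new ArrayList<>();
        for (Msg m : sorted) {
            ids.add(m.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        Msg pkg = new Msg("PKG-003", "b");
        Msg opf = new Msg("OPF-001", "a");
        // Both calls print the ids in the same order.
        System.out.println(idsInOrder(pkg, null, opf));
        System.out.println(idsInOrder(opf, pkg));
    }
}
```

One trade-off worth noting: a `TreeSet` treats elements that compare as equal as duplicates and keeps only one of them, so the comparator has to cover every field that distinguishes two entries.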
Codecov Report
@@              Coverage Diff               @@
##          integration    #460     +/-   ##
=============================================
+ Coverage       49.15%   49.75%   +0.59%
+ Complexity        983      982       -1
=============================================
  Files              56       55       -1
  Lines            7895     7670     -225
  Branches         1432     1392      -40
=============================================
- Hits             3881     3816      -65
+ Misses           3542     3402     -140
+ Partials          472      452      -20
…encies Inconsistencies were found in how these lists were formed and this opened the door to confusing reports. Removing the distinction between local and remote to match original EPUBCheck module.
FYI @karenhanson, I'm aware of the conflict and how to solve it. I'm just holding off on merging this, as the conflicts can only be in the test scripts and, since this branch / PR is a new module, it's easier to untangle conflicts here. It'll get merged this week.
This change is to add the `EPUB-ptc` extension module version 1.0. The module wraps the W3C EPUBCheck validation tool as a JHOVE module.

Some notes in advance of full documentation:
- The `checkSignatures()` method is based on the Library of Congress EPUB2 and EPUB3 documentation. For this, `valid` is always set to `UNDETERMINED`. This check only looks at "magic numbers" and file extension. If the signature matches, it sets "well formed" to true. Note that it does not perform the additional package checks that contribute to "well formed"-ness in `parse()`, so there is a small inconsistency there.
- … `(severity="FATAL") or (severity="ERROR" and message-id begins with "PKG-")`. In other words, fatal errors and package errors make the file "Not well formed".
- … `severity="ERROR"`.

I look forward to your comments, and especially welcome any feedback on the rationales described above!
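For reviewers weighing the well-formedness rationale, the rule quoted in the notes above, `(severity="FATAL") or (severity="ERROR" and message-id begins with "PKG-")`, can be written as a single predicate. The sketch below is only an illustration under assumptions: `WellFormedRuleSketch` and its `Msg` class are hypothetical, and severities are modeled as plain strings rather than whatever types EPUBCheck or the module actually use.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "Not well formed" rule; not the module's actual code. */
final class WellFormedRuleSketch {

    /** Hypothetical stand-in for one reported message. */
    static final class Msg {
        final String severity;
        final String id;

        Msg(String severity, String id) {
            this.severity = severity;
            this.id = id;
        }
    }

    /** True for fatal errors and for package ("PKG-") errors. */
    static boolean breaksWellFormedness(Msg m) {
        return "FATAL".equals(m.severity)
                || ("ERROR".equals(m.severity) && m.id.startsWith("PKG-"));
    }

    /** A file counts as well formed only if no message trips the rule. */
    static boolean isWellFormed(List<Msg> messages) {
        for (Msg m : messages) {
            if (breaksWellFormedness(m)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Msg> report = Arrays.asList(
                new Msg("WARNING", "PKG-010"),
                new Msg("ERROR", "RSC-005"));
        System.out.println("wellFormed=" + isWellFormed(report));
    }
}
```

Under this reading, an `ERROR` outside the `PKG-` family and a `PKG-` message below `ERROR` severity both leave the file well formed.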