Adds EPUB validation extension module based on W3C's EPUBCheck #460

Merged
merged 20 commits into from
Dec 10, 2019
20 commits
496cfd9
Add EPUB external module
karenhanson Jun 10, 2019
a39334c
Move constants for all Portico modules into PorticoConstants
karenhanson Jun 13, 2019
f5cd4da
EPUB checkSignature modified to return valid=UNDETERMINED
karenhanson Jul 3, 2019
f412dbd
Fix Codacy issues, per pull request report
karenhanson Jul 8, 2019
1d2d27b
Add empty corpora folder for EPUB-ptc
karenhanson Jul 8, 2019
94e7af6
Use latest EPUBCheck - 4.2.2 now available
karenhanson Jul 19, 2019
a8db674
Adjust thread stack size of JVM
karenhanson Jul 24, 2019
ee698e5
Use List not Set at top level of properties tree for consistency in p…
karenhanson Aug 20, 2019
809bb16
Add test to check UTF-8 encoding of EPUB title field
karenhanson Aug 23, 2019
607f268
TEST - E-PUB module testing
carlwilson Sep 24, 2019
fc266ae
FIX - Integration test fixes
carlwilson Sep 24, 2019
fde5ea7
FIX - add E-PUB memory param to Travis build.
carlwilson Sep 25, 2019
2cc25a1
DEBUG - added cat statement for Travis.
carlwilson Sep 25, 2019
cf85b91
FIX - send more stack memory.
carlwilson Sep 26, 2019
fbd6317
Remove separation of local and remote resource lists due to inconsist…
karenhanson Sep 30, 2019
93d74b1
Clean up references to local/remote resources
karenhanson Oct 1, 2019
a163653
Merge branch 'integration' into epub-ext
carlwilson Oct 21, 2019
b905e37
Merge branch 'integration' into epub-ext
carlwilson Oct 22, 2019
e435183
Merge branch 'integration' into epub-ext
carlwilson Dec 10, 2019
5199c32
Merge branch 'integration' into epub-ext
carlwilson Dec 10, 2019
1 change: 1 addition & 0 deletions README.md
@@ -363,6 +363,7 @@ The `jhove-ext-modules` contains JHOVE modules developed by external parties, sp
* PNG
* WARC
* GZIP
* EPUB

These are all packaged in a single modules JAR:

52 changes: 40 additions & 12 deletions jhove-bbt/scripts/create-1.23-target.sh
@@ -47,30 +47,58 @@ showHelp() {
# Execution starts here
checkParams "$@";
if [[ -d "${targetRoot}" ]]; then
echo " - removing existing baseline at ${targetRoot}."
rm -rf "${targetRoot}"
fi

echo "Executing baseline update"
# Copying baseline for now we're not making any changes
echo "TEST BASELINE: Creating baseline"
# Simply copy the baseline; for now we're not making any changes
echo " - copying ${baselineRoot} baseline to ${targetRoot}"
cp -R "${baselineRoot}" "${targetRoot}"

###
# E-PUB Module Fixes
###
# These copies are all OK as they're new files and don't overwrite
if [[ -f "${candidateRoot}/errors/modules/audit-EPUB-ptc.jhove.xml" ]]; then
echo " - EPUB copying EPUB audit results"
cp "${candidateRoot}/errors/modules/audit-EPUB-ptc.jhove.xml" "${targetRoot}/errors/modules/audit-EPUB-ptc.jhove.xml"
fi
if [[ -d "${candidateRoot}/errors/modules/EPUB-ptc" ]]; then
echo " - EPUB copying error test results"
cp -R "${candidateRoot}/errors/modules/EPUB-ptc" "${targetRoot}/errors/modules"
fi
if [[ -f "${candidateRoot}/examples/modules/audit-EPUB-ptc.jhove.xml" ]]; then
echo " - EPUB copying JHOVE audit results"
cp "${candidateRoot}/examples/modules/audit-EPUB-ptc.jhove.xml" "${targetRoot}/examples/modules/audit-EPUB-ptc.jhove.xml"
fi
if [[ -d "${candidateRoot}/examples/modules/EPUB-ptc" ]]; then
echo " - EPUB copying examples test results"
cp -R "${candidateRoot}/examples/modules/EPUB-ptc" "${targetRoot}/examples/modules"
fi
# Replace the text for the XML parser; EPUB won't build without that version and JHOVE seems fine about it.
echo " - EPUB replacing XML parser text"
sed -i 's%com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser%org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser%' "${targetRoot}/examples/modules/XML-hul/jhoveconf.xml.jhove.xml"
# Add line for EPUB module to JHOVE audit file
sed -i '14 a \ \ \ <module release="1.0">EPUB-ptc</module>' "${targetRoot}/audit.jhove.xml"
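
The two `sed` edits above can be tried in isolation. What follows is a minimal sketch, assuming GNU sed (`-i` without a suffix and the one-line `a` form are GNU behaviour). The temporary directory `demo_dir`, the file names and the file contents are made up for illustration and are not the real baseline files.

```shell
#!/usr/bin/env bash
# Sketch only: demo files and their contents are hypothetical.
demo_dir="$(mktemp -d)"

# 1. Substitution using % as the delimiter. Inside single quotes the shell
#    leaves $ alone, and in a basic regex a $ that is not at the end of the
#    pattern matches a literal dollar sign.
printf '%s\n' 'parser: com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser' > "${demo_dir}/conf.txt"
sed -i 's%com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser%org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser%' "${demo_dir}/conf.txt"
replaced="$(cat "${demo_dir}/conf.txt")"

# 2. "N a text" appends a new line after line N (N=2 here instead of 14).
#    The backslash-space pairs are there to keep leading indentation.
printf '%s\n' 'one' 'two' 'three' > "${demo_dir}/audit.txt"
sed -i '2 a \ \ \ <module release="1.0">EPUB-ptc</module>' "${demo_dir}/audit.txt"
appended="$(sed -n '3p' "${demo_dir}/audit.txt")"

echo "${replaced}"
echo "${appended}"
rm -rf "${demo_dir}"
```

Note that the append is tied to a fixed line number, so it only lands in the right place while the layout of the audit file stays the same.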

# Copy valid JP2K files across for new MIX metadata see https://github.com/openpreserve/jhove/pull/445
if [[ -d "${candidateRoot}/examples/modules/JPEG2000-hul" ]]; then
echo "Copying valid JPEG2000 examples."
echo " - Copying valid JPEG2000 examples."
cp -Rf "${candidateRoot}/examples/modules/JPEG2000-hul" "${targetRoot}/examples/modules/"
fi
if [[ -d "${candidateRoot}/errors/modules/JPEG2000-hul" ]]; then
echo "Copying JPEG2000 errors."
echo " - Copying JPEG2000 errors."
cp -Rf "${candidateRoot}/errors/modules/JPEG2000-hul" "${targetRoot}/errors/modules/"
fi

# Copy WAV files across for new MIX metadata see https://github.com/openpreserve/jhove/pull/445
if [[ -d "${candidateRoot}/examples/modules/WAVE-hul" ]]; then
echo "Copying valid WAVE examples."
echo " - Copying valid WAVE examples."
cp -Rf "${candidateRoot}/examples/modules/WAVE-hul" "${targetRoot}/examples/modules/"
fi
if [[ -d "${candidateRoot}/errors/modules/WAVE-hul" ]]; then
echo "Copying WAVE errors."
echo " - Copying WAVE errors."
cp -Rf "${candidateRoot}/errors/modules/WAVE-hul" "${targetRoot}/errors/modules/"
fi

@@ -86,27 +114,27 @@ fi

# Copy TIFF across for new Message IDs, see https://github.com/openpreserve/jhove/pull/510
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/chase-tif-f.tif.jhove.xml" ]]; then
echo "Copying affected TIFF examples."
echo " - Copying affected TIFF examples."
cp -Rf "${candidateRoot}/examples/modules/TIFF-hul/chase-tif-f.tif.jhove.xml" "${targetRoot}/examples/modules/TIFF-hul/"
fi
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/g3test.g3.jhove.xml" ]]; then
echo "Copying affected TIFF examples."
echo " - Copying affected TIFF examples."
cp -Rf "${candidateRoot}/examples/modules/TIFF-hul/g3test.g3.jhove.xml" "${targetRoot}/examples/modules/TIFF-hul/"
fi
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/smallliz.tif.jhove.xml" ]]; then
echo "Copying affected TIFF examples."
echo " - Copying affected TIFF examples."
cp -Rf "${candidateRoot}/examples/modules/TIFF-hul/smallliz.tif.jhove.xml" "${targetRoot}/examples/modules/TIFF-hul/"
fi
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/peppers.tif.jhove.xml" ]]; then
echo "Copying affected TIFF examples."
echo " - Copying affected TIFF examples."
cp -Rf "${candidateRoot}/examples/modules/TIFF-hul/peppers.tif.jhove.xml" "${targetRoot}/examples/modules/TIFF-hul/"
fi
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/zackthecat.tif.jhove.xml" ]]; then
echo "Copying affected TIFF examples."
echo " - Copying affected TIFF examples."
cp -Rf "${candidateRoot}/examples/modules/TIFF-hul/zackthecat.tif.jhove.xml" "${targetRoot}/examples/modules/TIFF-hul/"
fi
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/fax2d.g3.jhove.xml" ]]; then
echo "Copying affected TIFF examples."
echo " - Copying affected TIFF examples."
cp -Rf "${candidateRoot}/examples/modules/TIFF-hul/fax2d.g3.jhove.xml" "${targetRoot}/examples/modules/TIFF-hul/"
fi
if [[ -f "${candidateRoot}/examples/modules/TIFF-hul/quad-tile.tif.jhove.xml" ]]; then
2 changes: 1 addition & 1 deletion jhove-bbt/scripts/exec-with-to.sh
@@ -73,7 +73,7 @@ fi

# kill -0 pid Exit code indicates if a signal may be sent to $pid process.
(
((t = timeout))
((t = $timeout))

while ((t > 0)); do
sleep $interval
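
The hunk above shows only the start of the watchdog loop. As a rough illustration of the idea, here is a self-contained sketch of a timeout wrapper built on `kill -0`. The function `run_with_timeout` is hypothetical and is structured differently from `exec-with-to.sh`; it is not a copy of that script. In bash arithmetic a bare name is expanded as a variable, so `((t = timeout))` and `((t = $timeout))` give the same result when the variable holds a plain integer.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_timeout is a hypothetical helper.
run_with_timeout() {
  local timeout=$1 interval=1
  shift
  "$@" &                          # start the command in the background
  local pid=$!
  (
    ((t = timeout))
    while ((t > 0)); do
      sleep "$interval"
      # kill -0 sends no signal; it only reports whether $pid still exists
      kill -0 "$pid" 2>/dev/null || exit 0
      ((t -= interval))
    done
    kill -s TERM "$pid" 2>/dev/null   # time is up: ask the command to stop
  ) &
  local watchdog=$!
  wait "$pid"
  local status=$?
  kill "$watchdog" 2>/dev/null        # the watchdog is no longer needed
  wait "$watchdog" 2>/dev/null
  return "$status"
}
```

Usage: `run_with_timeout 10 some-command args` returns the command's own exit status if it finishes in time, and a non-zero status if the watchdog had to terminate it.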
10 changes: 6 additions & 4 deletions jhove-bbt/scripts/process-modules.sh
@@ -88,15 +88,17 @@ showHelp() {

# Cycle through the test module directories and invoke the correct JHOVE module
getCorpusModules() {
DIRS=$(ls -l "$paramModuleLoc" | grep -E '^d' | awk '{print $9}')
for DIR in $DIRS ; do
moduleName=${DIR}
for DIR in "$paramModuleLoc"/*/
do
# https://stackoverflow.com/questions/1371261/get-current-directory-name-without-full-path-in-a-bash-script
moduleName="${DIR%"${DIR##*[!/]}"}" # extglob-free multi-trailing-/ trim
moduleName="${moduleName##*/}" # remove everything before the last /
if [[ ! -e "$paramOutputRootDir/audit-$moduleName.jhove.xml" ]]
then
bash "$SCRIPT_DIR/exec-with-to.sh" -t 10 "$paramJhoveLoc/jhove" -m "${moduleName}" -h xml -o "$paramOutputRootDir/audit-$moduleName.jhove.xml"
fi
processModuleDir "$paramModuleLoc/$moduleName"
done
done
}
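
The change above swaps parsing of `ls -l` output, which breaks on directory names containing spaces, for a glob plus two parameter expansions. A small sketch of just the name extraction follows. The helper `module_name_from_dir` and the sample paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: module_name_from_dir is a hypothetical helper.
module_name_from_dir() {
  local dir=$1
  # ${dir##*[!/]} is the run of trailing slashes; strip exactly that suffix
  local name="${dir%"${dir##*[!/]}"}"
  # then drop everything up to and including the last remaining slash
  name="${name##*/}"
  printf '%s\n' "$name"
}

module_name_from_dir "corpora/modules/EPUB-ptc/"    # prints EPUB-ptc
module_name_from_dir "corpora/modules/PDF-hul///"   # prints PDF-hul
```

Because both steps are plain parameter expansions, no external process is started and names with spaces are preserved as long as the variable stays quoted.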

processModuleDir() {
2 changes: 2 additions & 0 deletions jhove-bbt/scripts/travis-test.sh
@@ -50,6 +50,7 @@ if [[ -d "${tempInstallLoc}" ]]; then
fi

# Create the test target root if it doesn't exist
[[ -d "${TARGET_ROOT}" ]] && rm -rf "${TARGET_ROOT:?}/"*
[[ -d "${TARGET_ROOT}" ]] || mkdir -p "${TARGET_ROOT}"
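
The `${TARGET_ROOT:?}` form in the added line is a safety guard: if the variable is unset or empty the expansion fails instead of producing `/*`. Below is a minimal sketch of the same pattern on a throwaway directory. The helper `clean_dir` and the names `demo_root` and `empty` are hypothetical, and the guard demonstration uses `echo` rather than a real `rm`.

```shell
#!/usr/bin/env bash
# Sketch only: clean_dir is a hypothetical helper working on temp dirs.
clean_dir() {
  local target=${1-}
  # Empty the directory if it exists; ${target:?} refuses to expand when
  # target is unset or empty, so the path can never collapse to "/*".
  [[ -d "${target}" ]] && rm -rf "${target:?}/"*
  # Create the directory if it does not exist yet.
  [[ -d "${target}" ]] || mkdir -p "${target}"
}

demo_root="$(mktemp -d)"
touch "${demo_root}/stale.xml"
clean_dir "${demo_root}"
remaining="$(ls -A "${demo_root}" | wc -l)"

# Show the guard in a subshell: the expansion error stops the command
# before it runs (echo stands in for rm to keep the demo harmless).
( empty=""; echo rm -rf "${empty:?}/"* ) 2>/dev/null
guard_status=$?

echo "files left: ${remaining}, guard exit status: ${guard_status}"
rm -rf "${demo_root}"
```

One limitation of this pattern is that a plain `*` glob does not match hidden files, so dotfiles in the directory survive the cleanup.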

# Grab the Major and Minor versions from the full Maven project version string
@@ -70,6 +71,7 @@ installJhoveFromFile "${JHOVE_INSTALLER}" "${tempInstallLoc}"

[[ -d "${CANDIDATE_ROOT}/${MAJOR_MINOR_VER}" ]] || mkdir -p "${CANDIDATE_ROOT}/${MAJOR_MINOR_VER}"

cat "${tempInstallLoc}/jhove"
echo ""
echo "Testing ${MAJOR_MINOR_VER}."
echo "=========================="
15 changes: 14 additions & 1 deletion jhove-ext-modules/pom.xml
@@ -13,6 +13,7 @@

<properties>
<jwat.version>1.0.3</jwat.version>
<epubcheck.version>4.2.2</epubcheck.version>
</properties>

<build>
@@ -41,9 +42,15 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<argLine>-Xss1024k</argLine>
Review comment (Member):

So this is the additional argument for 32-bit JVMs. I'm a little curious whether this would affect the Travis builds, as both of the JVMs on Travis appear to be 64-bit. I'm also curious whether the failure is deterministic, i.e. whether it fails consistently without this argument.

Reply (Contributor Author):

The behavior was quite strange. It seemed that one Travis build or the other would fail, but never both and never neither; I'm not sure whether that was just a coincidence. Each build was consistent in its failure or success when I repeated the same Travis build. On a small test Linux instance I had, the builds were much less consistent: if I ran them a few times, eventually they would build. I wonder whether using a 32-bit JVM simply increases the chance of failure, while the failure is still possible in the 64-bit environment for some reason.

</configuration>
</plugin>
</plugins>
</build>

<dependencies>
<dependency>
<groupId>org.openpreservation.jhove</groupId>
@@ -75,5 +82,11 @@
<artifactId>jwat-warc</artifactId>
<version>${jwat.version}</version>
</dependency>
<!-- EPUBCheck for EPUB -->
<dependency>
<groupId>org.w3c</groupId>
<artifactId>epubcheck</artifactId>
<version>${epubcheck.version}</version>
</dependency>
</dependencies>
</project>