FIX - PDF minor version number checking. #317
Merged
10 commits
4540b4a Separate PDF Header parsing from Module. (carlwilson)
566005f Dedicated PDF Header parsing class. (carlwilson)
f362c66 PdfModule now uses dedicated Header parser. (carlwilson)
df9880c Merge remote-tracking branch 'origin/integration' into fix/pdf-header (carlwilson)
1b2e926 FIX - PDF minor version number checking. (carlwilson)
4e5533b FIX - Unnecessary if statement. (carlwilson)
1170a4d Merge branch 'integration' into fix/pdf-header (carlwilson)
c0f5b15 FIX - Review Comments (carlwilson)
9020fbb Merge branch 'integration' into fix/pdf-header (carlwilson)
4e1a9fe Merge branch 'integration' into fix/pdf-header (carlwilson)
181 changes: 181 additions & 0 deletions
jhove-modules/src/main/java/edu/harvard/hul/ois/jhove/module/pdf/PdfHeader.java
@@ -0,0 +1,181 @@
package edu.harvard.hul.ois.jhove.module.pdf;

import java.io.IOException;

/**
 * Simple class that is a prototype of a proper header parser class. The aim
 * was to introduce a simple version check for the PDF/A minor version number,
 * see {@link PdfHeader#isVersionValid()}, while not changing anything else
 * through over-ambition.
 *
 * @author <a href="mailto:carl@openpreservation.org">Carl Wilson</a>
 *         <a href="https://github.com/carlwilson">carlwilson AT github</a>
 * @version 0.1 Created 8 Mar 2018:00:46:39
 */
public final class PdfHeader {
    public static final String PDF_VER1_HEADER_PREFIX = "PDF-1."; //$NON-NLS-1$
    public static final String PDF_SIG_HEADER = "%" + PDF_VER1_HEADER_PREFIX; //$NON-NLS-1$
    public static final String POSTSCRIPT_HEADER_PREFIX = "!PS-Adobe-"; //$NON-NLS-1$
    // Despite its name, this constant is compared against the minor version.
    public static final int MAX_VALID_MAJOR_VERSION = 7;

    private final String versionString;
    private final boolean isPdfACompliant;

    /**
     * Private constructor, instances are created through the static factory
     * methods.
     */
    private PdfHeader(final String versionString,
            final boolean isPdfaCompliant) {
        this.versionString = versionString;
        this.isPdfACompliant = isPdfaCompliant;
    }

    /**
     * @return the version string parsed from the PDF Header
     */
    public String getVersionString() {
        return this.versionString;
    }

    /**
     * @return true if the header is considered PDF/A compliant, otherwise false
     */
    public boolean isPdfACompliant() {
        return this.isPdfACompliant;
    }

    /**
     * Performs a very simple version number validity check. The version
     * number is a String of the form 1.x, where x is the minor version number.
     * This method parses the minor version number from the version String and
     * tests whether it is less than or equal to
     * {@link PdfHeader#MAX_VALID_MAJOR_VERSION}.
     *
     * @return true if an integer minor version number can be parsed from the
     *         version string AND it is less than or equal to
     *         {@link PdfHeader#MAX_VALID_MAJOR_VERSION}. Otherwise false.
     */
    public boolean isVersionValid() {
        // Start one above the maximum so that the version is invalid if the
        // parse fails
        int minorVersion = MAX_VALID_MAJOR_VERSION + 1;
        try {
            minorVersion = getMinorVersion(this.versionString);
        } catch (NumberFormatException nfe) {
            // TODO : Do nothing for now, the version number is still invalid.
        }
        return minorVersion <= MAX_VALID_MAJOR_VERSION;
    }

    /**
     * Creates a new {@link PdfHeader} instance using the passed parameters.
     *
     * @param versionString
     *            the version number from the PDF Header, should be of the form
     *            <code>1.x</code> where x should be in the range 0-7.
     * @param isPdfaCompliant
     *            boolean flag indicating whether the header is compliant with
     *            JHOVE's PDF/A profile.
     * @return a {@link PdfHeader} instance initialised using
     *         <code>versionString</code> and <code>isPdfaCompliant</code>.
     * @throws NullPointerException
     *             when parameter <code>versionString</code> is null.
     */
    static PdfHeader fromValues(final String versionString,
            final boolean isPdfaCompliant) {
        if (versionString == null)
            throw new NullPointerException(
                    "Parameter versionString can not be null.");
        return new PdfHeader(versionString, isPdfaCompliant);
    }

    /**
     * Factory method for {@link PdfHeader} that parses a new instance using the
     * supplied {@link Parser} instance.
     *
     * @param parser
     *            the {@link Parser} instance that will be used to parse header
     *            details
     * @return a new {@link PdfHeader} instance derived using the supplied
     *         {@link Parser} or <code>null</code> when no header could be found
     *         and parsed.
     */
    public static PdfHeader parseHeader(final Parser parser) {
        Token token = null;
        String value = null;
        boolean isPdfACompliant = false;
        String version = null;

        /* Parse file header. */
        for (;;) {
            if (parser.getOffset() > 1024) {
                return null;
            }
            try {
                // Reset first so that a failed read is detected by the
                // null check below
                token = null;
                token = parser.getNext(1024L);
            } catch (IOException ee) {
                return null;
            } catch (Exception e) {
                // fall through
            }

            if (token == null) {
                return null;
            }
            if (token instanceof Comment) {
                value = ((Comment) token).getValue();
                if (value.indexOf(PDF_VER1_HEADER_PREFIX) == 0) {
                    version = value.substring(4, 7);
                    isPdfACompliant = true;
                    break;
                }
                // The implementation notes (though not the spec)
                // allow an alternative signature of %!PS-Adobe-N.n PDF-M.m
                if (value.indexOf(POSTSCRIPT_HEADER_PREFIX) == 0) {
                    // But be careful: that much by itself is the standard
                    // PostScript signature.
                    int n = value.indexOf(PDF_VER1_HEADER_PREFIX);
                    if (n >= 11) {
                        version = value.substring(n + 4);
                        break;
                    }
                }
            }
        }

        if (version == null) {
            return null;
        }

        // Check for PDF/A conformance. The next item must be
        // a comment with four characters, each greater than 127
        try {
            isPdfACompliant = isTokenPdfACompliant(parser.getNext());
        } catch (Exception excep) {
            // Most likely a ClassCastException on a non-comment
            isPdfACompliant = false;
        }
        return new PdfHeader(version, isPdfACompliant);
    }

    private static int getMinorVersion(final String version) {
        // Parses the whole version as a double and takes the first decimal
        // digit. The cast truncates, so the binary floating-point
        // representation can under-report the digit (e.g. "1.4" yields 3).
        double doubleVer = Double.parseDouble(version);
        double fractPart = doubleVer % 1;
        int minor = (int) (10L * fractPart);
        return minor;
    }

    private static boolean isTokenPdfACompliant(final Token token) {
        // Throws for a non-comment token or a comment shorter than four
        // characters; parseHeader catches the exception and treats the
        // header as non-compliant.
        String cmt = ((Comment) token).getValue();
        char[] cmtArray = cmt.toCharArray();
        int ctlcnt = 0;
        for (int i = 0; i < 4; i++) {
            if (cmtArray[i] > 127) {
                ctlcnt++;
            }
        }
        return (ctlcnt > 3);
    }
}
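In the diff above, `isTokenPdfACompliant` reads the first four characters of the comment without checking its length, so a comment shorter than four characters throws and is only reported as non-compliant because `parseHeader` catches the exception. A minimal sketch of the same test with an explicit length check follows. `BinaryCommentCheck` and `hasFourHighChars` are hypothetical names used for illustration and are not part of the PR, and the sketch takes the comment text as a plain `String` rather than a JHOVE `Token`.

```java
/** Hypothetical helper, not part of the PR: bounds-checked binary comment test. */
final class BinaryCommentCheck {
    private BinaryCommentCheck() {
    }

    /**
     * Returns true when the comment text has at least four characters and
     * each of the first four has a value greater than 127.
     */
    static boolean hasFourHighChars(final String comment) {
        if (comment == null || comment.length() < 4) {
            // Too short to carry the four-character binary marker
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (comment.charAt(i) <= 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Four characters above 127, then a three-character ASCII comment
        System.out.println(hasFourHighChars("\u00e2\u00e3\u00cf\u00d3"));
        System.out.println(hasFourHighChars("abc"));
    }
}
```

The explicit check keeps the result the same for short comments while no longer relying on a caught exception for control flow.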
TODO: Nothing?
The aim is not to break existing behaviour. This ensures that the case is handled as before: the minor version number is set too large, so false is returned. I've left the TODO as I intend to polish this a little more, but that needs new error messages and is outside the scope of this PR.
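The behaviour described in this reply can be sketched as a stand-alone class. This is an illustration under stated assumptions, not the PR's code: `MinorVersionSketch`, `MAX_VALID_MINOR_VERSION` and `parseMinorVersion` are hypothetical names, and the minor version is parsed from the text after the dot rather than through the `Double.parseDouble` arithmetic used in `PdfHeader.getMinorVersion`. The default-to-invalid pattern is the same: the minor version starts one above the maximum, so a failed parse still returns false.

```java
/** Hypothetical stand-alone sketch, not part of the PR. */
final class MinorVersionSketch {
    // Assumed name; the PR's equivalent constant is MAX_VALID_MAJOR_VERSION.
    static final int MAX_VALID_MINOR_VERSION = 7;

    private MinorVersionSketch() {
    }

    /**
     * Parses the text after the first '.' as an integer, e.g. 7 for "1.7".
     * Assumes a non-null version string.
     */
    static int parseMinorVersion(final String version) {
        final int dot = version.indexOf('.');
        if (dot < 0) {
            throw new NumberFormatException("No '.' in version: " + version);
        }
        return Integer.parseInt(version.substring(dot + 1));
    }

    static boolean isVersionValid(final String version) {
        // Start one above the maximum so a failed parse stays invalid
        int minorVersion = MAX_VALID_MINOR_VERSION + 1;
        try {
            minorVersion = parseMinorVersion(version);
        } catch (NumberFormatException nfe) {
            // minorVersion keeps its out-of-range default
        }
        return minorVersion >= 0 && minorVersion <= MAX_VALID_MINOR_VERSION;
    }

    public static void main(String[] args) {
        for (String v : new String[] { "1.7", "1.8", "1.x", "junk" }) {
            System.out.println(v + " -> " + isVersionValid(v));
        }
    }
}
```

Note that the string-based parse is stricter than the PR's approach for multi-digit input such as "1.10", so it is not a drop-in replacement if existing behaviour must be preserved exactly.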
I wonder whether, in the long term, it is as simple as checking that the minor version is not too large, especially with PDF 2.0 now available. Couldn't it instead check the versionString against a list of currently available versions? So, as of today: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 2.0?
The downside is that the list of available versions would have to be updated every time a new version becomes available. But if the code instead states that the maximum correct minor version is 7, which is true for PDF 1.x, it would still have to be updated for the minor version limit of 2.x, as there is currently only 2.0 (and it would have to be updated again once 2.1 becomes available).
Does that make any sense?
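A minimal sketch of the list-based check suggested here, for illustration only. `KnownPdfVersions` and `isKnownVersion` are hypothetical names, not part of the PR or of JHOVE, and the set holds exactly the versions listed in the comment above. Note that `PdfHeader.parseHeader` only matches the `PDF-1.` prefix, so accepting 2.0 would also need a change to the header matching.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of the PR: validates against a fixed list. */
final class KnownPdfVersions {
    // Versions taken from the review comment; extend when new ones appear.
    private static final Set<String> KNOWN_VERSIONS = Collections
            .unmodifiableSet(new HashSet<String>(Arrays.asList("1.0", "1.1",
                    "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "2.0")));

    private KnownPdfVersions() {
    }

    /** Returns true only for an exact match against the known versions. */
    static boolean isKnownVersion(final String versionString) {
        return versionString != null && KNOWN_VERSIONS.contains(versionString);
    }

    public static void main(String[] args) {
        for (String v : new String[] { "1.7", "2.0", "1.8", "2.1" }) {
            System.out.println(v + " -> " + isKnownVersion(v));
        }
    }
}
```

An exact-match set avoids numeric parsing altogether, at the cost noted in the comment: the list has to be maintained by hand whenever a new version is published.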