-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addressing issue #93: CrossRefStream incorrectly assumes /Index value is a 2 element array #97
Addressing issue #93: CrossRefStream incorrectly assumes /Index value is a 2 element array #97
Conversation
array containing an arbitrary set of starting object number and object count pairs instead of a single pair. Calculate number of objects using final start/count pair. (Actual number will be less than or equal to this value.) Modified reading loop to read only objects in the index ranges. Modified PdfModule to test object number against stream's index instead of a range from 0 to the last object number. Attempt to conform to PDF specification http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference15_v6.pdf, page 83 and PDF versions 1.6 and 1.7.
Current coverage is 3.43% (diff: 0.00%)
@@ integration #97 diff @@
============================================
Files 411 411
Lines 34669 34690 +21
Methods 0 0
Messages 0 0
Branches 7105 7110 +5
============================================
- Hits 8231 1192 -7039
- Misses 26438 33405 +6967
- Partials 0 93 +93
|
Sheila +1 |
+1 |
1 similar comment
+1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch Paul, and good catch on the discrepancy with the spec. The code is largely fine and only requires some minor clean-up.
private PdfStream _xstrm; // The underlying Stream object. | ||
private PdfDictionary _dict; | ||
private int _size; | ||
private int[] _index; | ||
private int _index_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_index_size
appears to be unnecessary since the same value can be retrieved from _index.length
.
/** Range elements of the _index array: | ||
Starting object and number of objects. | ||
*/ | ||
private class index_range { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, all class names should be camel-case with an initial capital letter, i.e. IndexRange
.
@@ -262,8 +294,16 @@ public int getFreeCount () | |||
/** Returns the total object count. */ | |||
public int getNumObjects () | |||
{ | |||
return _index[0] + _index[1]; | |||
return _index[_index_size - 1].start + _index[_index_size - 1].len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the original method, this seems like a convoluted way to calculate the value already stored in _size
. Is there a reason we can't simply return that?
private PdfStream _xstrm; // The underlying Stream object. | ||
private PdfDictionary _dict; | ||
private int _size; | ||
private int[] _index; | ||
private int _index_size; | ||
private index_range[] _index; | ||
private int[] _fieldSizes; | ||
private int _freeCount; | ||
private Filter[] _filters; | ||
private int _entriesRead; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the addition of _read_index
below, _entriesRead
is no longer in use and can be safely removed.
private int[] _fieldSizes; | ||
private int _freeCount; | ||
private Filter[] _filters; | ||
private int _entriesRead; | ||
private int _read_range; | ||
private int _read_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try and match any surrounding whitespace conventions. In this case 4 spaces instead of a tab. JHOVE's source code is currently a mess with indentation, but we should avoid adding to the confusion.
Also, the JHOVE (and Java) style convention for common variable names is to have them camel-cased with lowercase initial letters, e.g. _readRange
, such as in the surrounding code.
} | ||
|
||
public boolean isValidObject(int objNum) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add JavaDocs for any new methods.
Additionally reverted the `sObjNum` tests in PdfModule, as the replacement tested for something slightly different: instead of testing that `sObjNum` fell between 0 and the cross-reference table's highest object number, the new test only checked for the object number's existence within the current cross-reference stream.
Additionally reverted the `sObjNum` tests in PdfModule, as the replacement tested for something slightly different: instead of testing that `sObjNum` fell between 0 and the cross-reference table's highest object number, the new test only checked for the object number's existence within the current cross-reference stream.
Closed by #188 |
array containing an arbitrary set of starting object number and object
count pairs instead of a single pair.
Calculate number of objects using final start/count pair. (Actual number
will be less than or equal to this value.)
Modified reading loop to read only objects in the index ranges.
Modified PdfModule to test object number against stream's index instead of
a range from 0 to the last object number.
Attempt to conform to PDF specification
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference15_v6.pdf, page 83 and PDF versions 1.6 and 1.7.