-
Notifications
You must be signed in to change notification settings - Fork 538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
process hexadecimal strings containing line breaks, but strip line breaks first (fix #273) #346
Conversation
Same here, unfortunately. Can't open the diff view for |
Yeah, right, that was an error in reasoning on my side, to think that we would now be able to see a comparison. Of course the initial state of the file is still regarded as a binary file and therefore won't be regarded for diffing. But in the future, once this is pushed, the file will be treated as a text file again. Here's the diff for the change (notice the 235c235
< $hexa = '<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:19.10.0" xfa:spec="2.0.2" style="font-size:12.0pt;text-align:left;color:?0000;font-weight:normal;font-style:norm\
---
> $hexa = '<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:19.10.0" xfa:spec="2.0.2" style="font-size:12.0pt;text-align:left;color:0000;font-weight:normal;font-style:norm\ And a Text Compare Report created with Beyond Compare (GitHub doesn't allow the upload of the html unfortunately): I also noticed that the hex color code following the byte char was incomplete, it should be |
Using the 3. commit itself also reveals your changes (75e23db). Thank you for clearing this, will look into it later on. |
I just saw that in your PR #345 the same line in Parser.php was changed: https://github.com/smalot/pdfparser/pull/345/files#diff-6dd9476cf5bca6913f04bf742aacdce9 Does it mean that this PR was replaced? |
Oh dear, I might have pushed something to the wrong branch... let me check! |
Yup, that was exactly what I did... Sorry for the mess in a PR that was supposed to clean up the mess! ;-) Corrected, master merged and code linted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Will keep this open for a few days to allow others to comment.
…eaks first (fix smalot#273) (smalot#346) * process hexadecimal strings containing line breaks, but strip line breaks first (fix smalot#273) * remove binary symbold from test data string * code linting
This is the same as #344, with the mess cleaned up.
Strings decoded as hexadecimal may contain line break characters in the hexadecimal representation of the raw data. These were ignored and the hexadecimal representation was returned unprocessed, resulting in what can be seen in #273: hexadecimal strings in the otherwise decoded text content.
This fix takes line breaks in hexadecimal representation into account, stripping those line breaks before decoding.
Please note that the CI check will expectedly fail for test case
testGetDataTmIssue336
while PR #343 is pending.Edit by @k00ni: fixes #273