process hexadecimal strings containing line breaks, but strip line breaks first (fix #273) #344
That's strange indeed... I had a similar issue in my GUI (SourceTree) recently. I think it might have to do with character encoding; I'll look into it!
So this is the code I added to the tests:

// hexadecimal string with a line break should not return the input string
// addressing issue #273: https://github.com/smalot/pdfparser/issues/273
$hexa = "<0027004c0056005300520051004c0045004c004f004c005d0044006f006d0052001d000300560048005b00570044001000490048004c00550044000f0003001400170003004700480003004900480059004800550048004c00550052000300470048000300\n15001300150013>";
$this->assertEquals("\x0\x27\x0\x4c\x0\x56\x0\x53\x0\x52\x0\x51\x0\x4c\x0\x45\x0\x4c\x0\x4f\x0\x4c\x0\x5d\x0\x44\x0\x6f\x0\x6d\x0\x52\x0\x1d\x0\x3\x0\x56\x0\x48\x0\x5b\x0\x57\x0\x44\x0\x10\x0\x49\x0\x48\x0\x4c\x0\x55\x0\x44\x0\xf\x0\x3\x0\x14\x0\x17\x0\x3\x0\x47\x0\x48\x0\x3\x0\x49\x0\x48\x0\x59\x0\x48\x0\x55\x0\x48\x0\x4c\x0\x55\x0\x52\x0\x3\x0\x47\x0\x48\x0\x3\x0\x15\x0\x13\x0\x15\x0\x13", Font::decodeHexadecimal($hexa));

Maybe the large number of escaped characters leads Git to treat this file as binary? It's very strange... I'm looking for a way to get around this.
Strings decoded as hexadecimal may contain line break characters within the hexadecimal representation of the raw data. Such strings were not decoded: the hexadecimal representation was returned unprocessed, which produced what can be seen in #273, namely raw hexadecimal strings inside otherwise decoded text content.
This fix takes line breaks in the hexadecimal representation into account by stripping them before decoding.
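The stripping step can be sketched in PHP as follows. This is an illustration under stated assumptions, not the code merged in this PR: `decodeHexString` is a hypothetical name, it decodes a single `<...>` string instead of scanning text content the way `Font::decodeHexadecimal` is called in the test, and it strips all whitespace rather than only line breaks, since the PDF specification (ISO 32000-1, section 7.3.4.3) says white-space characters in hexadecimal strings shall be ignored. It also applies the specification's rule that an odd number of digits is treated as if a final 0 followed.

```php
<?php

/**
 * Hypothetical helper, NOT pdfparser's Font::decodeHexadecimal():
 * decodes a single PDF hexadecimal string such as "<0027\n004c>".
 * Whitespace, including line breaks, is stripped before decoding.
 */
function decodeHexString(string $input): string
{
    $hex = trim($input);

    // Remove the enclosing angle brackets, if present.
    if (strlen($hex) >= 2 && $hex[0] === '<' && substr($hex, -1) === '>') {
        $hex = substr($hex, 1, -1);
    }

    // Strip line breaks and any other whitespace between the digits.
    $hex = preg_replace('/\s+/', '', $hex);

    // Everything left must be a hex digit; otherwise refuse to guess.
    if (preg_match('/\A[0-9A-Fa-f]*\z/', $hex) !== 1) {
        throw new InvalidArgumentException('Not a hexadecimal string: ' . $input);
    }

    // An odd number of digits is decoded as if a final 0 followed.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }

    return $hex === '' ? '' : hex2bin($hex);
}

// Usage: a hex string split across two lines decodes as if it were one line.
var_dump(bin2hex(decodeHexString("<0027004c\n0056>"))); // string(12) "0027004c0056"
```

Rejecting non-hex input with an exception, instead of returning it unchanged, is a deliberate choice in this sketch: silently passing the input through is exactly what let raw hexadecimal leak into the extracted text in #273.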
Please note that the CI check is expected to fail for test case testGetDataTmIssue336 while PR #343 is pending.

Edit by @k00ni: fixes #273