process hexadecimal strings containing line breaks, but strip line breaks first (fix #273) #344

Connum · 2020-09-25T20:17:32Z

Strings decoded as hexadecimal may contain line break characters within the hexadecimal representation of the raw data. Such strings were previously not decoded, and the hexadecimal representation was returned unprocessed, resulting in what can be seen in #273: hexadecimal strings in the otherwise decoded text content.

This fix takes line breaks in hexadecimal representation into account, stripping those line breaks before decoding.
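
As a rough illustration of the idea (not the code in this PR), stripping line breaks before decoding could look like the sketch below. The function name `decodeHexWithLineBreaks` is hypothetical, and the sketch assumes the whole input is a single `<...>` hexadecimal string, which is simpler than what `Font::decodeHexadecimal()` has to handle.

```php
<?php
// Hypothetical helper for illustration only; this is NOT the pdfparser implementation.
function decodeHexWithLineBreaks(string $hexa): string
{
    // Drop CR and LF so a wrapped hex string becomes one contiguous run of digits.
    $stripped = str_replace(["\r", "\n"], '', $hexa);

    // Assumption: only input of the form <hex digits> is handled here.
    // Anything else is returned unchanged.
    if (!preg_match('/^<([0-9a-f]*)>$/i', $stripped, $matches)) {
        return $hexa;
    }

    $digits = $matches[1];

    // The PDF specification says a missing final digit is assumed to be 0.
    if (strlen($digits) % 2 !== 0) {
        $digits .= '0';
    }

    // Convert pairs of hex digits into raw bytes.
    return hex2bin($digits);
}
```

With this sketch, `decodeHexWithLineBreaks("<0027\n004c>")` yields the four bytes `"\x00\x27\x00\x4c"`, while a string that is not wrapped in `<...>` comes back untouched.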

Please note that the CI check is expected to fail for test case testGetDataTmIssue336 while PR #343 is pending.

Edit by @k00ni: fixes #273

k00ni

It's weird, I can't see the file difference of tests/Integration/FontTest.php (reference, it's saying it's a binary file).

@Connum can you please add a link showing the code you changed? (Or paste it here)

Connum · 2020-09-28T07:04:23Z

That's strange indeed... I had a similar issue in my GUI (SourceTree) recently, I think it might have to do with character encoding... I'll look into it!

Connum · 2020-09-28T07:53:15Z

So this is the code I added at the end of testDecodeHexadecimal():

// hexadecimal string with a line break should not return the input string
// addressing issue #273: https://github.com/smalot/pdfparser/issues/273
$hexa = "<0027004c0056005300520051004c0045004c004f004c005d0044006f006d0052001d000300560048005b00570044001000490048004c00550044000f0003001400170003004700480003004900480059004800550048004c00550052000300470048000300\n15001300150013>";
$this->assertEquals("\x0\x27\x0\x4c\x0\x56\x0\x53\x0\x52\x0\x51\x0\x4c\x0\x45\x0\x4c\x0\x4f\x0\x4c\x0\x5d\x0\x44\x0\x6f\x0\x6d\x0\x52\x0\x1d\x0\x3\x0\x56\x0\x48\x0\x5b\x0\x57\x0\x44\x0\x10\x0\x49\x0\x48\x0\x4c\x0\x55\x0\x44\x0\xf\x0\x3\x0\x14\x0\x17\x0\x3\x0\x47\x0\x48\x0\x3\x0\x49\x0\x48\x0\x59\x0\x48\x0\x55\x0\x48\x0\x4c\x0\x55\x0\x52\x0\x3\x0\x47\x0\x48\x0\x3\x0\x15\x0\x13\x0\x15\x0\x13", Font::decodeHexadecimal($hexa));

Maybe the mass of encoded chars leads git to believe that this is a binary file? It's very strange... I'm looking to find a way to get around this...

Connum · 2020-09-28T08:19:40Z

Very interesting... I tried different things, saving the file with another editor (SublimeText instead of VSCode), and running the git tool dos2unix available under Windows, which gave me:

dos2unix: Binary symbol 0x00 found at line 235
dos2unix: Skipping binary file FontTest.php

Looking at that line in SublimeText, which has a nicer representation for binary symbols, I could see that there's indeed a binary symbol in a test string that shouldn't be there.

So it turns out that it doesn't have anything to do with the code I added, but I have no idea why it didn't cause issues earlier... I removed that character, as it has nothing to do with the XML test (it would actually render the XML, or at least the inline CSS style, invalid!) and must have gotten there accidentally. The test case still runs fine, as expected.
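
For anyone hitting the same "binary file" symptom, a small sketch for locating stray NUL bytes without dos2unix follows. The helper name `findNulByteLines` is made up for this example, and the link to Git's behaviour is an assumption: as far as I know, Git treats a file as binary when it finds a NUL byte near the start of the content.

```php
<?php
// Hypothetical helper for illustration; not part of pdfparser.
// Returns the 1-based numbers of all lines that contain a NUL byte (0x00).
function findNulByteLines(string $contents): array
{
    $hits = [];

    // Split on LF; a trailing CR stays on the line but does not affect the check.
    foreach (explode("\n", $contents) as $index => $line) {
        if (strpos($line, "\0") !== false) {
            $hits[] = $index + 1; // 1-based, matching how dos2unix reports lines
        }
    }

    return $hits;
}
```

It could be run as `findNulByteLines(file_get_contents('tests/Integration/FontTest.php'))`; an empty array means no NUL byte was found.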

process hexadecimal strings containing line breaks, but strip line breaks first (fix smalot#273)

4455b0d

Connum mentioned this pull request Sep 25, 2020

Hex (?) output #121

Open

k00ni requested changes Sep 28, 2020

View reviewed changes

k00ni added the fix label Sep 28, 2020

remove binary symbold from test data string

20c9f92

Connum closed this Sep 28, 2020

Connum deleted the fix-273 branch September 28, 2020 08:31

Connum mentioned this pull request Sep 28, 2020

process hexadecimal strings containing line breaks, but strip line breaks first (fix #273) #346

Merged

k00ni self-assigned this Sep 28, 2020

k00ni added invalid and removed fix labels Sep 28, 2020
