revert 4f4fd10 preserving fix for #260, fixing #319, #322 and #334 #342

Connum · 2020-09-22T21:11:27Z

This reverts the change from 4f4fd10 that caused at least issues #319, #322 and #334 (maybe even more...).
With a tiny change to the original code, it was confirmed in #260 that it still fixes that issue.

I also wrote a test for #319 locally, but wasn't sure whether we could use the sample PDF or not:

/**
 * Test that issue related pdf can now be parsed
 *
 * @see https://github.com/smalot/pdfparser/issues/319
 */
public function testIssue319()
{
    $filename = $this->rootDir.'/samples/bugs/Issue319.pdf';

    $document = $this->fixture->parseFile($filename);

    $this->assertStringContainsString('HS technology s.r.o., IČO: 52081061, Sídlo: Fraňa Mojtu 22, 94901 Nitra,\n
    Slovenská republika\n
    \n
    \n
    99,10', $document->getText());
}

This reverts commit 4f4fd10.

…ot#322 and smalot#334

k00ni

Thank you for your contribution! Your patch seems reasonable.

I also wrote a test for #319 locally, but wasn't sure whether we could use the sample PDF or not:

~~What does that mean? Is it available now?~~
~~I saw that you asked around in the issues you referenced. Can you give me a short summary about what you found out? That would be helpful.~~
~~Please address my in-code comments.~~

EDIT: Solved in next review.

tests/Integration/ParserTest.php

k00ni · 2020-09-23T13:36:50Z

@romanmatyus (#319), @Karmel0x (#322) and @CharmaineLohSiYing (#334): It would be great if you could give us a short feedback if this PR fixes your problems. Thank you in advance!

Connum · 2020-09-23T13:50:49Z

I also wrote a test for #319 locally, but wasn't sure whether we could use the sample PDF or not:
* [X]  What does that mean? Is it available now?

I can't include that test without a sample file to test against. It means I was able to verify that my changes fix the parsing of the file provided in #319, but we can't simply use that file without consent. I can try to reduce the file, but I fear changing and re-saving it could result in the original issue not occuring in the first place...

* [X]  I saw that you asked around in the issues you referenced. Can you give me a short summary about what you found out? That would be helpful.

I basically just confirmed that the issues were caused by the fix in 4f4fd10. In #260, the issue related to that fix, the author of the fix confirmed that with my changes their (not publically available) PDF still worked fine, affirming that we can have all three new issues fixed at the same time without reintroducing the issue raised in #260.

* [X]  Please address my in-code comments.

Done, I will address those in an additional commit.

…test cases

CharmaineLohSiYing · 2020-09-23T14:13:30Z

@romanmatyus (#319), @Karmel0x (#322) and @CharmaineLohSiYing (#334): It would be great if you could give us a short feedback if this PR fixes your problems. Thank you in advance!

Hello! This PR fixes my problems. All the characters in my pdf file are now output properly just like in the demo. Thank you!

Connum · 2020-09-23T14:45:30Z

@k00ni: The sample PDF from #319 is signed, so it cannot be edited directly. Once I print it to a new PDF or import and export it with Illustrator, it does seem to introduce some new issues, but not necessarily the one we need to reproduce...

k00ni

The sample PDF from #319 is signed, so it cannot be edited directly. Once I print it to a new PDF or import and export it with Illustrator, it does seem to introduce some new issues, but not necessarily the one we need to reproduce...

We can't use it, because it contains a logo, but its optional for this PR so we can ignore it.

I saw that you asked around in the issues you referenced. Can you give me a short summary about what you found out? That would be helpful.

Solved 👍

k00ni · 2020-09-25T09:25:57Z

I will keep that open for a ~week to allow others to comment. If there are no objections, I will merge.

MrMadClown · 2020-09-27T13:34:21Z

Hi,
might be related to 4f4fd10.
The text retrieved from This PDF is pretty scrambled in the current version (0.16.2). I reverted to 0.14.0 (pre 4f4fd10) no more scrambling.
I'm still seeing way more spaces in the parsed Text than I can see in the PDF but those might actually be zero-width-spaces not sure.

k00ni · 2020-09-28T06:36:54Z

Can you please check if you have to same behavior with the version in this branch (https://github.com/Connum/pdfparser/tree/fix-260-319-322-334)?

Connum · 2020-09-29T15:42:30Z

I can confirm that the parsing of the PDF provided by @MrMadClown is also fixed with this PR! One example:

Before

Gestaltung und Sat��4GIGNKPFKU�9GUVRJCN�)TC��M�&GUKIn / Norbert Lauterbach

After

Gestaltung und Satz: Regelindis Westphal Grafik-Design / Norbert Lauterbach

All other instances of scrambled text are fixed as well, no more "strange" characters!

k00ni · 2020-09-30T07:40:36Z

Thank you @Connum and the others for your effort.

…ot#322 and smalot#334 (smalot#342) * Revert "Fix \f crush" This reverts commit 4f4fd10. * revert 4f4fd10 preserving fix for smalot#260, fixing smalot#319, smalot#322 and smalot#334 * reduced sample PDF and added more descriptive comment to the two new test cases

Connum added 2 commits September 22, 2020 23:03

Revert "Fix \f crush"

cbf5cfe

This reverts commit 4f4fd10.

revert 4f4fd10 preserving fix for smalot#260, fixing smalot#319, smal…

d8bc1b4

…ot#322 and smalot#334

k00ni added the fix label Sep 23, 2020

k00ni self-assigned this Sep 23, 2020

k00ni linked an issue Sep 23, 2020 that may be closed by this pull request

Wrong characters parsed with getText() #260

Closed

k00ni requested changes Sep 23, 2020

View reviewed changes

k00ni reviewed Sep 23, 2020

View reviewed changes

tests/Integration/ParserTest.php Outdated Show resolved Hide resolved

k00ni reviewed Sep 23, 2020

View reviewed changes

tests/Integration/ParserTest.php Show resolved Hide resolved

k00ni added needs more info needs work labels Sep 23, 2020

This was linked to issues Sep 23, 2020

Fix encoding in demo #319

Closed

Changes in commit 4f4fd10 lead to more slashes getting stripped, than neccessary #322

Closed

Characters are translated correctly in Demo but not in Local #334

Closed

reduced sample PDF and added more descriptive comment to the two new …

9197657

…test cases

Connum requested a review from k00ni September 23, 2020 14:43

k00ni approved these changes Sep 25, 2020

View reviewed changes

k00ni removed needs more info needs work labels Sep 25, 2020

Connum mentioned this pull request Sep 25, 2020

Problem in part of the translation #273

Closed

k00ni merged commit 1862686 into smalot:master Sep 30, 2020

Connum deleted the fix-260-319-322-334 branch October 1, 2020 06:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revert 4f4fd10 preserving fix for #260, fixing #319, #322 and #334 #342

revert 4f4fd10 preserving fix for #260, fixing #319, #322 and #334 #342

Connum commented Sep 22, 2020 •

edited

Loading

k00ni left a comment •

edited

Loading

k00ni commented Sep 23, 2020 •

edited

Loading

Connum commented Sep 23, 2020

CharmaineLohSiYing commented Sep 23, 2020

Connum commented Sep 23, 2020

k00ni left a comment

k00ni commented Sep 25, 2020

MrMadClown commented Sep 27, 2020

k00ni commented Sep 28, 2020

Connum commented Sep 29, 2020

k00ni commented Sep 30, 2020

revert 4f4fd10 preserving fix for #260, fixing #319, #322 and #334 #342

revert 4f4fd10 preserving fix for #260, fixing #319, #322 and #334 #342

Conversation

Connum commented Sep 22, 2020 • edited Loading

k00ni left a comment • edited Loading

Choose a reason for hiding this comment

k00ni commented Sep 23, 2020 • edited Loading

Connum commented Sep 23, 2020

CharmaineLohSiYing commented Sep 23, 2020

Connum commented Sep 23, 2020

k00ni left a comment

Choose a reason for hiding this comment

k00ni commented Sep 25, 2020

MrMadClown commented Sep 27, 2020

k00ni commented Sep 28, 2020

Connum commented Sep 29, 2020

k00ni commented Sep 30, 2020

Connum commented Sep 22, 2020 •

edited

Loading

k00ni left a comment •

edited

Loading

k00ni commented Sep 23, 2020 •

edited

Loading