Does not parse pages other than the first page #331

Closed
Reqrefusion opened this issue Sep 14, 2020 · 7 comments · Fixed by #479

Comments

@Reqrefusion

Only the first page of a 3-page PDF file is converted; the other two pages produce no output at all. The same behavior can be observed on the demo page.
Related PDF file:
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2020/09/20200914-4-4.pdf

@k00ni k00ni added the bug label Sep 15, 2020
@Reqrefusion
Author

This file was created with Acrobat PDFMaker 10.1 for Word, but other files converted to PDF show the same error.

@Connum
Contributor

Connum commented Oct 1, 2020

Interesting... $pdf->getPages() does return all three pages, but ->getText() returns an empty string for the last two of them. Furthermore, the third page is of type Smalot\PdfParser\PDFObject instead of Smalot\PdfParser\Page. That should be handled in Pages.php, inside the if/else construct in getPages(). In my opinion, getPages() should never return anything other than Page objects!

@Connum
Contributor

Connum commented Oct 1, 2020

I have spent several hours now trying to figure out what's wrong here...
I think the least we should do is change the } else { in Pages.php to

} elseif ($kid instanceof Page) {

getPages() should only ever return array elements of the type Page, but evidently, 'Kids' can hold other types too that we wouldn't want to pollute the resulting array with!

Other than that, I'm at a bit of a loss here...
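
To illustrate the suggested elseif, here is a minimal, self-contained sketch. The three classes are hypothetical stand-ins that only mimic the names used in this thread; this is not the actual Pages.php code, and the recursion into nested Pages nodes is an assumption made for the example.

```php
<?php
// Hypothetical stand-ins for illustration only, not the library's classes.
class PDFObject {}
class Page extends PDFObject {}

class Pages extends PDFObject
{
    /** @var PDFObject[] */
    private $kids;

    public function __construct(array $kids)
    {
        $this->kids = $kids;
    }

    /** @return Page[] */
    public function getPages(): array
    {
        $pages = [];
        foreach ($this->kids as $kid) {
            if ($kid instanceof self) {
                // Nested page-tree node: collect its pages recursively.
                $pages = array_merge($pages, $kid->getPages());
            } elseif ($kid instanceof Page) {
                // Only genuine Page objects end up in the result.
                $pages[] = $kid;
            }
            // Any other kid (e.g. a plain PDFObject) is skipped.
        }

        return $pages;
    }
}

// One direct page, one nested page, and one stray PDFObject.
$tree = new Pages([new Page(), new Pages([new Page()]), new PDFObject()]);
$pages = $tree->getPages();
echo count($pages), "\n"; // prints 2
```

As noted in the thread, this only keeps non-Page kids out of the result; it does not make the missing page text appear.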

@Reqrefusion
Author

@Connum Thanks for trying but I didn't see any improvement. Maybe in the future this problem will be resolved somehow. Thank you again.

@Connum
Contributor

Connum commented Oct 3, 2020

Yeah, that wasn't a fix for the issue itself, just something I noticed we'd want to change while investigating it. For the issue itself, maybe I'll find some more time and motivation to investigate it, otherwise hopefully someone else will.

@hpvd

hpvd commented Apr 28, 2021

Ahh, I found the same problem when trying the demo with several documents (#416).
-> Are there any multipage files that work fine?

@hpvd

hpvd commented Apr 29, 2021

Another example file can be found here: #416 (comment)

@k00ni k00ni linked a pull request Nov 12, 2021 that will close this issue
k00ni added a commit that referenced this issue Nov 22, 2021
* Add files via upload

Fixing problem of incomplete analysis of the /Index entry.

* Delete RawDataParser.php

Wrong subdirectory.

* Add files via upload

Fix problem of incomplete analysis of /Index entry.

* Update RawDataParser.php

optical changes

* Update RawDataParser.php

optical changes

* Update RawDataParser.php

optical changes

* Add files via upload

After adding a description to the file, the valid /Index entry now contains two entries (consisting of 2 values: first object number, number of objects):
/Index[2 1 21 2]

* Update RawDataParserTest.php

Adding test for issue 479

* Update RawDataParserTest.php

Forgot a {

* Update RawDataParser.php

Code style update

* Update RawDataParserTest.php

Added more description and more checks.

* Update PageTest.php

Issue #331 is fixed by issue #479: test updated

* Update RawDataParserTest.php

optical fix

* Update PageTest.php

optical changes

* Update RawDataParser.php

change to remove the native_function_invocation message

* Update tests/Integration/PageTest.php

Co-authored-by: Konrad Abicht <hi@inspirito.de>

* Update RawDataParser.php

Added comments...

* Update RawDataParser.php

Changes for CS fixer

* Update PageTest.php

Comment update

* Update tests/Integration/PageTest.php

Co-authored-by: Konrad Abicht <hi@inspirito.de>

Co-authored-by: Konrad Abicht <hi@inspirito.de>
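
The commit messages above describe the /Index entry as a list of pairs (first object number, number of objects), with /Index[2 1 21 2] as the example. The sketch below shows one way to walk every pair rather than stopping early. The function name expandIndex is hypothetical and this is not the RawDataParser implementation; it only illustrates the pair structure the commit refers to.

```php
<?php
/**
 * Expand a flat /Index array into the object numbers it covers.
 * Hypothetical helper for illustration; not the library's code.
 *
 * @param int[] $index flat list of (first object number, count) pairs
 *
 * @return int[] all object numbers covered by the pairs, in order
 */
function expandIndex(array $index): array
{
    if (0 !== count($index) % 2) {
        throw new InvalidArgumentException('/Index must hold pairs of integers');
    }

    $objectNumbers = [];
    // Walk every (first, count) pair in the array.
    for ($i = 0; $i < count($index); $i += 2) {
        $first = $index[$i];
        $count = $index[$i + 1];
        for ($n = 0; $n < $count; ++$n) {
            $objectNumbers[] = $first + $n;
        }
    }

    return $objectNumbers;
}

// The example from the commit message: /Index[2 1 21 2]
echo implode(', ', expandIndex([2, 1, 21, 2])), "\n"; // prints: 2, 21, 22
```

With the example value, the first pair covers object 2 and the second pair covers objects 21 and 22, so a reader that handled only one pair would miss the other objects.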

4 participants