Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a stopAtErrors parameter to getDocument to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) #8240

Merged
merged 2 commits into from
Apr 13, 2017

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Apr 5, 2017

Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.

NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. getOperatorList/getTextContent, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the stopAtErrors parameter can be set at getDocument.

Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue #6342, issue #3795, and bug 1130815 (and probably others too).

Note: Slightly smaller diff with https://github.com/mozilla/pdf.js/pull/8240/files?w=1.


This change is Reviewable

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Apr 6, 2017

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 1

Live output at: http://107.21.233.14:8877/a41796f6412227a/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 6, 2017

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.215.176.217:8877/6874c22d59ca2f9/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 6, 2017

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/6874c22d59ca2f9/output.txt

Total script time: 23.20 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Apr 6, 2017

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/a41796f6412227a/output.txt

Total script time: 29.76 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

… errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)

Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.

NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`.

Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
…egardless of the value of the `stopAtErrors` option

Compared to the parsing of e.g. an entire page, it doesn't really make sense to only be able to render a Type3 glyph partially.
@Snuffleupagus
Copy link
Collaborator Author

Updated to fix a merge conflict in test/pdfs/.gitignore, no actual code changes were made.

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1bf6c42c54f165e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/1bf6c42c54f165e/output.txt

Total script time: 2.80 mins

Published

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Logic around clone of PartialEvaluator looks complicated though. Any future refactoring there will be nice to have.

@yurydelendik
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/327af8cfa87ec48/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 2

Live output at: http://54.215.176.217:8877/472fb6adc43e8c3/output.txt

@yurydelendik yurydelendik merged commit c4c44c1 into mozilla:master Apr 13, 2017
@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/327af8cfa87ec48/output.txt

Total script time: 29.54 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/472fb6adc43e8c3/output.txt

Total script time: 22.78 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the api-stopAtErrors branch April 13, 2017 16:40
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants