-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a stopAtErrors
parameter to getDocument
to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
#8240
Conversation
/botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 1 Live output at: http://107.21.233.14:8877/a41796f6412227a/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 1 Live output at: http://54.215.176.217:8877/6874c22d59ca2f9/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/6874c22d59ca2f9/output.txt Total script time: 23.20 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/a41796f6412227a/output.txt Total script time: 29.76 mins
|
… errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present. Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead. NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully. In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`. Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
…egardless of the value of the `stopAtErrors` option Compared to the parsing of e.g. an entire page, it doesn't really make sense to only be able to render a Type3 glyph partially.
Updated to fix a merge conflict in |
/botio-linux preview |
From: Bot.io (Linux)ReceivedCommand cmd_preview from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.21.233.14:8877/1bf6c42c54f165e/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/1bf6c42c54f165e/output.txt Total script time: 2.80 mins Published |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Logic around clone of PartialEvaluator looks complicated though. Any future refactoring there will be nice to have.
/botio makeref |
From: Bot.io (Linux)ReceivedCommand cmd_makeref from @yurydelendik received. Current queue size: 0 Live output at: http://107.21.233.14:8877/327af8cfa87ec48/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @yurydelendik received. Current queue size: 2 Live output at: http://54.215.176.217:8877/472fb6adc43e8c3/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/327af8cfa87ec48/output.txt Total script time: 29.54 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/472fb6adc43e8c3/output.txt Total script time: 22.78 mins
|
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.
NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g.
getOperatorList
/getTextContent
, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the
stopAtErrors
parameter can be set atgetDocument
.Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue #6342, issue #3795, and bug 1130815 (and probably others too).
Note: Slightly smaller diff with https://github.com/mozilla/pdf.js/pull/8240/files?w=1.
This change is