[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) #8240

Snuffleupagus · 2017-04-05T15:37:40Z

Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon as the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.

NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which improves our handling of corrupt PDF files and allows the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set in the `getDocument` call.
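
A minimal sketch of how an API consumer might opt back into the old behaviour. Only `getDocument` and the `stopAtErrors` parameter come from this patch; the helper name `loadStrict`, the stand-in `fakeGetDocument`, and the file name are hypothetical. The library's `getDocument` is passed in as an argument so that the snippet doesn't assume a particular build or global name.

```javascript
// Hypothetical helper (not part of PDF.js): opens a document with the
// pre-patch behaviour, i.e. parsing stops at the first error.
// `getDocument` is the API entry point, supplied by the caller.
function loadStrict(getDocument, url) {
  return getDocument({
    url: url,
    stopAtErrors: true, // opt out of the new "recover as much as possible" default
  });
}

// Stand-in for the real `getDocument`, used only to make the sketch runnable;
// it simply resolves with the parameters it received.
function fakeGetDocument(params) {
  return Promise.resolve(params);
}

loadStrict(fakeGetDocument, 'corrupt.pdf').then(function (params) {
  console.log(params.stopAtErrors);
});
```

Consumers that omit `stopAtErrors` get the new default, where parsing continues past errors where possible.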

Fixes, inasmuch as it's possible since the PDF files are corrupt, e.g. issue #6342, issue #3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).

Note: Slightly smaller diff with https://github.com/mozilla/pdf.js/pull/8240/files?w=1.

Snuffleupagus · 2017-04-06T08:29:02Z

/botio test

pdfjsbot · 2017-04-06T08:29:03Z

From: Bot.io (Linux)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 1

Live output at: http://107.21.233.14:8877/a41796f6412227a/output.txt

pdfjsbot · 2017-04-06T08:29:03Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.215.176.217:8877/6874c22d59ca2f9/output.txt

pdfjsbot · 2017-04-06T09:00:02Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/6874c22d59ca2f9/output.txt

Total script time: 23.20 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2017-04-06T09:00:52Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/a41796f6412227a/output.txt

Total script time: 29.76 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Snuffleupagus · 2017-04-11T07:00:29Z

Updated to fix a merge conflict in test/pdfs/.gitignore, no actual code changes were made.

Snuffleupagus · 2017-04-11T07:07:16Z

/botio-linux preview

pdfjsbot · 2017-04-11T07:07:17Z

From: Bot.io (Linux)

Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1bf6c42c54f165e/output.txt

pdfjsbot · 2017-04-11T07:10:05Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/1bf6c42c54f165e/output.txt

Total script time: 2.80 mins

Published

yurydelendik

Looks good. The logic around cloning the PartialEvaluator looks complicated, though. Any future refactoring there would be nice to have.

yurydelendik · 2017-04-13T15:58:46Z

/botio makeref

pdfjsbot · 2017-04-13T15:58:46Z

From: Bot.io (Linux)

Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/327af8cfa87ec48/output.txt

pdfjsbot · 2017-04-13T15:58:47Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @yurydelendik received. Current queue size: 2

Live output at: http://54.215.176.217:8877/472fb6adc43e8c3/output.txt

pdfjsbot · 2017-04-13T16:28:19Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/327af8cfa87ec48/output.txt

Total script time: 29.54 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2017-04-13T16:36:48Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/472fb6adc43e8c3/output.txt

Total script time: 22.78 mins

Lint: Passed
Make references: Passed
Check references: Passed

Snuffleupagus added core corrupted-pdf labels Apr 5, 2017

Snuffleupagus requested a review from yurydelendik April 5, 2017 15:37

Snuffleupagus mentioned this pull request Apr 5, 2017

[api-minor] Add an ignoreErrors parameter to PDFPageProxy.{render, getTextContent, getOperatorList} to allow e.g. rendering to continue even if there are errors (issue 6342, issue 3795, bug 1130815) #8176

Closed

Snuffleupagus added 2 commits April 11, 2017 08:59

Always ignore Type3 glyphs if their OperatorLists contain errors, r…

fbe7b2e

…egardless of the value of the `stopAtErrors` option Compared to the parsing of e.g. an entire page, it doesn't really make sense to only be able to render a Type3 glyph partially.
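
The distinction this commit draws can be sketched as follows. This is a simplified illustration, not the code in the patch: `parseOps`, `buildPageOperatorList`, and `buildGlyphOperatorList` are hypothetical names, operators are modelled as plain strings, and the string `'BAD'` stands in for a parse error.

```javascript
// Hypothetical parser: collects operators until it meets a corrupt one.
function parseOps(ops, out) {
  for (const op of ops) {
    if (op === 'BAD') {
      throw new Error('corrupt operator');
    }
    out.push(op);
  }
}

// Page level: keep whatever was parsed before the error,
// unless the caller opted out via stopAtErrors.
function buildPageOperatorList(ops, stopAtErrors) {
  const out = [];
  try {
    parseOps(ops, out);
  } catch (e) {
    if (stopAtErrors) {
      throw e;
    }
  }
  return out;
}

// Type3 glyph level: a partially drawn glyph is not useful, so the glyph
// is dropped entirely on error, whatever stopAtErrors is set to.
function buildGlyphOperatorList(ops) {
  const out = [];
  try {
    parseOps(ops, out);
  } catch (e) {
    return [];
  }
  return out;
}
```

With the input `['a', 'b', 'BAD', 'c']`, the page-level function keeps `['a', 'b']` when `stopAtErrors` is false, while the glyph-level function returns an empty list.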

yurydelendik approved these changes Apr 13, 2017

yurydelendik merged commit c4c44c1 into mozilla:master Apr 13, 2017

Snuffleupagus deleted the api-stopAtErrors branch April 13, 2017 16:40

This was referenced Apr 13, 2017

"This PDF document might not be displayed correctly." - Endless loading. #6342

Closed

Warning: Unhandled rejection: Error: Invalid floating point number: NaN #3795

Closed

Snuffleupagus mentioned this pull request Sep 17, 2017

Allow getOperatorList/getTextContent to skip errors when parsing broken XObjects (issue 8702, issue 8704) #8922

Merged

Snuffleupagus mentioned this pull request Jun 13, 2018

Allow FontFaceObject.getPathGenerator to ignore non-embedded fonts during rendering #9809

Merged

Snuffleupagus mentioned this pull request Oct 31, 2019

Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287) #11296

Merged
