Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make use of textAdvanceScale consistent during combineTextItems. Fix for #7878. #7879

Merged
merged 1 commit into from
Dec 19, 2016

Conversation

rossj
Copy link
Contributor

@rossj rossj commented Dec 6, 2016

Currently, while generating the text content of a page, multiple text draw operations may be combined into a single textContentItem. The running size of this textContentItem is kept as text is added and as the text transformation is changed; however, the summing of this size inconsistently applies the textAdvanceScale factor.

This pull request fixes the discrepancy by only using the textAdvanceScale upon completion / flushing of the textContentItem. The result is that the textContentItem's size is calculated correctly, improving text highlighting in cases like #7878.

More details:

Currently, textAdvanceScale is applied when adding the width / height of actual text content here. The textAdvanceScale is not applied when accounting for size changes due to text advance transformation changes, such as here.

I believe this issue might not appear often as it requires there to be a non-zero difference between the text advance distance and the previously calculated lastAdvanceWidth in order for the non-application of the scaling factor to have an effect.

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Dec 7, 2016

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/77b19eac6be9ae1/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 7, 2016

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/77b19eac6be9ae1/output.txt

Total script time: 2.27 mins

Published

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Dec 7, 2016

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/ed74e45fe736535/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 7, 2016

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/c720ff5dbd84647/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Dec 7, 2016

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/ed74e45fe736535/output.txt

Total script time: 26.45 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/ed74e45fe736535/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Dec 7, 2016

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/c720ff5dbd84647/output.txt

Total script time: 26.76 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/c720ff5dbd84647/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

The tests reveal some unexpected changes. In particular, firefox-issue1936-text-page1 seems bad. Could you check why that is the case?

@rossj
Copy link
Contributor Author

rossj commented Dec 14, 2016

@you are right about the issues with some files -- sorry for not noticing these.

I tracked the issue down to the fact that textMatrix scaling was already being applied to TJ offsets during processing of the showSpacedText op, which I was then re-applying when "flushing" the textContentItem. Removing these first scaling operations fixes the problem files.

screen shot 2016-12-14 at 1 42 43 am

I should mention that I do find the scaling operations that I removed a bit odd, and I don't quite understand mathematically how they correspond with the equations in section 9.4.4 "Text Space Details" of the PDF spec. I believe my new updates are in line with the provided text matrix equations, but I could be missing something.

@timvandermeij
Copy link
Contributor

Looks good. Could you add the file from #7878 as a regression test? Check https://github.com/mozilla/pdf.js/pull/7714/files as an example of how to add a test case. You have to do three things: update .gitignore to ignore the PDF file, add the PDF file itself and add an entry to test_manifest.json like the one over there. This helps us to prevent any regressions in the future.

When that is done, please squash the commits into one commit. See https://github.com/mozilla/pdf.js/wiki/Squashing-Commits for how to do that easily. Thanks!

@rossj
Copy link
Contributor Author

rossj commented Dec 14, 2016

Sure, I'll make those changes. Is it best to name the new test file after the issue or to give it a more descriptive name?

I've also just noticed that evaluator.js isn't properly detecting the font in the #7878 file as monospace, despite the FixedPitch font flag being set and the Font providing a single W width. This results in the default glyph width of 1000 being used instead of the correct width of 600. This doesn't effect highlighting the #7878 PDF very much because each glyph is essentially positioned explicitly, but I can imagine for longer string draws the calculated text/highlight width would be much too large.

Do you suggest I look into the monospace detection as part of this PR or should I make a new issue?

@brendandahl
Copy link
Contributor

Is it best to name the new test file after the issue or to give it a more descriptive name?

Issue number is preferred for referencing it later.

Do you suggest I look into the monospace detection as part of this PR or should I make a new issue?

New PR would be best.

@rossj
Copy link
Contributor Author

rossj commented Dec 16, 2016

Squashed. Please let me know if anything else should be changed. Thanks!

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/b8d955a2ca1c613/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/b8d955a2ca1c613/output.txt

Total script time: 2.20 mins

Published

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/7f8db47a07acdba/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1d42593fde57d1c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/7f8db47a07acdba/output.txt

Total script time: 26.01 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/7f8db47a07acdba/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/1d42593fde57d1c/output.txt

Total script time: 26.28 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/1d42593fde57d1c/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor

/botio makeref

Thank you for the patch!

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1380ed48fb3de45/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/024b0ad867c5812/output.txt

@yurydelendik yurydelendik merged commit 3b3a179 into mozilla:master Dec 19, 2016
@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/1380ed48fb3de45/output.txt

Total script time: 24.98 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/024b0ad867c5812/output.txt

Total script time: 25.49 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
Make use of textAdvanceScale consistent during combineTextItems. Fix for mozilla#7878.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants