Poor performance on NASA Budget #4339

Closed
bthorben opened this issue Feb 26, 2014 · 9 comments

Comments

@bthorben
Contributor

Performance is extremely poor when viewing the NASA 2014 Budget request available at http://www.nasa.gov/pdf/750614main_NASA_FY_2014_Budget_Estimates-508.pdf

@bthorben
Contributor Author

bthorben commented Mar 3, 2014

Update on our analysis: PDF.js seems to process the fonts in the document many times, essentially loading the font again for every page. This suggested a caching issue. As a test, we added a cache after the font translation step, and that made the document 10x faster.

@timvandermeij
Contributor

@bthorben Nice! Could you make a pull request for that if it solves the PDF.js issue?

@Snuffleupagus
Collaborator

Since there has been a lot of focus on reducing memory consumption of PDF.js lately, it would also be interesting to know if, and how, this kind of caching impacts the memory consumption.

@bthorben
Contributor Author

bthorben commented Mar 3, 2014

Our "solution" is really just a quick hack, something we added to test our theories about how PDF.js works. The way we cached is actually quite inefficient, and doing it right would probably improve performance on this document by another factor of 2 to 4. We will spend more time finding an elegant solution.

@bthorben
Contributor Author

bthorben commented Mar 3, 2014

@Snuffleupagus Can you give us some data that shows the problems with memory consumption? Regarding this issue: caching the fonts instead of generating them many times considerably reduces memory consumption when viewing this document.

@Snuffleupagus
Collaborator

> Can you give us some data that shows the problems with memory consumption?

Sorry, I don't think I expressed myself clearly enough!
I just meant that it would be nice, when you submit a PR, to include a comment about the memory consumption before and after the patch. (Nothing complicated, just something like e.g. in #4355.)

@bthorben
Contributor Author

bthorben commented Mar 3, 2014

@Snuffleupagus OK, I see. It would be much nicer if we could have actual benchmarking.

bthorben pushed a commit to bthorben/pdf.js that referenced this issue Mar 5, 2014
While working on issue mozilla#4339 it was confusing that the code to
translate a font is not found in a single place. This commit extracts
the code and puts in a class called FontTranslator
@bthorben
Contributor Author

bthorben commented Mar 5, 2014

We analysed the issue further. We wrote a small tool (available here) to gain insights into the document and its object graph. This is our conclusion on which we will base a solution:

The document makes use of at least one Type 0 font. Type 0 fonts are basically composed of

  1. a CMap
  2. a CIDFont

In this particular case there are many Type0 fonts (shown at [1]) which use the same CIDFont, as shown by this graph (extracted using an uncompressed version of the NASA budget using our tool):

[Figure: explanation graph for issue 4339]

The node on the left (177065 T6) is the font program of the CIDFont; above it you see its FontDescriptor and the CIDFont dictionary. We shortened the graph, but on the right you see three Type 0 fonts that use this font. The nodes 28, 46 and 10 are the CMap dictionaries. Each Type 0 font references an array as its DescendantFonts, and that array has our CIDFont as its sole entry.

This situation shouldn’t be that special (I guess this makes sense for a linearised document), but here it gets interesting: the CMaps are all the same, which means that the Type 0 fonts are all effectively identical. Since PDF.js currently stores the translated Font object at the Type 0 font node (more precisely: its parsed dictionary, compare [2]), a separate translated font is created for each of them. This is what makes the NASA budget so slow in PDF.js.

[1]

### CONTENT OF  18490 ###
18490 0 obj
<<
  /BaseFont /EZAGTP+Arial
  /DescendantFonts 13076 0 R
  /Encoding /Identity-H
  /Subtype /Type0
  /ToUnicode 28 0 R
  /Type /Font
>>
### END CONTENT 18490 ###
### CONTENT OF  18496 ###
18496 0 obj
<<
  /BaseFont /EZAGTP+Arial
  /DescendantFonts 13086 0 R
  /Encoding /Identity-H
  /Subtype /Type0
  /ToUnicode 46 0 R
  /Type /Font
>>
### END CONTENT 18496 ###
### CONTENT OF  18483 ###
18483 0 obj
<<
  /BaseFont /EZAGTP+Arial
  /DescendantFonts 13067 0 R
  /Encoding /Identity-H
  /Subtype /Type0
  /ToUnicode 10 0 R
  /Type /Font
>>
### END CONTENT 18483 ###

[2] this.fontCache.put(fontRef, font); in src/core/evaluator.js
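
To make the effect concrete, here is a minimal sketch (not PDF.js code) of a cache keyed by the Type 0 font reference, in the spirit of [2]. The `translateFont` function, the `type0Fonts` records and the `"sharedCIDFont"` label are hypothetical stand-ins; only the object numbers and dictionary values come from [1].

```javascript
// Hypothetical stand-in for the expensive font translation step.
let translations = 0;
function translateFont(fontDict) {
  translations++;
  return { name: fontDict.baseFont, encoding: fontDict.encoding };
}

// The three Type 0 fonts listed in [1]; all of them share one CIDFont.
const type0Fonts = [
  { ref: "18490R", baseFont: "EZAGTP+Arial", encoding: "Identity-H", cidFont: "sharedCIDFont" },
  { ref: "18496R", baseFont: "EZAGTP+Arial", encoding: "Identity-H", cidFont: "sharedCIDFont" },
  { ref: "18483R", baseFont: "EZAGTP+Arial", encoding: "Identity-H", cidFont: "sharedCIDFont" },
];

// Cache keyed by the Type 0 font reference (assumed to mirror [2]).
const fontCache = new Map();
function loadFont(fontDict) {
  if (!fontCache.has(fontDict.ref)) {
    fontCache.set(fontDict.ref, translateFont(fontDict));
  }
  return fontCache.get(fontDict.ref);
}

type0Fonts.forEach(loadFont);
console.log(translations); // one translation per Type 0 font reference
```

Because every reference is a distinct key, the cache never hits, even though the three fonts are identical in content.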

@bthorben
Contributor Author

bthorben commented Mar 5, 2014

Our solution is relatively simple: We create a cache at the font-descriptor of the CIDFont that is indexed by encoding. This means if the encoding is the same the expensive font translation will be done only once.
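
A minimal sketch of this idea, assuming a plain object stands in for the parsed font descriptor. The names `translatedByEncoding`, `getTranslatedFont` and `translate` are hypothetical and not part of PDF.js; the actual patch may look quite different.

```javascript
// Hypothetical stand-in for the expensive translation step.
let translateCalls = 0;
function translate(descriptor, encoding) {
  translateCalls++;
  return { fontFile: descriptor.fontFile, encoding };
}

// Look up (or create) the translated font on the descriptor itself,
// indexed by encoding, so that every Type 0 font sharing the
// descriptor reuses the same translated font.
function getTranslatedFont(descriptor, encoding) {
  if (!descriptor.translatedByEncoding) {
    descriptor.translatedByEncoding = new Map();
  }
  const cache = descriptor.translatedByEncoding;
  if (!cache.has(encoding)) {
    cache.set(encoding, translate(descriptor, encoding));
  }
  return cache.get(encoding);
}

// "177065" is the font program node from the graph; the rest is illustrative.
const sharedDescriptor = { fontFile: "177065" };
const fontA = getTranslatedFont(sharedDescriptor, "Identity-H");
const fontB = getTranslatedFont(sharedDescriptor, "Identity-H");
const fontC = getTranslatedFont(sharedDescriptor, "Identity-V");
console.log(fontA === fontB, fontA === fontC, translateCalls);
```

Storing the cache on the descriptor rather than in a global table ties its lifetime to the shared object, so no separate invalidation key is needed.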

bthorben pushed a commit to bthorben/pdf.js that referenced this issue Mar 5, 2014
This should fix mozilla#4339. We attached an explanation of the idea at the issue.
chriskr pushed a commit to chriskr/pdf.js that referenced this issue Mar 13, 2014
Different fonts can point to the same font descriptor
(see mozilla#4339 for details). With this
commit such fonts are treated as aliases if they have also the same encoding.
The according info is stored on the font descriptor. This change must also
ensure that aliases use always the same font name because translated fonts
can get cleared depending on the CLEANUP_TIMEOUT setting.
bthorben pushed a commit to bthorben/pdf.js that referenced this issue Mar 14, 2014
While working on issue mozilla#4339 it was confusing that the code to
translate a font is not found in a single place. This commit extracts
the code and puts in a class called FontTranslator
chriskr pushed a commit to chriskr/pdf.js that referenced this issue Apr 8, 2014
…s aliases

Different fonts can point to the same font descriptor
(see mozilla#4339 for details). With this
commit such fonts are treated as aliases if they have also the same encoding
and the same toUnicode map. The according info is stored on the font descriptor.
This change must also ensure that aliases use always the same font name
because translated fonts can get cleared depending on the CLEANUP_TIMEOUT setting.
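
The alias rule described in this commit message could be sketched roughly as follows. This is an illustration only, not the code from the commit: `aliasKey`, `getFontName`, the `aliases` field and the generated `g_font_N` names are all hypothetical, and the toUnicode map is assumed to be available as a plain object.

```javascript
let nextFontId = 0;

// Build a key from the two properties that must match in addition to the
// shared descriptor. Serialising the sorted entries compares maps by content.
function aliasKey(encoding, toUnicode) {
  const entries = Object.entries(toUnicode).sort(([x], [y]) =>
    x < y ? -1 : x > y ? 1 : 0
  );
  return encoding + "|" + JSON.stringify(entries);
}

// Return one stable font name for all fonts that share the descriptor,
// the encoding and the toUnicode content. The name lives on the descriptor.
function getFontName(descriptor, encoding, toUnicode) {
  if (!descriptor.aliases) {
    descriptor.aliases = new Map();
  }
  const key = aliasKey(encoding, toUnicode);
  if (!descriptor.aliases.has(key)) {
    descriptor.aliases.set(key, "g_font_" + nextFontId++);
  }
  return descriptor.aliases.get(key);
}

const descriptor = {};
const name1 = getFontName(descriptor, "Identity-H", { 3: "A", 4: "B" });
const name2 = getFontName(descriptor, "Identity-H", { 4: "B", 3: "A" });
const name3 = getFontName(descriptor, "Identity-H", { 3: "A", 4: "C" });
console.log(name1 === name2, name1 === name3);
```

Keeping only the name on the descriptor means that a translated font can be cleared and later rebuilt under the same name, which is the concern the commit message raises about CLEANUP_TIMEOUT.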