Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New soft mask method causes some text not to be rendered #14982

Closed
soren121 opened this issue Jun 3, 2022 · 18 comments · Fixed by #18035
Closed

New soft mask method causes some text not to be rendered #14982

soren121 opened this issue Jun 3, 2022 · 18 comments · Fixed by #18035
Assignees

Comments

@soren121
Copy link

soren121 commented Jun 3, 2022

Half of the text on this restaurant menu PDF does not render in current versions of PDF.js. I remembered that this PDF used to render correctly in Firefox, so I ran git bisect and identified commit 2d1f9ff as the change which introduced the regression.


Attach (recommended) or Link to PDF file here: Tillamook-Market-Menu-04.04.22.pdf

Configuration:

  • Web browser and its version: Firefox 101.0 arm64
  • Operating system and its version: macOS 12.4
  • PDF.js version: git master (1953967)
  • Is a browser extension: No

Steps to reproduce the problem:

  1. Open the PDF.
  2. Observe that roughly half the text on the menu does not appear.

What is the expected behavior? (add screenshot)
Screen Shot 2022-06-02 at 8 47 03 PM

What went wrong? (add screenshot)
Screen Shot 2022-06-02 at 8 47 12 PM

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): n/a

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jun 3, 2022

Interestingly enough, this seems to be somehow connected to the current canvas size and/or the zoom level. At around 50%, and below, the document actually renders correctly.

@marco-c
Copy link
Contributor

marco-c commented Jun 3, 2022

I filed a bug on Bugzilla so we can track the regression, https://bugzilla.mozilla.org/show_bug.cgi?id=1772498.

@marco-c
Copy link
Contributor

marco-c commented Jun 3, 2022

I can reproduce on my machine too. At 20% zoom some additional text appears, at 10% all text appears.

@marco-c
Copy link
Contributor

marco-c commented Jul 19, 2022

@Snuffleupagus since this is a regression from #14175 and you reviewed that one, you probably have the most context here. Would you mind looking into this?

@VictoriaSarycheva
Copy link

Hello! We have a 2 big customers with the same problem, please, post here any updates (@Snuffleupagus would be so much appreciated)

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Sep 12, 2022

since this is a regression from #14175 and you reviewed that one, you probably have the most context here. Would you mind looking into this?

While I reviewed that one, I don't really have any deeper understanding of the code (since the SMask-handling is often quite complex). To make matters worse the test-cases are usually very big and complicated, as happens to be the case here, which makes debugging challenging.
Hence I'm not entirely sure where to start debugging something like this, and I also don't really have enough spare time for something like this :-(

We have a 2 big customers with the same problem, please, post here any updates

Please keep in mind that this is an open-source project that you're able to use for free. Hence, rather than expecting unpaid contributors to work for free, have you considered letting one (or more) of your developers help out with debugging or even fixing this issue!?

@Snuffleupagus
Copy link
Collaborator

Having managed to create a slightly smaller test-case, see issue14982_reduced.pdf, I now understand more specifically what breaks. Although I've not managed to (immediately) see how to fix this, however the problem is connected to SMasks larger than

const MAX_GROUP_SIZE = 4096;

since disabling the following code "fixes" this, and the duplicate, issue

pdf.js/src/display/canvas.js

Lines 2665 to 2672 in 987062c

if (drawnWidth > MAX_GROUP_SIZE) {
scaleX = drawnWidth / MAX_GROUP_SIZE;
drawnWidth = MAX_GROUP_SIZE;
}
if (drawnHeight > MAX_GROUP_SIZE) {
scaleY = drawnHeight / MAX_GROUP_SIZE;
drawnHeight = MAX_GROUP_SIZE;
}

Looking at the old code, we were previously taking the smask.scaleX/smask.scaleY into account during composeSMask. Unfortunately trying to restore, a variant of, that code doesn't work and most likely other (possibly larger) changes are required.

@tomaszczura
Copy link

Any plans on fixing that?

@wmerobertson
Copy link

Hi @Snuffleupagus - I am working on fixing this issue. I see that disabling the below does "fix" the issue.

pdf.js/src/display/canvas.js

Lines 2665 to 2672 in 987062c

if (drawnWidth > MAX_GROUP_SIZE) {
scaleX = drawnWidth / MAX_GROUP_SIZE;
drawnWidth = MAX_GROUP_SIZE;
}
if (drawnHeight > MAX_GROUP_SIZE) {
scaleY = drawnHeight / MAX_GROUP_SIZE;
drawnHeight = MAX_GROUP_SIZE;
}

Why do we need that code? If I leave it commented out and view other pdfs I have, they still look good. I'm still ramping up on pdf.js so apologies if this is a basic question. Any information you have on this would be great. Thank you!

@calixteman
Copy link
Contributor

If I understand correctly:

So I'd be in favor of removing this limit.
@Snuffleupagus, wdyt ?

@Snuffleupagus
Copy link
Collaborator

The way that I'm reading this is that it wasn't directly linked to WebGL support, but more about reducing overall memory usage (and improving performance) when handling SMasks, given that the limit was added in its own commit: 5262e6f

So I'd be in favor of removing this limit.

Would it still not be a good idea to actually debug and fix this regression first, before we consider outright removing the limit?

What worries me here: Imagine if we just remove the limit, and that leads to issues for users e.g. on lower powered/memory hardware? In that case we might need to re-add the limit, which would cause this bug (and its duplicates) to re-appear.
To me, that situation wouldn't feel great.

@calixteman
Copy link
Contributor

Tbh I don't really know what's the best here.
I agree with you about the memory but having this limit and consequently an upscale could induce some bluriness.
Anyway, I think I found what's wrong.

@calixteman calixteman self-assigned this Mar 11, 2023
@Snuffleupagus
Copy link
Collaborator

One quick data-point regarding the MAX_GROUP_SIZE value:

  • Rendering the PDF document from New soft mask method causes some text not to be rendered #14982 (comment) takes ~10 seconds for me, against the master branch.
  • Doubling the MAX_GROUP_SIZE value, i.e. setting it to 8192 instead, makes rendering take ~14 seconds.
  • Removing the limit altogether makes rendering take ~18 seconds, and causes the browser be sluggish for the duration.

Given that this is on up-to-date and powerful hardware, I can't imagine that e.g. a mobile phone will handle an increased limit very well. Maybe we should consider increasing the limit a little bit by default, but keep the old value for the GECKOVIEW build?

@calixteman
Copy link
Contributor

In term of performance, the main issue is the use getImageData/putImageData.
So if we want to see major improvements we should try to remove them.
For example, in reading this:

function composeSMaskLuminosity(maskData, layerData, transferMap) {
const length = maskData.length;
for (let i = 3; i < length; i += 4) {
const y =
maskData[i - 3] * 77 + // * 0.3 / 255 * 0x10000
maskData[i - 2] * 152 + // * 0.59 ....
maskData[i - 1] * 28; // * 0.11 ....
layerData[i] = transferMap
? (layerData[i] * transferMap[y >> 8]) >> 8
: (layerData[i] * y) >> 16;
}
}

I think we could apply:

  • a matrix filter (a feColorMatrix) on the mask.
  • a transfer filter if any (like we do for the transfer maps stuff)
  • and use aglobalCompositeOperation = "multiply" when drawing an image on the other.

And probably we could do some similar stuff for:

function composeSMaskBackdrop(bytes, r0, g0, b0) {
const length = bytes.length;
for (let i = 3; i < length; i += 4) {
const alpha = bytes[i];
if (alpha === 0) {
bytes[i - 3] = r0;
bytes[i - 2] = g0;
bytes[i - 1] = b0;
} else if (alpha < 255) {
const alpha_ = 255 - alpha;
bytes[i - 3] = (bytes[i - 3] * alpha + r0 * alpha_) >> 8;
bytes[i - 2] = (bytes[i - 2] * alpha + g0 * alpha_) >> 8;
bytes[i - 1] = (bytes[i - 1] * alpha + b0 * alpha_) >> 8;
}
}
}
function composeSMaskAlpha(maskData, layerData, transferMap) {
const length = maskData.length;
const scale = 1 / 255;
for (let i = 3; i < length; i += 4) {
const alpha = transferMap ? transferMap[maskData[i]] : maskData[i];
layerData[i] = (layerData[i] * alpha * scale) | 0;
}
}

I didn't test anything (it's on my todo) but I think it could work.
I'm not super excited by the idea of adding some bluriness but in the mean time I'm concerned by the memory use in general.

I do wonder if we could just draw on the suspendedCanvas what we've to draw (strings, lines, ...) in just setting the correct filters and the composite operation correctly: this way we don't have any memory issues, in term of perf, it should almost nothing... Finally it could just work like for the transfer map stuff and the spider (#16115), when I wrote this patch I was just amazed (and I didn't expect it was so easy) to see that it works just with few lines of code. Few months ago we would have to create a canvas, draw the spider, then apply the transfer on the pixels and finally draw the image back to a suspendedCanvas... I've the impress that this exactly what we're doing right now for the smask stuff.

@wmerobertson
Copy link

Thanks for looking into this @Snuffleupagus and @calixteman.

@Snuffleupagus - you mentioned:

Maybe we should consider increasing the limit a little bit by default

If the document is very tall, it still gets clipped if the document is smaller than MAX_GROUP_SIZE. I'm not sure if increasing the limit a little bit would work.

What would you all think about adding maxGroupSize as an option in PDFViewerOptions? I tried adding this as an option in my fork and passing it through to CanvasGraphics. I set the value to the height of the tall pdf I am trying to render and it seems to work for that use case.

@wmerobertson
Copy link

What would you all think about adding maxGroupSize as an option in PDFViewerOptions? I tried adding this as an option in my fork and passing it through to CanvasGraphics. I set the value to the height of the tall pdf I am trying to render and it seems to work for that use case.

Another (better?) option here may be to add something like useGroupSize as a boolean in PDFViewerOptions and then just skip the block at pdf.js/src/display/canvas.js if true.

@calixteman - it sounds like you have an idea for a more robust approach but I wanted to present this idea as well.

@marco-c
Copy link
Contributor

marco-c commented May 1, 2024

Are we OK increasing or removing MAX_GROUP_SIZE after the performance improvements from #18029?

@calixteman
Copy link
Contributor

I think we must remove this limit:

So I think we must at least try, wait and see if users complain.

@calixteman calixteman self-assigned this May 1, 2024
calixteman added a commit to calixteman/pdf.js that referenced this issue May 1, 2024
…r not

It fixes issues mozilla#14982 and mozilla#14724.
The main problem of upscaling a canvas is that it can induces some pixelation
(see issue mozilla#14724). So this patch is just removing the limit and as a side
effect it fixes issue mozilla#14982.
As far as I can tell, in looking different profiles (especially some memory profile)
in using the Firefox profiler, I don't see any noticeable difference in term of
memory use.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Closed
Development

Successfully merging a pull request may close this issue.

7 participants