Convert `uniXXXX` glyph names to proper ones when building the `charCodeToGlyphId` map for TrueType fonts (bug 1132849, issue 6893, issue 6894) #7069
Awesome work! I can confirm that this fixes both issues I filed. The text is also crisper and closer to how other PDF viewers render the files (this is best seen in the Linux reference images above). The original and the reduced test case from the Bugzilla issue already seem to be fixed for me on current master (at least on Arch Linux when comparing PDF.js with Okular), but it's nevertheless good to include it as a reduced test case. @brendandahl Could you please review this?
For me there are slight differences with … Although there are now a couple of somewhat similar test-cases, which might be unnecessary. The already existing … However, I didn't want to remove any existing tests in this patch.
I've updated the patch slightly, to only attempt this conversion for the …
/botio test
This is looking good. I was curious how other PDF viewers handle this and found this code in PDFium (https://pdfium.googlesource.com/pdfium/+/master/core/src/fxge/freetype/fx_freetype.cpp#56). It seems like a good idea to make a general function for getting the Unicode value from a glyph name. Their function handles names like 'u123456' and a few other cases. Now I wonder if we have any broken PDFs with names like the above.
Sure, that sounds fine! The only one I can remember seeing in practice is … @brendandahl How closely do you want that function to mimic what the FreeType code does?
I've added a … @brendandahl Is this what you had in mind?
/botio-linux preview
From: Bot.io (Linux): Received
Command cmd_preview from @Snuffleupagus received. Current queue size: 0
Live output at: http://107.21.233.14:8877/b0e4ee870623fb9/output.txt
From: Bot.io (Linux): Success
Full output at http://107.21.233.14:8877/b0e4ee870623fb9/output.txt
Total script time: 0.99 mins
Published
  hexStr = name.substr(3);
} else if (nameLen >= 5 && nameLen <= 7) { // 'uXXXX{XX}'
  hexStr = name.substr(1);
}
It's a bit easier to follow if we add an `else` and return -1 early here.
Yes, looks very good. Just the minor nits above.
…odeToGlyphId` map for TrueType fonts (bug 1132849, issue 6893, issue 6894)
This patch adds a `getUnicodeForGlyph` helper function, which is used to recover Unicode values for non-standard glyph names.
Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names, which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes 6893.
Fixes 6894.
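For readers following along, here is a minimal sketch of what a helper like `getUnicodeForGlyph` might look like, pieced together from the fragment quoted in the review and the suggestion to return -1 early. It is not the merged code: the prefix checks and the restriction to uppercase hex digits are assumptions made for illustration.

```javascript
// Hypothetical sketch of a glyph-name-to-Unicode helper; the real
// PDF.js implementation may differ in its details.
function getUnicodeForGlyph(name) {
  const nameLen = name.length;
  let hexStr;

  if (nameLen === 7 && name.startsWith("uni")) {
    // 'uniXXXX': exactly four hex digits after the prefix.
    hexStr = name.substring(3);
  } else if (nameLen >= 5 && nameLen <= 7 && name[0] === "u") {
    // 'uXXXX{XX}': four to six hex digits after the prefix.
    hexStr = name.substring(1);
  } else {
    // Not one of the recognized forms; bail out early.
    return -1;
  }

  // Assumption: only uppercase hex digits count, so ordinary words
  // such as "union" or "uniform" are never mistaken for code points.
  if (!/^[0-9A-F]+$/.test(hexStr)) {
    return -1;
  }
  return parseInt(hexStr, 16);
}
```

With this sketch, `getUnicodeForGlyph("uni0041")` yields 65, which a caller could then map back to the standard glyph name `A`, while a name that matches neither form yields -1 and is left untouched.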
Thanks for the review, I've addressed the comments.
/botio test
From: Bot.io (Windows): Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://107.22.172.223:8877/635be36d4835bb1/output.txt
From: Bot.io (Linux): Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://107.21.233.14:8877/29f3c11931563c4/output.txt
From: Bot.io (Windows): Success
Full output at http://107.22.172.223:8877/635be36d4835bb1/output.txt
Total script time: 20.43 mins
From: Bot.io (Linux): Failed
Full output at http://107.21.233.14:8877/29f3c11931563c4/output.txt
Total script time: 22.14 mins
Image differences available at: http://107.21.233.14:8877/29f3c11931563c4/reftest-analyzer.html#web=eq.log
r+
From: Bot.io (Windows): Received
Command cmd_makeref from @brendandahl received. Current queue size: 0
Live output at: http://107.22.172.223:8877/c58317f2e65a163/output.txt
From: Bot.io (Linux): Received
Command cmd_makeref from @brendandahl received. Current queue size: 0
Live output at: http://107.21.233.14:8877/258818beaeb1a03/output.txt
From: Bot.io (Windows): Success
Full output at http://107.22.172.223:8877/c58317f2e65a163/output.txt
Total script time: 20.21 mins
From: Bot.io (Linux): Failed
Full output at http://107.21.233.14:8877/258818beaeb1a03/output.txt
Total script time: 23.52 mins
/botio-linux makeref
From: Bot.io (Linux): Received
Command cmd_makeref from @brendandahl received. Current queue size: 0
Live output at: http://107.21.233.14:8877/a88dee7cbd7205b/output.txt
From: Bot.io (Linux): Failed
Full output at http://107.21.233.14:8877/a88dee7cbd7205b/output.txt
Total script time: 23.29 mins
/botio-linux makeref
From: Bot.io (Linux): Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://107.21.233.14:8877/56900e861a1f844/output.txt
From: Bot.io (Linux): Success
Full output at http://107.21.233.14:8877/56900e861a1f844/output.txt
Total script time: 21.79 mins
Thank you!
Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names, which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes #6893.
Fixes #6894.
/cc @timvandermeij