Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text shaping #820

Merged
merged 40 commits into from
Aug 2, 2023
Merged

Text shaping #820

merged 40 commits into from
Aug 2, 2023

Conversation

andersonhc
Copy link
Collaborator

@andersonhc andersonhc commented Jun 14, 2023

New feature: use Harfbuzz to perform text shaping allowing fpdf2 to produce files using complex font features like ligatures, kerning, character replacement and positioning.

Fixes #365 #540 #549 #679 #700 #812

Checklist:

  • Initial draft
  • Allow users to choose features (ligatures, kerning...)
  • Allow users to choose text direction and language in addition to harfbuzz best guess logic
  • Fix font subset to correct missing character detection (so missing glyph and fallback font work again)
  • Adapt word stretching and character stretching logic
  • Create unit tests covering all new features
  • Document how to use all the new features
  • This PR is ready to be merged
  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@andersonhc andersonhc requested a review from Lucas-C as a code owner June 14, 2023 03:07
Copy link
Member

@Lucas-C Lucas-C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @andersonhc! 👏

This looks very promising, I'm eager to see the final version of this work to be release 😊

fpdf/fonts.py Outdated Show resolved Hide resolved
fpdf/fonts.py Show resolved Hide resolved
fpdf/fpdf.py Outdated Show resolved Hide resolved
fpdf/fonts.py Outdated Show resolved Hide resolved
fpdf/line_break.py Show resolved Hide resolved
fpdf/output.py Outdated Show resolved Hide resolved
test/text_shaping/test_text_shaping.py Outdated Show resolved Hide resolved
test/text_shaping/test_text_shaping.py Outdated Show resolved Hide resolved
fpdf/fonts.py Outdated Show resolved Hide resolved
fpdf/fonts.py Outdated Show resolved Hide resolved
@Lucas-C
Copy link
Member

Lucas-C commented Jun 14, 2023

In the GitHub Actions pipeline, Pylint raises a few issues:

$ pylint fpdf test tutorial/tuto*.py
************* Module fpdf.graphics_state
fpdf/graphics_state.py:322:4: E0102: method already defined line 318 (function-redefined)
************* Module fpdf.fpdf
fpdf/fpdf.py:89:0: W0611: Unused escape_parens imported from util (unused-import)
************* Module fpdf.fonts
fpdf/fonts.py:15:0: W0702: No exception type(s) specified (bare-except)
fpdf/fonts.py:14:4: E0401: Unable to import 'uharfbuzz' (import-error)
fpdf/fonts.py:74:4: R6301: Method could be a function (no-self-use)
fpdf/fonts.py:182:11: C0121: Comparison 'hb == None' should be 'hb is None' (singleton-comparison)
fpdf/fonts.py:210:11: C0121: Comparison 'hb == None' should be 'hb is None' (singleton-comparison)
fpdf/fonts.py:331:21: C0201: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)
fpdf/fonts.py:337:21: C0201: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)
************* Module test.fonts.test_font_remap
test/fonts/test_font_remap.py:11:17: E1120: No value for argument 'identities' in constructor call (no-value-for-parameter)
test/fonts/test_font_remap.py:28:4: C0206: Consider iterating with .items() (consider-using-dict-items)
test/fonts/test_font_remap.py:29:8: C0206: Consider iterating with .items() (consider-using-dict-items)
************* Module test.text_shaping.test_text_shaping
test/text_shaping/test_text_shaping.py:3:0: W0611: Unused FPDFException imported from fpdf (unused-import)
test/text_shaping/test_text_shaping.py:4:0: W0611: Unused LOREM_IPSUM imported from test.conftest (unused-import)

@Lucas-C
Copy link
Member

Lucas-C commented Jun 19, 2023

I that see you have been working hard on this PR!

The VeraPDF checker complains about this reference PDF file:

- 6.3.6-1 (1): ./test/text_shaping/multi_cell_markdown_with_styling.pdf

This is the rule that applies:
https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules#rule-636-1

For every font embedded in a conforming file and used for rendering, the glyph width information in the font dictionary and in the embedded font program shall be consistent.

Do you see why glyph widths could be inconsistent?

@andersonhc
Copy link
Collaborator Author

I that see you have been working hard on this PR!

The VeraPDF checker complains about this reference PDF file:

- 6.3.6-1 (1): ./test/text_shaping/multi_cell_markdown_with_styling.pdf

This is the rule that applies: https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules#rule-636-1

For every font embedded in a conforming file and used for rendering, the glyph width information in the font dictionary and in the embedded font program shall be consistent.

Do you see why glyph widths could be inconsistent?

I was storing the glyph advance harfbuzz returns (x_advance) as the glyph width, but it's not working because the same glyph was returning different widths, that's why Vera is complaining.

I did some digging and turns out the x_advance can be different because of different factors like kerning.

I'm working on storing the width from fonttools as glyph width and create a new "force positioning" tag to force the text matrix to be manually set if the x_advance is not equal the glyph width, it should be fixed soon.

My next challenge now is working on "right-to-left" texts like Arabic.

@codecov
Copy link

codecov bot commented Jun 21, 2023

Codecov Report

Patch coverage: 91.11% and project coverage change: -0.19% ⚠️

Comparison is base (dface79) 93.39% compared to head (18608fd) 93.21%.

❗ Current head 18608fd differs from pull request most recent head 5bb3c5e. Consider uploading reports for the commit 5bb3c5e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #820      +/-   ##
==========================================
- Coverage   93.39%   93.21%   -0.19%     
==========================================
  Files          27       27              
  Lines        7407     7576     +169     
  Branches     1335     1375      +40     
==========================================
+ Hits         6918     7062     +144     
- Misses        307      326      +19     
- Partials      182      188       +6     
Files Changed Coverage Δ
fpdf/fpdf.py 92.55% <68.18%> (-0.42%) ⬇️
fpdf/graphics_state.py 98.76% <83.33%> (-0.60%) ⬇️
fpdf/fonts.py 92.30% <91.36%> (-3.46%) ⬇️
fpdf/line_break.py 98.07% <94.66%> (-1.11%) ⬇️
fpdf/output.py 97.18% <100.00%> (-0.05%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andersonhc andersonhc changed the title Text shaping draft Text shaping Jun 21, 2023
@andersonhc
Copy link
Collaborator Author

VeraPDF is complaining of the following non-whitelisted errors:

6.2.3.2-1 - All ICCBased colour spaces shall be embedded as ICC profile streams as described in PDF Reference 4.5
6.2.3.3-1 - DeviceRGB may be used only if the file has a PDF/A-1 OutputIntent that uses an RGB colour space
6.2.3.3-2 - DeviceCMYK may be used only if the file has a PDF/A-1 OutputIntent that uses a CMYK colour space
6.2.3.3-3 - If an uncalibrated colour space is used in a file then that file shall contain a PDF/A-1 OutputIntent, as defined in 6.2.2
6.7.3-2 - The value of Title entry from the document information dictionary, if present, and its analogous XMP property dc:title['x-default'] shall be equivalent.
6.7.3-3 - The value of Author entry from the document information dictionary, if present, and its analogous XMP property dc:creator shall be equivalent. dc:creator shall contain exactly one entry
6.7.3-4 - The value of Subject entry from the document information dictionary, if present, and its analogous XMP property dc:description['x-default'] shall be equivalent.
6.7.3-5 - The value of Keywords entry from the document information dictionary, if present, and its analogous XMP property pdf:Keywords shall be equivalent.
6.7.3-6 - The value of Creator entry from the document information dictionary, if present, and its analogous XMP property xmp:CreatorTool shall be equivalent.
6.7.3-7 - The value of Producer entry from the document information dictionary, if present, and its analogous XMP property pdf:Producer shall be equivalent.

@andersonhc
Copy link
Collaborator Author

@Lucas-C
Our last run that didn't get errors in VeraPDF was
https://github.com/PyFPDF/fpdf2/actions/runs/5315507629/jobs/9623956434

It was downloading VeraPDF 1.22.3

This one right now is downloading VeraPDF 1.24.1
On the 1.24 release Vera changed some color space and XMP metadata validations.
https://github.com/veraPDF/veraPDF-library/blob/integration/RELEASENOTES.md

See the errors we are getting on my post above.

Should I whitelist these errors on verapdf-ignore.json or do you want to look at these files first?

fpdf/output.py Outdated Show resolved Hide resolved
@Lucas-C
Copy link
Member

Lucas-C commented Aug 1, 2023

Hi @andersonhc!

Sorry for the delay. I finally took the time to review this PR.

I made a few comments, but I'm mostly curious to know what you think about this PR status:
do you want to work on some final points before merging it?

andersonhc and others added 6 commits August 1, 2023 23:27
Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
Co-authored-by: Lucas Cimon <925560+Lucas-C@users.noreply.github.com>
@andersonhc
Copy link
Collaborator Author

Hi @andersonhc!

Sorry for the delay. I finally took the time to review this PR.

I made a few comments, but I'm mostly curious to know what you think about this PR status: do you want to work on some final points before merging it?

I believe I implemented all I had planned for this PR and I'm happy with the result.

@Lucas-C Lucas-C merged commit b671cb6 into py-pdf:master Aug 2, 2023
@Lucas-C
Copy link
Member

Lucas-C commented Aug 2, 2023

Merged! Thank you for this excellent contribution @andersonhc 👍

@Lucas-C
Copy link
Member

Lucas-C commented Aug 2, 2023

@allcontributors please add @andersonhc for doc, ideas

@allcontributors
Copy link

@Lucas-C

I've put up a pull request to add @andersonhc! 🎉

@gmischler
Copy link
Collaborator

Very cool addition, which will increase fpdf2s potential target audience significantly!
As far as I'm aware, no other Python based PDF generator has this capability.

I'll just update "Text.md", so you'll want to wait for another quick PR before sending the next release

@Lucas-C
Copy link
Member

Lucas-C commented Aug 2, 2023

I'll just update "Text.md", so you'll want to wait for another quick PR before sending the next release

Alright!
Take your time, I plan to make the release on Friday: #856 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hindi text is rendered incorrectly
4 participants