Handle multilingual strings to improve text shaping results (fix #1187) #1193

Merged
4 commits merged on Jun 6, 2024

andersonhc (Collaborator) commented:

As pointed out in #1187, if the text contains several languages, HarfBuzz auto-detects the script and shapes the whole string using the first script it finds.

This change adds the Unicode Scripts table to fpdf2 and, when multiple scripts are found, breaks the input string into fragments that are shaped individually.

Making fragments "script aware" will also be useful for implementing automatic text wrapping in the future.
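The splitting step described above can be illustrated with a short, self-contained sketch. It is not fpdf2's implementation: `SCRIPT_RANGES`, `script_of` and `split_by_script` are hypothetical names, the table is a tiny hand-picked excerpt of Unicode's Scripts.txt, and characters missing from it (spaces, punctuation) simply join the current run, whereas the real Scripts data assigns them to the Common script. Each resulting fragment would then be shaped on its own, so HarfBuzz no longer applies the first script it finds to the whole string.

```python
from functools import lru_cache
from typing import Iterator, Optional, Tuple

# Hypothetical, heavily truncated excerpt of Unicode's Scripts.txt:
# (first code point, last code point, script name). The real data is far
# larger and splits these blocks more finely.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
]


@lru_cache(maxsize=None)
def script_of(char: str) -> Optional[str]:
    """Linear search for the script of `char`; None if it is not listed."""
    cp = ord(char)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return None  # e.g. spaces and punctuation in this toy table


def split_by_script(text: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (fragment, script) runs so each run can be shaped on its own.

    Unlisted characters never start a new run; they stay in the current one
    (or, at the start of the string, join the first scripted run).
    """
    run_start = 0
    run_script: Optional[str] = None
    for i, ch in enumerate(text):
        script = script_of(ch)
        if script is None or script == run_script:
            continue  # neutral character or same script: extend the run
        if run_script is not None:
            # Script changed: emit the finished run and start a new one here.
            yield text[run_start:i], run_script
            run_start = i
        run_script = script
    if text:
        yield text[run_start:], run_script
```

For example, `list(split_by_script("Hello שלום"))` returns `[('Hello ', 'Latin'), ('שלום', 'Hebrew')]`: the space stays with the Latin run and the Hebrew word becomes a separate fragment. The `lru_cache` on the per-character lookup mirrors the caching the review below refers to, but its exact form here is an assumption.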

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

andersonhc requested a review from gmischler as a code owner on June 4, 2024 12:51
andersonhc changed the title from "Handle multilingual string to improve text shaping results (fix #1187)" to "Handle multilingual strings to improve text shaping results (fix #1187)" on Jun 4, 2024
codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 99.46524% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.29%. Comparing base (2b866d8) to head (acb6af1).
Report is 15 commits behind head on master.

The current head acb6af1 differs from the pull request's most recent head 45074fd.

Please upload reports for the commit 45074fd to get more accurate results.

File                      Patch %   Lines
fpdf/unicode_script.py    99.43%    0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1193      +/-   ##
==========================================
+ Coverage   93.25%   93.29%   +0.03%     
==========================================
  Files          30       31       +1     
  Lines        9253     9524     +271     
  Branches     2104     2135      +31     
==========================================
+ Hits         8629     8885     +256     
- Misses        385      393       +8     
- Partials      239      246       +7     

gmischler (Collaborator) left a comment:

👍

Painful having to do a linear search, but I guess it's not worth the effort here to build a range tree (given the caching).
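On the lookup cost: one middle ground between a linear scan and a full range tree is a binary search over the ranges sorted by first code point, using Python's standard `bisect` module. This is an alternative technique shown for illustration, not what the PR implements; the table and `script_of` are hypothetical, and the approach relies on the ranges being non-overlapping, which holds for Scripts.txt because each code point has a single Script value.

```python
import bisect
from typing import Optional

# Hypothetical table excerpt, sorted by first code point and non-overlapping
# (Scripts.txt itself is grouped by script, so it would need sorting first).
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
]
# Precomputed range starts, used as the search key for bisect.
RANGE_STARTS = [first for first, _, _ in SCRIPT_RANGES]


def script_of(cp: int) -> Optional[str]:
    """O(log n) lookup: take the last range starting at or before cp."""
    idx = bisect.bisect_right(RANGE_STARTS, cp) - 1
    if idx >= 0:
        _, last, script = SCRIPT_RANGES[idx]
        if cp <= last:  # cp may fall in a gap after that range ends
            return script
    return None
```

Each lookup drops from O(n) to O(log n); with a per-character cache in front, the practical gain may well be small, which is consistent with the trade-off accepted above.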

Oh, and in case you care about 100% coverage, I think Codecov would like to see a test case that actually exhausts UNICODE_RANGE_TO_SCRIPT.
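Presumably the partial branch reported by Codecov is the path where the lookup loop runs through every range without a match. A pytest-style sketch that reaches it only needs a code point lying past the end of the table. `RANGES` and `script_of` are hypothetical stand-ins for `UNICODE_RANGE_TO_SCRIPT` and its lookup; what fpdf2 actually returns for an unmatched code point is not shown in this PR, so `None` is an assumption.

```python
from typing import Optional

# Hypothetical stand-in for UNICODE_RANGE_TO_SCRIPT: (first, last, script).
RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0590, 0x05FF, "Hebrew"),
]


def script_of(cp: int) -> Optional[str]:
    for first, last, script in RANGES:
        if first <= cp <= last:
            return script
    return None  # reached only when every range has been checked


def test_lookup_exhausts_table():
    # U+10FFFF is the highest code point and lies past the last range in
    # this sketch, so the loop compares every entry before falling through.
    assert script_of(0x10FFFF) is None


def test_lookup_hits_a_range():
    assert script_of(ord("A")) == "Latin"
```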

andersonhc merged commit fbbb3f7 into py-pdf:master on Jun 6, 2024 (11 checks passed)
andersonhc deleted the unicode-script branch on November 11, 2024 02:31
3 participants