added info about an arabic script fix, fixed typo #490
Codecov Report
@@ Coverage Diff @@
## master #490 +/- ##
==========================================
- Coverage 92.11% 91.96% -0.16%
==========================================
Files 23 23
Lines 6684 6555 -129
Branches 1366 1333 -33
==========================================
- Hits 6157 6028 -129
Misses 299 299
Partials 228 228
❤️ Thank you for this PR!
Would you mind please also adding a short mention of this workaround in `CHANGELOG.md`?
Something like this:
## [2.5.7] - not released yet
### Added
- workaround by @semaeostomea to support the arabic script: [link to documentation](https://pyfpdf.github.io/fpdf2/Unicode.html#arabic-script-workaround)
@all-contributors please add @semaeostomea for documentation |
I've put up a pull request to add @semaeostomea! 🎉 |
Welcome @semaeostomea!

The fact that arabic_reshaper can prepare text sufficiently for us to process directly indicates that there are separate Unicode code points for each positional variant. Given that, your solution should really be integrated in fpdf2 directly. For the time being it is of course great to have it documented as a workaround.

Do you happen to know of any other scripts where all positional variants have separate code points? If there are any, a final solution should treat them all this way. I happen to be somewhat familiar with Mongolian, and I haven't found any positional code points for that (or Uighur, Manchu, etc.). In cases like this, we'd have to rely on substitutions offered by the font file. Can you confirm this?

I'm not very familiar with Hangul. Since those characters are technically composites and require special input methods to type, I expected them to be handled like ligatures in Unicode as well. Your remark indicates that I was mistaken about this, and each combination is indeed a separate code point. So much the better!

Sorry about the hindic/indic confusion, I tend to fall for that once in a while... 😉 |
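To illustrate the point about positional variants, here is a minimal sketch that uses only Python's standard `unicodedata` module. It is not code from fpdf2 or arabic_reshaper, and `positional_forms` is a hypothetical helper name. It looks up the Arabic Presentation Forms-B block for a base letter, shows that a Mongolian letter has no such entries, and shows that conjoining Hangul jamo compose into a single precomposed syllable under NFC.

```python
import unicodedata

# Compatibility tags Unicode uses for the Arabic positional variants.
POSITIONAL_TAGS = {"<isolated>", "<final>", "<initial>", "<medial>"}


def positional_forms(base):
    """Return {form name: presentation-form character} for one base letter.

    Hypothetical helper for illustration only.
    """
    forms = {}
    # Arabic Presentation Forms-B occupies U+FE70..U+FEFF.
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single-letter variants decompose as e.g. "<initial> 0628".
        if len(parts) == 2 and parts[0] in POSITIONAL_TAGS:
            if int(parts[1], 16) == ord(base):
                forms[parts[0].strip("<>")] = chr(cp)
    return forms


beh = positional_forms("\u0628")          # ARABIC LETTER BEH
mongolian_a = positional_forms("\u1820")  # MONGOLIAN LETTER A

# Conjoining Hangul jamo compose into one precomposed syllable under NFC.
hangul = unicodedata.normalize("NFC", "\u1112\u1161\u11ab")
```

Here `beh` holds four distinct code points (isolated, final, initial, medial), `mongolian_a` is empty, and `hangul` is a single character. That matches the observation that a reshaping step can work for Arabic, while Mongolian would depend on substitutions provided by the font.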
CHANGELOG.md
Outdated
@@ -394,3 +394,7 @@ prevented strings passed first to the text-rendering methods to be displayed.
* turned `accept_page_break` into a property
* unit tests now use the standard `unittest` lib
* massive code cleanup using `flake8`

## [2.5.7] - not released yet
Thanks for the addition.
This should be at the top of the `CHANGELOG.md` file though :)
You may have to rebase your branch because I just merged PR #492 that added an entry to this file too
This GitHub guide may help you to rebase the `doc-arabic-fix` branch of your fork repo:
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
I just changed "Arabic script workaround" to "Right to Left & Arabic script workaround" because it generally also fixes the out-of-order problem of RTL scripts with bidi. (I forgot to add a proper commit message because I was on auto-mode and just quickly pushed like I do with my private repo, it's the "u" commit) |
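As a small aside on why right-to-left text ends up out of order without a bidi step: each character carries a bidirectional class, and Arabic and Hebrew letters are strongly right-to-left. The sketch below only inspects those classes with the standard `unicodedata` module; it does not do any reordering itself, and `bidi_classes` and `contains_rtl` are hypothetical helper names, not part of fpdf2.

```python
import unicodedata

# Strong right-to-left classes: "R" (e.g. Hebrew) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def contains_rtl(text):
    """True if text contains at least one strongly right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


mixed = bidi_classes("a\u0628\u05d0")  # Latin a, Arabic beh, Hebrew alef
```

A check like `contains_rtl` could be used to decide whether a string needs the reshaping and reordering workaround at all before it is passed to the text-rendering methods.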
This PR looks great to me! @gmischler: would you add anything or are you OK to merge it? |
As far as I understand, Mongolian (et al.) Unicode is complete for the isolated forms.
Shouldn't some type of Unicode normalization fix those diacritics? Come to think of it, I'll have to test later if this would actually solve #459. |
No obstacles that I can see. |
The Mongolian Unicode is complete, but there are some problems with it. I don't understand the underlying mechanism enough, I just know that it can result in the wrong form being displayed sometimes.
This can fix some diacritics but unfortunately not all. I can imagine it would fix Czech diacritics because those exist as a separate code point when combined with the letter (if that's how it works). That's not the case for some diacritics and tone marks in Diné bizaad because they're not used in any other language combined like that.
|
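The normalization point can be checked with a short sketch using only the standard `unicodedata` module. The two sample strings are chosen here for illustration: a Czech "č" typed as a base letter plus combining caron, and a nasal high-tone "ą́" (a with ogonek and acute) of the kind used in Diné bizaad.

```python
import unicodedata

# Czech: "c" + COMBINING CARON has a precomposed form, U+010D.
czech = unicodedata.normalize("NFC", "c\u030c")

# "a" + COMBINING OGONEK + COMBINING ACUTE ACCENT: only "a with ogonek"
# (U+0105) is precomposed, so the acute stays a separate combining mark.
navajo = unicodedata.normalize("NFC", "a\u0328\u0301")

czech_len = len(czech)    # number of code points after NFC
navajo_len = len(navajo)
```

After NFC the Czech letter is one code point, while the second sample still contains a combining mark, so a renderer without mark positioning would still misplace it.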
related to #487
As requested I added the info and code snippet for the arabic script fix to `docs/Text.md` and `docs/Unicode.md`.
I also noticed that Hangul was mentioned as a script that's not supported, which isn't the case as you can see here:
which is why I removed it. I added Kannada and Tamil as additional examples instead, because they were mentioned in #474 and #365.
I also changed "Hindic" to "Indic" (note: Hindi is one of many Indian languages and uses Devanagari; "Hindic scripts" do not exist, so I assume this was a typo).
CHANGELOG.md
^ This is my first pull request, I'm not sure if something like this warrants a changelog entry
By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.