Fix non LGC glyphs in avatars and txt file previews #25529
/backport to stable21 |
/backport to stable20 |
/backport to stable19 |
🤖 beep boop beep 🤖 Here are the logs for the failed build: Status of 2026: failure (nodb) |
CI is not happy. |
Generated avatars as well as text file previews are rendered using the "core/fonts/NotoSans-Regular.ttf" font. The file was the standard hinted "NotoSans-Regular.ttf" file from https://www.google.com/get/noto/. However that file does not cover some non LGC (Latin, Greek, Cyrillic) scripts, like Arabic, Devanagari or Hebrew, to name a few. Markdown file previews also use "core/fonts/NotoSans-Bold.ttf", which is in the same situation as the regular one. Due to limitations in the TTF format it is not possible to provide a single file for each style that includes all Noto fonts. However, it is possible to add more scripts to the standard "NotoSans-Regular.ttf" and "NotoSans-Bold.ttf" files (although no CJK (Chinese, Japanese, Korean) glyph can be included due to the aforementioned limitations). This commit replaces the standard files with an extended version created using the Noto Tools. The build script (as well as a patch for the Noto Tools) is also included for reference and to be able to update the font files in the future if needed. Due to the additional scripts added the font files are now much larger, although this does not seem to increase the time spent rendering LGC scripts. Note that the file for the bold style still contains less scripts than the regular one, as not all scripts supported by Noto have a bold weight. Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
5c652ff to 1713d28
Rebased on master, guest avatar test file regenerated with the updated font, and psalm baseline updated based on the ignored error for BackgroundCleanupJob that the command is based on. |
The command is meant to be used when the fonts used to render texts ("core/fonts/NotoSans-Regular.ttf" and "core/fonts/NotoSans-Bold.ttf") are changed (for example, to add support for other scripts). The avatar and text file previews will be removed, so they will be generated again with the updated font when needed. Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
1713d28 to 9f96a47
Somebody forgot to execute |
sure...
🐘
If you had to rebuild the font with a patched version of Noto Tools, does it mean we are the first ever to meet this kind of issue? For CJK, as far as I know, at least on desktop, there is built-in "font substitution" logic: if a glyph is missing in one font it is picked from another. This makes it possible to mix different fonts. (I remember that Linux desktops had a bug where font substitution sometimes mixed Japanese with Chinese glyphs, which rendered them in a different font style and made the text ugly; I haven't seen this issue since.) |
The backport to stable19 failed. Please do this backport manually. |
(Sorry for the wall of text, but hey, it has an image too!)
Generated avatars as well as text file previews are rendered using the "core/fonts/NotoSans-Regular.ttf" font. The file was the standard hinted "NotoSans-Regular.ttf" file from https://www.google.com/get/noto/. However, that file does not cover some non LGC (Latin, Greek, Cyrillic) scripts, like Arabic, Devanagari or Hebrew, to name a few.
Markdown file previews also use "core/fonts/NotoSans-Bold.ttf", which is in the same situation as the regular one.
Due to limitations in the TTF format it is not possible to provide a single file for each style that includes all Noto fonts. However, it is possible to add more scripts to the standard "NotoSans-Regular.ttf" and "NotoSans-Bold.ttf" files (although no CJK (Chinese, Japanese, Korean) glyph can be included due to the aforementioned limitations).
This pull request replaces the standard files with an extended version created using the Noto Tools. The build script (as well as a patch for the Noto Tools) is also included for reference and to be able to update the font files in the future if needed.
Besides that, an OCC command was added to easily remove generated avatars and text file previews. This should ease updating the rendered texts in those instances that used scripts not previously supported by the fonts.
Due to the additional scripts added the font files are now much larger, although this does not seem to increase the time spent rendering LGC scripts. Due to event handling by the CardDAV converter the command to reset the avatars also causes the avatars to be generated again, and in my tests running it with 1000 users did not increase the time to finish the command. Of course there are a lot of other operations executed in that case, it is not an isolated performance test of the rendering time, but at least it shows that any extra time needed to render the texts with a larger font is not noticeable in comparison with the rest of operations.
Note that the file for the bold style still contains fewer scripts than the regular one, as not all scripts supported by Noto have a bold weight.
Right-to-left languages
libgd, the library used to render texts to images, added support for complex text layouts in 2.3.0. However, it is not enough to use libgd 2.3.0; it must be compiled with libraqm support (as well as FreeType support). If libraqm is not available then right-to-left text will be rendered left-to-right, which is obviously wrong (especially because it seems that, for example, in the Arabic script the glyph of a character changes depending on the following character, see screenshot above).
See below for an example using Docker to rebuild libgd with libraqm support.
However, even if right-to-left text is properly rendered with libraqm, please note that the text will still be left aligned in the previews. It would be necessary to check if the characters of a paragraph are mostly left-to-right or right-to-left (it seems that PHP does not provide a convenient way to do that :-( ) and then align the paragraph to one side or the other. That would be a simplification, a proper implementation would be quite a bit more involved... but let's not forget that we are talking just about previews here ;-)
In any case, right-to-left text is not aligned to the right in the text editor from Nextcloud, so for the time being right-to-left text is also aligned to the left in previews to make it look similar (yes, it is a lame excuse, you do not need to point it out :-P ).
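The "mostly left-to-right or mostly right-to-left" check described above could look roughly like the following. This is only a minimal sketch in Python (not PHP, and not part of this pull request) to illustrate the idea; the function name `dominant_direction` is hypothetical, and it simply counts characters with a strong direction instead of implementing the full Unicode bidirectional algorithm.

```python
import unicodedata

# Bidirectional classes from the Unicode Character Database.
RTL_CLASSES = {"R", "AL"}  # right-to-left letters (e.g. Hebrew) and Arabic letters
LTR_CLASSES = {"L"}        # left-to-right letters


def dominant_direction(paragraph: str) -> str:
    """Return "rtl" or "ltr" depending on which strong characters prevail."""
    rtl = 0
    ltr = 0
    for char in paragraph:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in RTL_CLASSES:
            rtl += 1
        elif bidi_class in LTR_CLASSES:
            ltr += 1
    # Ties, and paragraphs without any strong character (digits, punctuation),
    # fall back to left-to-right, which matches the current preview behaviour.
    return "rtl" if rtl > ltr else "ltr"
```

A preview renderer could call this once per paragraph and pick the alignment from the result. Digits, spaces and punctuation are ignored on purpose, as they have no strong direction of their own.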
Chinese, Japanese and Korean scripts
#4198 is NOT fixed by this pull request, as CJK glyphs are not properly rendered yet. As mentioned above this is due to technical limitations of the TrueType format. Specifically, the problem is that TrueType files cannot contain more than 65535 glyphs, and the font for Chinese, Japanese and Korean scripts alone already includes 65535 glyphs.
The only possible way (as far as I know) to support CJK scripts along with other scripts (beyond basic LGC scripts, as they are included in the CJK font) would be to ship both the non-CJK and the CJK fonts and then use one or the other to render each paragraph depending on the characters in it.
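As an illustration of the per-paragraph font switch described above, here is a minimal sketch in Python. It is not part of this pull request: the CJK font path and the function names are hypothetical, and the list of Unicode ranges is deliberately incomplete.

```python
# Path of the extended non-CJK font replaced by this pull request.
NON_CJK_FONT = "core/fonts/NotoSans-Regular.ttf"
# Hypothetical path: no CJK font is shipped by this pull request.
CJK_FONT = "core/fonts/NotoSansCJK-Regular.otf"

# Some of the main CJK blocks (start, end); deliberately incomplete.
CJK_RANGES = (
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def is_cjk(char: str) -> bool:
    """Tell whether a single character falls in one of the listed CJK blocks."""
    code_point = ord(char)
    return any(start <= code_point <= end for start, end in CJK_RANGES)


def font_for_paragraph(paragraph: str) -> str:
    """Pick the CJK font if the paragraph contains any CJK character."""
    if any(is_cjk(char) for char in paragraph):
        return CJK_FONT
    return NON_CJK_FONT
```

Choosing the CJK font as soon as one CJK character appears keeps Latin text mixed with CJK readable, since basic LGC glyphs are included in the CJK font. A paragraph mixing CJK with, say, Arabic would still lose the Arabic glyphs, which is the inherent limit of switching fonts per paragraph.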
Of course the fun would not stop there ;-) Turns out that the same character may be rendered using a different glyph depending on the language of the text (although it seems that it happens too in other scripts, like Cyrillic when written in Bulgarian or in Russian). The Noto fonts support that, but it must be supported too by the rendering engine and, although I have not checked, I do not think that it is supported by libgd (I guess that the functions to render text would need a parameter to specify the locale).
For that reason Noto provides the CJK fonts in four different variants; all the files include the same glyphs, but each one uses one language variant by default if the "locl" feature is not specified. However, that would still require shipping the four different CJK font files (which are ~16 MiB each :-O ) and also switching between them when rendering depending on the locale.
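Switching between the four variants depending on the locale could be as simple as a lookup table. The following Python sketch is only an illustration: the file names, the locale codes and the fallback variant are assumptions made for the example, not something defined by this pull request.

```python
# Assumed file names for the four language variants; adjust to the real files.
CJK_VARIANTS = {
    "ja": "NotoSansCJKjp-Regular.otf",     # Japanese
    "ko": "NotoSansCJKkr-Regular.otf",     # Korean
    "zh_CN": "NotoSansCJKsc-Regular.otf",  # Simplified Chinese
    "zh_TW": "NotoSansCJKtc-Regular.otf",  # Traditional Chinese
}

# Arbitrary fallback for locales that do not map to a variant (assumption).
DEFAULT_VARIANT = "NotoSansCJKsc-Regular.otf"


def cjk_font_for_locale(locale: str) -> str:
    """Return the CJK font file whose default glyph variants match the locale."""
    normalized = locale.replace("-", "_")
    # Try the full locale first ("zh_TW"), then only the language part ("ja").
    if normalized in CJK_VARIANTS:
        return CJK_VARIANTS[normalized]
    language = normalized.split("_")[0]
    return CJK_VARIANTS.get(language, DEFAULT_VARIANT)
```

This only selects which file to load; it does not help with a single text mixing several CJK languages, which would need per-run language information and support for it in the rendering engine.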
And all that is even ignoring that those scripts (and others, like Mongolian) can be written vertically instead of horizontally 🤷
So... if someone wants to implement all that please be my guest. It is far above my limited knowledge in this field :-)
Example of how to rebuild libgd with libraqm support to test this pull request: