Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non LGC glyphs in avatars and txt file previews #25529

Merged

Conversation

danxuliu
Copy link
Member

@danxuliu danxuliu commented Feb 8, 2021

Previews

(Sorry for the wall of text, but hey, it has an image too!)

Generated avatars as well as text file previews are rendered using the core/fonts/NotoSans-Regular.ttf font. The file was the standard hinted NotoSans-Regular.ttf file from https://www.google.com/get/noto/. However that file does not cover some non LGC (Latin, Greek, Cyrillic) scripts, like Arabic, Devanagari or Hebrew, to name a few.

Markdown file previews also use core/fonts/NotoSans-Bold.ttf, which is in the same situation as the regular one.

Due to limitations in the TTF format it is not possible to provide a single file for each style that includes all Noto fonts. However, it is possible to add more scripts to the standard NotoSans-Regular.ttf and NotoSans-Bold.ttf files (although no CJK (Chinese, Japanese, Korean) glyph can be included due to the aforementioned limitations).

This pull request replaces the standard files with an extended version created using the Noto Tools. The build script (as well as a patch for the Noto Tools) is also included for reference and to be able to update the font files in the future if needed.

Besides that an OCC command was added to easily remove generated avatars and text files. This should ease updating the rendered texts in those instances that used scripts not supported before in the fonts.

Due to the additional scripts added the font files are now much larger, although this does not seem to increase the time spent rendering LGC scripts. Due to event handling by the CardDAV converter the command to reset the avatars also causes the avatars to be generated again, and in my tests running it with 1000 users did not increase the time to finish the command. Of course there are a lot of other operations executed in that case, it is not an isolated performance test of the rendering time, but at least it shows that any extra time needed to render the texts with a larger font is not noticeable in comparison with the rest of operations.

Note that the file for the bold style still contains less scripts than the regular one, as not all scripts supported by Noto have a bold weight.

Right-to-left languages

libgd, the library used to render texts to images, added support for complex text layouts in 2.3.0. However, it is not enough to use libgd 2.3.0; it must be compiled with libraqm support (as well as FreeType support). If libraqm is not available then right-to-left text will be rendered left-to-right, which is obviously wrong (specially because it seems that for example in the Arabic script the glyph of a character changes depending on the following character, see screenshot above).

See below for an example using Docker to rebuild libgd with libraqm support.

However, even if right-to-left text is properly rendered with libraqm please note that the text will still be left aligned in the previews. It would be necessary to check if the characters of a paragraph are mostly left-to-right or right-to-left (it seems that PHP does not provide a convenient way to do that :-( ) and then align the paragraph to one side or the other. That would be a simplification, a proper implementation would be quite more involved... but let's not forget that we are taking just about previews here ;-)

In any case, right-to-left text is not aligned to the right in the text editor from Nextcloud, so for the time being right-to-left text is also aligned to the left in previews to make it look similar (yes, it is a lame excuse, you do not need to point it out :-P ).

Chinese, Japanese and Korean scripts

#4198 is NOT fixed by this pull request, as CJK glyphs are not properly rendered yet. As mentioned above this is due to technical limitations of the TrueType format. Specifically, the problem is that TrueType files can not contain more than 65535 glyphs, and the font for Chinese, Japanese and Korean scripts alone already includes 65535 glyphs.

The only possible way (as far as I know) to support CJK scripts along with other scripts (beyond basic LGC scripts, as they are included in the CJK font) would be to ship both the non-CJK and the CJK fonts and then use one or the other to render each paragraph depending on the characters in it.

Of course the fun would not stop there ;-) Turns out that the same character may be rendered using a different glyph depending on the language of the text (although it seems that it happens too in other scripts, like Cyrillic when written in Bulgarian or in Russian). The Noto fonts support that, but it must be supported too by the rendering engine and, although I have not checked, I do not think that it is supported by libgd (I guess that the functions to render text would need a parameter to specify the locale).

For that reason Noto provides the CJK fonts in four different variants; all the files include the same glyphs, but each one uses one language variant by default if the "locl" property is not specified. However, that would still require to ship the four different CJK font files (which are ~16 MiB each :-O ) and also switch between them when rendering depending on the locale.

And all that is even ignoring that those scripts (and others, like Mongolian) can be written vertically instead of horizontally 🤷

So... if someone wants to implement all that please be my guest. It is far above my limited knowledge in this field :-)

Example of how to rebuild libgd with libraqm support to test this pull request:

docker run --rm --interactive --tty --volume /home/dani/Nextcloud/nextcloud-server/:/var/www/html --volume /var/www/html/data --volume /var/www/html/config ubuntu:20.10 bash

# Enable deb-src repositories
sed -i 's/^# deb-src/deb-src/' /etc/apt/sources.list
apt-get update

# Prepare build system for packages
mkdir build
cd build
apt-get install -y build-essential fakeroot dpkg-dev

# Get sources of libgd3 and install build dependencies
apt-get source libgd3
apt-get build-dep -y libgd3
# Manual installation, it is not a dependency, but it is automatically used when
# building the package if found.
apt-get install -y libraqm-dev

# Rebuild the package (which now will use libraqm) and install it
cd libgd2-2.3.0
dpkg-buildpackage -rfakeroot -b
dpkg -i ../libgd3_2.3.0-2_amd64.deb

# Install standard PHP packages and Apache for Nextcloud
apt-get install -y php-intl php-cli php-curl php-pgsql php-zip php-xml php-mysql php-mbstring php-xdebug php-imagick php-apcu php-gd php-sqlite3 libapache2-mod-php

service apache2 restart

# Install Nextcloud
chown -R www-data /var/www/html/config /var/www/html/data
su --shell /bin/bash --login www-data
cd /var/www/html
php occ maintenance:install --admin-pass=admin
php occ config:system:set trusted_domains 0 --value={IP-OF-THE-CONTAINER}

@danxuliu
Copy link
Member Author

danxuliu commented Feb 8, 2021

/backport to stable21

@danxuliu
Copy link
Member Author

danxuliu commented Feb 8, 2021

/backport to stable20

@danxuliu
Copy link
Member Author

danxuliu commented Feb 8, 2021

/backport to stable19

@faily-bot
Copy link

faily-bot bot commented Feb 8, 2021

🤖 beep boop beep 🤖

Here are the logs for the failed build:

Status of 2026: failure

nodb

Show full log
There were 19 warnings:

1) Test\AppFramework\Controller\AuthPublicShareControllerTest::testAuthenticateAuthenticated
Trying to configure method "isAuthenticated" which cannot be configured because it does not exist, has not been specified, is final, or is static

2) Test\Support\Subscription\RegistryTest::testDelegateIsHardUserLimitReachedWithoutSupportAppAndUserCount with data set #0 (35, 15, 2, false)
Cannot stub or mock class or interface "Test\Support\Subscription\UserInterface" which does not exist

3) Test\Support\Subscription\RegistryTest::testDelegateIsHardUserLimitReachedWithoutSupportAppAndUserCount with data set #1 (35, 45, 15, false)
Cannot stub or mock class or interface "Test\Support\Subscription\UserInterface" which does not exist

4) Test\Support\Subscription\RegistryTest::testDelegateIsHardUserLimitReachedWithoutSupportAppAndUserCount with data set #2 (35, 45, 5, true)
Cannot stub or mock class or interface "Test\Support\Subscription\UserInterface" which does not exist

5) Test\Support\Subscription\RegistryTest::testDelegateIsHardUserLimitReachedWithoutSupportAppAndUserCount with data set #3 (35, 45, 55, false)
Cannot stub or mock class or interface "Test\Support\Subscription\UserInterface" which does not exist

6) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testAccept
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

7) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testAcceptSequence
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

8) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testAcceptRecurrenceId
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

9) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testAcceptTokenNotFound
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

10) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testAcceptExpiredToken
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

11) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testDecline
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

12) OCA\DAV\Tests\Unit\DAV\Controller\InvitationResponseControllerTest::testProcessMoreOptionsResult
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

13) OCA\DAV\Tests\unit\CalDAV\CalendarTest::testConfidentialClassification with data set #0 (3, false)
No method rule is set

14) OCA\DAV\Tests\unit\DAV\Migration\RefreshWebcalJobRegistrarTest::testRun
Trying to configure method "fetch" which cannot be configured because it does not exist, has not been specified, is final, or is static

15) Warning
The data provider specified for ExpirationTest::testGetMaxAgeAsTimestamp is invalid.
PHPUnit\Util\Exception: Method timestampTestData does not exist

16) OCA\TwoFactorBackupCodes\Tests\Unit\Listener\ActivityPublisherTest::testHandleCodesGeneratedEvent
Method publish may not return value of type Mock_IEvent_551381b4, its return declaration is ": void"

17) OCA\UpdateNotification\Tests\Notification\BackgroundJobTest::testCreateNotifications with data set #1 ('app2', '1.0.1', '1.0.0', '1.0.0', true, array('user1'), array(array('user1')))
Method notify may not return value of type Mock_INotification_6e0f29fe, its return declaration is ": void"

18) OCA\UpdateNotification\Tests\Notification\BackgroundJobTest::testCreateNotifications with data set #2 ('app3', '1.0.1', false, false, true, array('user2', 'user3'), array(array('user2'), array('user3')))
Method notify may not return value of type Mock_INotification_6e0f29fe, its return declaration is ": void"

19) OCA\user_ldap\tests\Jobs\UpdateGroupsTest::testHandleKnownGroups
Trying to configure method "fetchAll" which cannot be configured because it does not exist, has not been specified, is final, or is static

--

There was 1 failure:

1) Test\Avatar\GuestAvatarTest::testGet
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-Binary String: 0x89504e470d0a1a0a0000000d4948445200000020000000200802000000fc18eda3000000097048597300000ec400000ec401952b0e1b000000a649444154488963f49cb58881968089a6a68f5a306ac1a8050c0c0c0c0c2cc428b25592af74b643139c77f2dc9a4b57a963010474ef3ff2f4e32738f7f597afc4e822c182fbefde3f78f78178f5103038e200029819991891b8ffa96ec1e4206f646eeaaa0d4f3f7ea6a6059d7b0f3f418ae4979fa91dc90f3f7c188c913c985291a290201b33339cfbfacbd7f7df7f50d38252471b642e914505e368ab62d482510b86810500c10a35d97433bcc60000000049454e44ae426082
+Binary String: 0x89504e470d0a1a0a0000000d4948445200000020000000200802000000fc18eda3000000097048597300000ec400000ec401952b0e1b000000bf49444154488963f49cb58881968089a6a68f5a306ac1d0b0808518455cacac62bcdc6882efbe7efff4f327752c309695aa74b643139c77f2dc9a4b57a9630104b4ec3ef8e4c34738f7ddf7efc4e822c182679f3e3d42b2804830382219025899985998100efaf3ef1f952d9818e885cc4d5db5e1e9c7cfd4b460fad1532f3e7f8173df7cf9468c2e122cb8fce2e583771f88570f0123a4a8800013196959017e38f7fedbf74f3e7ea2a60549e646c85c228b0ac6d156c5a805a3160c030b00f4283ddebeb8599a0000000049454e44ae426082

/drone/src/tests/lib/Avatar/GuestAvatarTest.php:70

@rullzer
Copy link
Member

rullzer commented Feb 12, 2021

CI is not happy.
I'd also say lets not backport this. This has been this way since ages. I'd rather not risk breaking things on stable branches.

Generated avatars as well as text file previews are rendered using the
"core/fonts/NotoSans-Regular.ttf" font. The file was the standard hinted
"NotoSans-Regular.ttf" file from https://www.google.com/get/noto/.
However that file does not cover some non LGC (Latin, Greek, Cyrillic)
scripts, like Arabic, Devanagari or Hebrew, to name a few.

Markdown file previews also use "core/fonts/NotoSans-Bold.ttf", which is
in the same situation as the regular one.

Due to limitations in the TTF format it is not possible to provide a
single file for each style that includes all Noto fonts. However, it is
possible to add more scripts to the standard "NotoSans-Regular.ttf" and
"NotoSans-Bold.ttf" files (although no CJK (Chinese, Japanese, Korean)
glyph can be included due to the aforementioned limitations).

This commit replaces the standard files with an extended version created
using the Noto Tools. The build script (as well as a patch for the Noto
Tools) is also included for reference and to be able to update the font
files in the future if needed.

Due to the additional scripts added the font files are now much larger,
although this does not seem to increase the time spent rendering LGC
scripts.

Note that the file for the bold style still contains less scripts than
the regular one, as not all scripts supported by Noto have a bold
weight.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
@danxuliu danxuliu force-pushed the fix-non-lgc-glyphs-in-avatars-and-txt-file-previews branch from 5c652ff to 1713d28 Compare March 1, 2021 18:19
@danxuliu
Copy link
Member Author

danxuliu commented Mar 1, 2021

Rebased on master, guest avatar test file regenerated with the updated font, and psalm baseline updated based on the ignored error for BackgroundCleanupJob that the command is based on.

The command is meant to be used when the fonts used to render texts
("core/fonts/NotoSans-Regular.ttf" and "core/fonts/NotoSans-Bold.ttf")
are changed (for example, to add support for other scripts). The avatar
and text file previews will be removed, so they will be generated again
with the updated font when needed.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
@danxuliu danxuliu force-pushed the fix-non-lgc-glyphs-in-avatars-and-txt-file-previews branch from 1713d28 to 9f96a47 Compare March 1, 2021 19:55
@danxuliu
Copy link
Member Author

danxuliu commented Mar 1, 2021

Somebody forgot to execute composer run cs:fix before the previous rebase 🤦

Copy link
Member

@rullzer rullzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure...

Copy link
Member

@juliusknorr juliusknorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐘

@PVince81
Copy link
Member

if you had to rebuild the font with a patched version of Noto Tools, does it mean we are the first ever to meet this kind of issue ?
I was just wondering how other projects are dealing with this problem and am surprised that there is no library or tool that makes it work.

For CJK as far as I know, at least on desktop, there's "font substitution" logic built-in where if a glyph is missing in one font it will pick it in another. This makes it possible to mix different fonts. (I remember that Linux desktops had a bug where sometimes font substitution mixed Japanese with Chinese glyphs which rendered those with a different font style and made it ugly, haven't seen this issue since).

@MorrisJobke MorrisJobke merged commit 268acd3 into master Mar 22, 2021
@MorrisJobke MorrisJobke deleted the fix-non-lgc-glyphs-in-avatars-and-txt-file-previews branch March 22, 2021 20:06
@backportbot-nextcloud
Copy link

The backport to stable19 failed. Please do this backport manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants