-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Conversation
@1ec5, thanks for your PR! By analyzing this pull request, we identified @ansis and @jfirebaugh to be potential reviewers. |
@1ec5 ❤️ this in itself is a huge improvement and makes the labels more readily readable. Can confirm the arabic text is much easier to read. It looks like its spelled as individual alphabets instead of being written as a word. Source: Learnt arabic in school for 6 years |
Yes, only the display order should be flipped - the logical order should be preserved in the underlying string, for (as a contrived off-the-top-of-my-head example) something like text-to-speech reading of labels. |
@mikemorris and @1ec5 The u32string I am referring to is a private copy of the data structure that is used for glyph placement only. It contains all the code points on which the mirroring transformation can be applied. |
@mb12, thank you for the tip. Indeed, we’ll need to handle line placement as well as point placement. That was an oversight on my part. I’ve edited the original PR description to list that as a to-do item. |
One problem with mirroring inside the string is that it’ll interfere with line wrapping. Consider the label “גן הסלעים, גן הקקטוסים והגן הגזום”, which needs to be wrapped onto multiple lines: If we mirror the data rather than the layout, we’ll end up with a bottom-to-top, right-to-left label, which would only be appropriate for the Hanunó’o alphabet (which incidentally is bottom-to-top, left-to-right). Something like this:
Bidirectional text also requires multiple layout passes, one for the string overall and one for each nested text run. |
@1ec5 Would the following work for text justification? Split the u32string into words, mirror each word and concatenate them. |
Would the words be concatenated in logical or visual order? If in logical order, you’d wind up with something this like – legible but incomprehensible. If in visual order, the problem described in #6057 (comment) would remain. If labels with line placement can’t be wrapped, then yes, mirroring the string might be fine for just line placement, but we’d still need the approach I took for symbol placement. |
For RTL languages, what is the expected behavior? If the pbf file contains string "like this", does it need to be shown as "ekil siht" or does it need to be shown like "siht ekil"? Irrespective of whether this transformation happens in u32string or during glyph placement, the transform needs to ensure visual correctness. |
"siht ekil". And a wrapped string that's stored in the pbf as "quite a long string" would be displayed as:
A bilingual string stored as "left sinister / right dexter" would be displayed as "left sinister / retxed thgir". Other than this last case (where this branch tends to put the right-to-left run to the left of the left-to-right run) and some edge cases involving weak-directional characters like punctuation, the layout and placement of each glyph is correct on this branch. Is there a particular reason to avoid doing this layout work during glyph placement? |
Also note that even in this naïve approach, we need to detect direction for text runs so that we don't reverse blocks of arabic numbers. |
Western Arabic numerals (0–9) aren’t strongly right-to-left, so they’re correctly laid out from left to right. Eastern Arabic numerals (U+0660–U+0669, U+066B, U+066C, U+06F0–U+06F9) are being detected as strongly right-to-left but shouldn’t be, since they’re stored and displayed in “big endian” order. We should exclude them from the We can make directionality detection much more robust without much trouble, either by linking ICU or by bringing in the relevant data, which shouldn’t be all that large. |
@1ec5 The only advantage of doing it pre glyph placement (if possible) would be that it would work for all the cases (both line and point placement), would be easier to debug and help avoid special isRtl cases in Glyph placement code. Plus no code modification would be needed for any optimizations (For e.g. when placement is line but alignment is viewport rectangles for all the glyphs are merged into a single rectangle). |
Reverse text that begins with a character in the Arabic, Hebrew, Syriac, or Thaana Unicode blocks. This change does not include complex text layout (bidirectional text, context-sensitive shaping, ligatures, or ordering).
This PR is a proof of concept for right-to-left support in core. Labels are mirrored based on the presence of strongly right-to-left characters.
Compare the Hebrew label to the text in the popover. The popover is displaying the same string but rendered by Core Text, which fully supports right-to-left text and complex text shaping.
For bilingual or diglossic labels, which are common in some countries on OpenStreetMap, runs of right-to-left text inside predominantly left-to-right text and vice versa are reversed back to logical order.
As you can see in the above screenshot, complex text shaping remains unimplemented. It’s a much larger technical challenge that affects not only layout but also font glyph selection (mapbox/DEPRECATED-mapbox-gl#4). Regardless, can a native speaker of Arabic, Persian, Urdu, etc. comment on whether mirroring alone at least improves readability compared to what we have currently? I realize that a lack of contextual forms and ligatures would continue to make labels unpresentable, but I’m hopeful we can still make small progress while we work towards full support for these languages.
In practice, mirroring alone does appear to get us complete Hebrew support in styles derived from Mapbox Streets source. Final letter forms are encoded separately in Unicode, rather like in Greek, so they’re already being displayed correctly in master. Meanwhile, niqqud are nearly nonexistent in
name
tags in OpenStreetMap: two short road segments and a building in all of Israel.On this branch, strong right-to-left characters are defined as those characters that belong to the Arabic, Arabic Supplement, Arabic Extended-A, Hebrew, Syriac, and Thaana code blocks in Unicode. There is currently no support for weak directionality, so spaces and generic punctuation are treated as left-to-right text.
We could address directionality limitations by replacing my hand-rolled algorithm with minimal usage of ICU, which is available on Android, iOS, and macOS. ICU is unavailable on the Web, which is why mapbox/mapbox-gl-js#1841 would require a large dependency. Even if we can’t get intelligent directionality support into GL JS, I don’t think that should block support in the native codebase, because no one is ever going to complain that Hebrew text looks “too correct” on one platform. 😉
This is merely a proof of concept, but the following work would need to take place in order for it to land as a stopgap solution:
/cc @mikemorris @kkaefer @planemad