Should dir=auto with no strong characters inherit directionality from parent or be ltr? #10097

Closed
dbaron opened this issue Jan 26, 2024 · 15 comments · Fixed by #10140
Labels: i18n-alreq (Notifies Arabic script experts of relevant issues), i18n-hlreq (Notifies Hebrew script experts of relevant issues), i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response.)

Comments

dbaron commented Jan 26, 2024

What is the issue with the HTML Standard?

For a while (since at least before the refactoring in #9554), the HTML spec has said the following about the cases where dir=auto determines directionality from the text content of an element (as opposed to the cases where it looks at form control values, as it does for text inputs and textarea): when the text content has no strong characters ("a code point whose bidirectional character type is L, AL, or R"), the element inherits directionality from its parent. (See spec.)
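
As a rough illustration of the rule in question, here is a minimal Python sketch that picks a direction from the first strong character of a string, with a switch for what happens when there is none. The function names and the `inherit_when_neutral` flag are made up for this example. It looks only at a flat string and ignores everything else the spec's algorithm does (skipped descendants, form controls, and so on), so treat it as a sketch rather than the spec algorithm.

```python
import unicodedata

# Bidi classes that the quoted spec text treats as "strong".
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' based on the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return None


def auto_directionality(text, parent_dir, inherit_when_neutral):
    """Hypothetical helper: resolve dir=auto for a flat string.

    inherit_when_neutral=True models the behavior the spec described;
    inherit_when_neutral=False models falling back to ltr.
    """
    found = first_strong_direction(text)
    if found is not None:
        return found
    return parent_dir if inherit_when_neutral else "ltr"


# Digits and punctuation are not strong, so the fallback decides.
print(auto_directionality("123", "rtl", inherit_when_neutral=True))
print(auto_directionality("123", "rtl", inherit_when_neutral=False))
```

The two calls at the end differ only in the fallback, which is the whole question of this issue.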

For a long time no implementations did this. As part of recent directionality changes I shipped this behavior in Chrome 120. (See the CL that made the change.) This CL changed one WPT which still fails in Safari and Firefox.

We recently got a bug report about this behavior, with a description of a case that it broke.

My question is: do we want to revert this behavior in Chrome and change the HTML spec back to the old browser behavior? Or is what the HTML spec previously described considered the desirable behavior?

cc @fantasai @r12a

dbaron added the i18n-alreq and i18n-hlreq labels Jan 26, 2024

dbaron commented Jan 26, 2024

I should clarify that the old implementation behavior prior to the recent change was that a dir=auto element that contained no strong characters would fall back to ltr.

dbaron added the i18n-tracker label Jan 26, 2024

annevk commented Jan 29, 2024

I would suggest we revert to the behavior it had for a very long time. The upside is not worth the downstream cost.

dbaron commented Jan 29, 2024

A little more history here:

In November 2010, dir=auto was added in fd6901d, fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=10808, with the behavior that dir=auto elements with no strong characters are ltr.

In April 2013, in work tied to https://www.w3.org/Bugs/Public/show_bug.cgi?id=17835, b00c2b0 added the rule that dir=auto elements inherit parent directionality if they have no strong characters and are not a <bdi>.

In June 2013, also tied to https://www.w3.org/Bugs/Public/show_bug.cgi?id=17835, b967fb3 removed the exception for <bdi> that was added in April 2013.

In July 2013, fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=17835#c21, 773c6d1 changed the rule from being that elements inherit the parent directionality if they have no strong characters, to that they only inherit the parent directionality if they are empty (and nonempty elements with no strong characters are LTR).

In August 2014, without clear explanation, d2021df changed the rule from only empty elements inheriting from the parent (and non-empty elements without strong characters being ltr) to instead being that any elements without strong characters inherit from the parent. (This undid the July 2013 change, and went back to the June 2013 state.)

I believe the spec behavior for this case has not changed since then.
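
To make the differences between these spec states easier to compare, here is a small Python sketch of the three distinct fallbacks for content with no strong characters. The rule labels (`"ltr"`, `"inherit"`, `"inherit-if-empty"`) and the function names are invented for this example. Treating "empty" as an empty string is an assumption, and the April 2013 exception for <bdi> is left out.

```python
import unicodedata


def has_strong(text):
    """True if any character has bidi class L, R, or AL."""
    return any(unicodedata.bidirectional(ch) in ("L", "R", "AL") for ch in text)


def neutral_fallback(text, parent_dir, rule):
    """Hypothetical helper: direction of dir=auto content with no strong characters.

    rule="ltr"              -> November 2010 state
    rule="inherit"          -> June 2013 and August 2014 states
    rule="inherit-if-empty" -> July 2013 state
    """
    if has_strong(text):
        raise ValueError("content has strong characters; the fallback does not apply")
    if rule == "ltr":
        return "ltr"
    if rule == "inherit":
        return parent_dir
    if rule == "inherit-if-empty":
        # Assumption: "empty" is modelled as the empty string.
        return parent_dir if text == "" else "ltr"
    raise ValueError(f"unknown rule: {rule}")


for rule in ("ltr", "inherit", "inherit-if-empty"):
    print(rule, neutral_fallback("", "rtl", rule), neutral_fallback("123", "rtl", rule))
```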

I should test which of the above behaviors existing implementations follow.

dbaron commented Jan 29, 2024

cc @aharon-lanin

aphillips commented

There will certainly be cases where dir=auto containing no strongly directional characters results in broken display by applying either of the specific directions (either ltr or rtl). The bidi isolation of the element will help prevent broader spillover effects, but the element itself can still be broken.

My guess is that forcing ltr is wrong because RTL content authors would then have no control over the direction of dir=auto content. Many RTL language pages declare direction on e.g. the html element and don't repeat it internally.

Note that I18N's staff contact has changed (@r12a is still around, but please copy... @xfq)

smaug---- commented

I tend to agree with @annevk
But @jfkthame might have other opinions.

aphillips commented Jan 29, 2024

Note that the behavior it had "for a very long time" was because we didn't address bidirectionality thoroughly for a very long time.

There is also some history in the HTML Bidi requirements doc at this location

Also, note that the bug mentioned in the start of this issue has to do with how the Unicode Bidi Algorithm works with numbers (numbers are not strongly directional).

If you have a negative number (-1.7, say), the display of the minus sign will be incorrect if it is not prefixed with a character such as the ALM (U+061C) or LRM (U+200E). Some I18N libraries, such as ICU4J, produce ALM, while Intl.NumberFormat will produce LRM when formatting such a number. If no prefixing is done, the minus sign can "swing around" to the other side depending on the direction. dir=auto isn't magic... sometimes UBA needs more help.
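
To see why a bare negative number gives dir=auto nothing to work with, here is a small Python sketch that lists the bidi class of each character. The helper names are made up for this example, and it only inspects character classes (it does not run the UBA). It assumes that a leading LRM or ALM counts as the first strong character, which follows from their bidi classes (L and AL).

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L
ALM = "\u061c"  # ARABIC LETTER MARK, bidi class AL


def bidi_classes(text):
    """Return the Unicode bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def has_strong(text):
    """True if any character has bidi class L, R, or AL."""
    return any(cls in ("L", "R", "AL") for cls in bidi_classes(text))


print(bidi_classes("-1.7"))        # ['ES', 'EN', 'CS', 'EN']
print(has_strong("-1.7"))          # False
print(has_strong(LRM + "-1.7"))    # True (LRM is strong LTR)
print(has_strong(ALM + "-1.7"))    # True (ALM is strong RTL)
```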

jfkthame commented

My first instinct was to feel that the "new" behavior (in the spec for the past decade, but not actually implemented in browsers until the recent Chrome change) would be preferable, but I'm not sure it's worth the potential compat issues of changing something that's been established on the web for so many years.

I don't believe there is any ideal algorithm or heuristic that can be relied on to automatically give the desired result in all cases; whatever we do, added "hints" in the form of bidi control characters and/or properties will sometimes be needed. Given this, stability should probably trump the possibly-marginal benefits of the change.

So +1 to @annevk, I think.

r12a commented Jan 30, 2024

This is a complicated area. The expected behaviour for numbers and punctuation varies according to what punctuation marks are involved, whether and what characters precede the expression, and what language this is. See https://r12a.github.io/scripts/arab/arb.html#expressions

For example, if this was a sentence capturing data related to 'Change in flow rate' (written in Arabic), and it was provided with the value written in memory as '100‐200' (ie. an increase) using ‐ U+2010 HYPHEN as the separator (which is appropriate), you'd expect to see '200-100' rendered in an Arabic sentence (but '100-200' in a Hebrew sentence). Enter those values in the JS Fiddle and Chrome shows '200-100' (indicating an increase), but Firefox applies LTR direction and so you get '100-200', which in a true RTL context would indicate a decrease rather than an increase. (Note, that this behaviour is different from what you get when using - U+002D HYPHEN-MINUS.)
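
The difference between the two separators shows up in their bidi classes. This Python sketch (the `describe` helper is made up for the example) prints the code point and bidi class of each character in both spellings of the range. It does not implement the UBA. As far as I understand it, UBA rule W4 turns a single European separator (ES) between two European numbers into a European number, while U+2010 is an ordinary neutral (ON), which is consistent with the two hyphens behaving differently.

```python
import unicodedata


def describe(text):
    """Return (code point, bidi class) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


with_hyphen = "100\u2010200"    # U+2010 HYPHEN as the separator
with_hyphen_minus = "100-200"   # U+002D HYPHEN-MINUS as the separator

print(describe(with_hyphen))
print(describe(with_hyphen_minus))

# Neither string contains a strong character (L, R, AL), so dir=auto
# has to rely on its fallback rule for both of them.
for text in (with_hyphen, with_hyphen_minus):
    print(any(cls in ("L", "R", "AL") for _, cls in describe(text)))
```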

So that's at least one example that counters the one using the negative sign, but I expect that working out all the possible scenarios and effects is a little complicated, and I'm not sure what to recommend at the moment. It seems as if Chrome's approach might possibly be safer when working with data being dropped into inline text, but that's just a hunch at the moment.

On the other hand, if a telephone number or a MAC address string is dropped into an Arabic sentence, then applying LTR directionality would usually be recommended – you don't want the component parts moving around. I don't know whether there's a one-size-fits-all answer to the use case described, which ignores any metadata that comes with the inserted data, and any expectation about what type of data is being inserted (eg. a MAC address), either of which could be used to set the dir attribute on the inline element.

Not sure that helps us get closer to an answer, but it may hopefully advise a little caution in drawing conclusions here.

r12a commented Jan 30, 2024

Here's a quick test to show the range example. It gives different results in Firefox and Chrome, but Chrome is the one that's correct this time.

dbaron commented Feb 15, 2024

So I wrote a quick test for the inheritance cases and it looks like the behavior prior to the recent Chromium change was the same in all engines: dir=auto was LTR if there were no strong characters, and inheritance wasn't a thing. Given that the discussion here doesn't have a clear conclusion that the new Chromium behavior is better, I'm going to revert Chromium back to the old behavior on that point.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Feb 16, 2024
…eriting).

This reverts (conditioned on a new DirAutoNoInheritance flag) a previous
change from https://crrev.com/c4557b863d101826932f33757e9398e7fca056c9
and makes it so that dir=auto elements never inherit directionality from
their parent.  Instead, when no strong characters are present, they have
LTR directionality, as we did before.

Reverting to our old behavior (and reverting the relevant test to its
old state) seems like the best option given the discussion in
whatwg/html#10097 .

Once this ships to stable we can remove the mechanisms used to support
that inheritance.

The changes in html/dom/elements/global-attributes/dir_auto-N-EN.html
and its reference are a direct revert of the prior test change.

The new dir-shadow-42 test is a version of dir-shadow-41 with the
directions swapped, to make sure things are tested more thoroughly.

Fixed: 41494751
Bug: 576815
Change-Id: I68b36a1fc19a0553f582fdf2fd02a94d6e633686
aarongable pushed a commit to chromium/chromium that referenced this issue Feb 16, 2024
…eriting).

This reverts (conditioned on a new DirAutoNoInheritance flag) a previous
change from https://crrev.com/c4557b863d101826932f33757e9398e7fca056c9
and makes it so that dir=auto elements never inherit directionality from
their parent.  Instead, when no strong characters are present, they have
LTR directionality, as we did before.

Reverting to our old behavior (and reverting the relevant test to its
old state) seems like the best option given the discussion in
whatwg/html#10097 .

Once this ships to stable we can remove the mechanisms used to support
that inheritance.

The changes in html/dom/elements/global-attributes/dir_auto-N-EN.html
and its reference are a direct revert of the prior test change in the CL
above.

The changes in bdi-auto-dir-default.html and
bdi-dir-default-to-auto.html are reverting a newer test change from
https://crrev.com/4278cdb00ac4e727c1123e5eb4aba86509e87c0b .

The new dir-shadow-42 test is a version of dir-shadow-41 with the
directions swapped, to make sure things are tested more thoroughly.

Fixed: 41494751
Bug: 576815
Change-Id: I68b36a1fc19a0553f582fdf2fd02a94d6e633686
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5302287
Commit-Queue: David Baron <dbaron@chromium.org>
Commit-Queue: Di Zhang <dizhangg@chromium.org>
Auto-Submit: David Baron <dbaron@chromium.org>
Reviewed-by: Di Zhang <dizhangg@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1261442}
dbaron added a commit to dbaron/html that referenced this issue Feb 16, 2024
This restores the old behavior that was part of the spec from November
2010 to April 2013, which is that elements with dir=auto (and <bdi>
without dir) that contain no strong LTR or strong RTL characters fall
back to ltr, rather than inheriting the parent's directionality.

This matches what is implemented in Gecko and WebKit, and what was
implemented in Chromium until a few months ago.  A small number of
regressions were reported as a result of this change.  Given that I
don't see a clear consensus that the spec behavior is better, it seems
better to revert back to the existing behavior and avoid breaking some
existing content that depends on it.

Fixes whatwg#10097.
fantasai commented Feb 16, 2024

Pulling up my comment from https://www.w3.org/Bugs/Public/show_bug.cgi?id=17835:

The reason for [having all-neutral paragraphs resolve to LTR] is to keep the reordering results consistent with plaintext. In plaintext, a paragraph of neutral characters is ordered LTR.

I don't mind defaulting to the inherited direction when the element is empty, if that makes for a better UI, but when it has neutral content, it should default to LTR regardless of the inherited direction.

Basically regardless of compatibility, I think reverting neutral sequences to LTR is the right answer.

aphillips commented Feb 16, 2024

@fantasai For otherwise RTL text, that breaks the directional run when it does not need to. Here's a quick-and-dirty example:

البحرين مصر <span dir=auto>{{}}</span> الكويت!

In the example, defaulting to LTR causes what is otherwise a unidirectional string to be bidirectional. This is visible in selection:

[image: screenshot of the selection]

annevk pushed a commit that referenced this issue Feb 22, 2024
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Feb 23, 2024
…aracters ltr (rather than inheriting)., a=testonly

Automatic update from web-platform-tests
Make dir=auto elements without strong characters ltr (rather than inheriting).

This reverts (conditioned on a new DirAutoNoInheritance flag) a previous
change from https://crrev.com/c4557b863d101826932f33757e9398e7fca056c9
and makes it so that dir=auto elements never inherit directionality from
their parent.  Instead, when no strong characters are present, they have
LTR directionality, as we did before.

Reverting to our old behavior (and reverting the relevant test to its
old state) seems like the best option given the discussion in
whatwg/html#10097 .

Once this ships to stable we can remove the mechanisms used to support
that inheritance.

The changes in html/dom/elements/global-attributes/dir_auto-N-EN.html
and its reference are a direct revert of the prior test change in the CL
above.

The changes in bdi-auto-dir-default.html and
bdi-dir-default-to-auto.html are reverting a newer test change from
https://crrev.com/4278cdb00ac4e727c1123e5eb4aba86509e87c0b .

The new dir-shadow-42 test is a version of dir-shadow-41 with the
directions swapped, to make sure things are tested more thoroughly.

Fixed: 41494751
Bug: 576815
Change-Id: I68b36a1fc19a0553f582fdf2fd02a94d6e633686
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5302287
Commit-Queue: David Baron <dbaron@chromium.org>
Commit-Queue: Di Zhang <dizhangg@chromium.org>
Auto-Submit: David Baron <dbaron@chromium.org>
Reviewed-by: Di Zhang <dizhangg@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1261442}

--

wpt-commits: 9630591d607ac18d6572602cf0f1ba5129fe9954
wpt-pr: 44621
fantasai commented

@aphillips That seems appropriate. It looks weird only because you chose a palindrome.

aphillips commented

@fantasai I'm not explaining it well then.

Replace the brackets with two other neutrals, then. Let's say it's <span dir=auto>.!</span>. The neutrals are reordered LTR even though there is no reason for it. Now the selection looks like the following, with the . reordered on the left when there's no apparent reason for it to be:

[image: screenshot of the selection]

Now if we insert an RTL character (for auto to see), we get a selection that proceeds RTL, because auto introduces no LTR run. The selection proceeds consistently RTL and the text continues to be ordered RTL all the way through. This is what an RTL speaker would expect, I think, based on looking at just the text alone:

[images: two screenshots of the selection]

Or weak sequences:

[image: screenshot of the selection]

My point is that the LTR default causes the auto span to introduce a reorder of the spanned text when none is called for by the text itself.
