Normative: List new Unicode v14 `Script`/`Script_Extensions` values #2515

mathiasbynens · 2021-09-14T12:18:42Z

The following new values for the already-supported properties Script and Script_Extensions are added:

- Cypro_Minoan (Cpmn)
- Old_Uyghur (Ougr)
- Tangsa (Tnsa)
- Toto (Toto)
- Vithkuqi (Vith)
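For readers who want to check whether a given engine already recognizes these values, here is a minimal sketch. The helper names `supportsPropertyEscape` and `reportScriptSupport` are hypothetical and not part of the PR. The sketch builds each pattern at runtime so that an engine lacking Unicode 14 data reports `false` instead of failing to parse the script.

```javascript
// Script values and value aliases listed in this PR.
const NEW_SCRIPTS = [
  ["Cypro_Minoan", "Cpmn"],
  ["Old_Uyghur", "Ougr"],
  ["Tangsa", "Tnsa"],
  ["Toto", "Toto"],
  ["Vithkuqi", "Vith"],
];

// Hypothetical helper: true if the engine accepts \p{<expression>} with the u flag.
function supportsPropertyEscape(expression) {
  try {
    new RegExp(`\\p{${expression}}`, "u");
    return true;
  } catch (error) {
    // Unknown property names or values are early SyntaxErrors.
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Hypothetical helper: one row per script, covering both properties and the alias.
function reportScriptSupport(scripts) {
  return scripts.map(([name, alias]) => ({
    name,
    alias,
    script: supportsPropertyEscape(`Script=${name}`),
    scriptAlias: supportsPropertyEscape(`sc=${alias}`),
    scriptExtensions: supportsPropertyEscape(`Script_Extensions=${name}`),
  }));
}

console.table(reportScriptSupport(NEW_SCRIPTS));
```

The result depends on the Unicode version bundled with the engine, so no particular output is assumed here.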

bakkot · 2021-11-10T23:56:11Z

@mathiasbynens I believe last time we said we were going to ask for consensus for these. Do you want to present this or would you like the editors to take care of it? I imagine it will be an extremely short agenda item either way.

mathiasbynens · 2021-11-11T11:55:17Z

My memory is that we would indeed ask for consensus for any new properties being added, but not for any new values to already-supported properties. Otherwise we’d be in this weird state where we claim to support the latest Unicode, and we claim to support Script_Extensions in property escapes, yet we somehow don’t support one of the Script_Extensions values in the latest Unicode release.

IMHO, asking for consensus on new values for already-supported properties would be equivalent to asking for TC39 consensus on the set of new ID_Start characters in every new Unicode release. I can’t imagine anyone objecting to a specific ID_Start character but not others, and similarly I cannot imagine anyone objecting to a specific Script_Extensions value but not another. We either support Script_Extensions as a whole, or we don’t — and that decision was already made as part of the \p{…} proposal.

The meeting notes from when this was last discussed are at https://github.com/tc39/notes/blob/master/meetings/2020-06/june-2.md#introducing-unicode-support, but I agree they don’t clearly capture a decision on this specific case. My memory and intuition go back to when I first proposed \p{…}.

bakkot · 2021-11-17T23:06:08Z

@mathiasbynens That sounds reasonable, but I don't think we explicitly got consensus for that at the last meeting. I want to bring this PR to the next meeting and more explicitly get consensus on the following:

- new properties will require plenary approval
- new values for existing properties will be marked normative but merged by editors without plenary approval

On the other hand, it's not totally clear to the editors what purpose the lists of values for General_Category and Script/Script_Extensions serve if the editors are always going to fast-track any updates to them without plenary approval, as opposed to just having a normative reference to Unicode in place of those two tables.

mathiasbynens · 2021-11-18T09:08:37Z

> @mathiasbynens That sounds reasonable, but I don't think we explicitly got consensus for that at the last meeting. I want to bring this PR to the next meeting and more explicitly get consensus on
>
> - new properties will require plenary approval
> - new values for existing properties will be marked normative but merged by editors without plenary approval

This sounds like re-establishing consensus on something that was already part of the approved \p{…} proposal, but I understand that we might be disagreeing on that.

As part of the \p{…} proposal we decided to support all Script, Script_Extensions, and General_Category character property values, and (at the very least) the intention was for that list of values to be kept in sync with upstream Unicode over time. I thought this intention was part of what was approved at the time.

If this were relitigated and did not regain consensus, that would be problematic for the reasons I described.

michaelficarra · 2021-11-18T15:43:03Z

@mathiasbynens any reason not to remove the tables that enumerate those values then?

mathiasbynens · 2021-11-18T15:49:45Z

> @mathiasbynens any reason not to remove the tables that enumerate those values then?

We discussed that, no? https://github.com/tc39/notes/blob/master/meetings/2020-06/june-2.md#introducing-unicode-support They were originally added to reduce the risk of interoperability issues, and IMHO we shouldn’t change that.

michaelficarra · 2021-11-18T15:53:01Z

@mathiasbynens It was clear that we didn't want to remove the table that lists properties. It was not clear that we didn't want to remove the tables that list values.

mathiasbynens · 2021-11-18T16:28:28Z

Besides interop, listing the values + aliases explicitly is useful since we decided not to support loose matching in ECMAScript’s \p{…}. If we defer to the Unicode Standard, it becomes ambiguous what the “canonical” way to write each value / value alias really is, since Unicode doesn’t define it. (We generally use the casing that’s found in Unicode’s data files, but as far as Unicode is concerned, these labels are case-insensitive.)
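To illustrate the absence of loose matching, here is a small sketch. The helper `checkSpellings` is hypothetical, and the candidate spellings are chosen for illustration only. It records which spellings an engine accepts inside `\p{…}`.

```javascript
// Hypothetical helper: maps each candidate to true (accepted) or false (SyntaxError).
function checkSpellings(candidates) {
  const results = {};
  for (const candidate of candidates) {
    try {
      new RegExp(`\\p{${candidate}}`, "u");
      results[candidate] = true;
    } catch (error) {
      if (!(error instanceof SyntaxError)) throw error;
      results[candidate] = false;
    }
  }
  return results;
}

const spellingResults = checkSpellings([
  "Script=Latin",      // canonical property name and value
  "sc=Latn",           // canonical aliases
  "Lu",                // General_Category value alias used on its own
  "Script=latin",      // wrong casing in the value
  "script=Latin",      // wrong casing in the property name
  "Script=Old_Italic", // canonical spelling with LOW LINE
  "Script=Old-Italic", // HYPHEN-MINUS instead of LOW LINE
  "Script=Old Italic", // white space instead of LOW LINE
  "Script=OldItalic",  // separator dropped
  "IsLu",              // `Is` prefix
]);

console.log(spellingResults);
```

Under UAX44 loose matching all ten spellings would resolve to a property. In ECMAScript only the four canonical spellings (`Script=Latin`, `sc=Latn`, `Lu`, `Script=Old_Italic`) are accepted, which is why the exact spelling of each table entry matters.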

michaelficarra · 2021-11-29T23:16:02Z

Added a topic for next plenary: tc39/agendas#1084

mathiasbynens · 2021-12-15T08:15:24Z

Per the 2021-12-14 TC39 meeting, patches that add new upstream Unicode values & value aliases to already-supported-in-ECMAScript properties do not require explicit consensus. Removing the label. Thanks for presenting this, @michaelficarra!

bakkot · 2021-12-15T16:04:45Z

@mathiasbynens One of the things we said we'd do as part of this is to document exactly how the names of the entries in this table are chosen, either in the spec text itself or in some metadata somewhere. Can you help us figure out the right way of saying it? Would "these names correspond to the first spelling (including casing) used for each value in Scripts.txt" be accurate?

michaelficarra · 2021-12-15T16:56:15Z

It was also suggested that we include that editorial process in the spec itself, so I plan to discuss the merits of that in the next editor call.

mathiasbynens · 2021-12-15T17:12:16Z

> @mathiasbynens One of the things we said we'd do as part of this is to document exactly how the names of the entries in this table are chosen, either in the spec text itself or in some metadata somewhere. Can you help us figure out the right way of saying it? Would "these names correspond to the first spelling (including casing) used for each value in Scripts.txt" be accurate?

What you were suggesting might lead to the same results for the Script/Script_Extensions names (I haven’t checked) but wouldn’t explain where we get the canonical value aliases from. The source I’ve been using is PropertyValueAliases.txt (since we need the values + any value aliases as well). This applies to both the Script/Script_Extensions table and the General_Category table.

Note that this logic unfortunately doesn’t cover ALL properties listed in the spec — in particular Any, ASCII, and Assigned appear in neither PropertyValueAliases.txt nor PropertyAliases.txt since they’re technically not properties (at the Unicode level). For those, we just settled on a casing that felt consistent with the other properties/values (as part of the \p{…} proposal). Then again, that is a one-off scenario that wouldn’t occur in these annual PRs — we’d absolutely still want to get consensus on adding brand new properties to ECMAScript (just not for any new values for existing properties).
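As a small illustration of those three special-cased names, the sketch below uses them exactly as spelled in the spec. The helper names `describeCharacter` and `isAcceptedName` are hypothetical.

```javascript
// Hypothetical helper: reports which of the three names match the first code point of `text`.
function describeCharacter(text) {
  const [character] = text; // string iteration yields whole code points
  return {
    any: /^\p{Any}$/u.test(character),
    ascii: /^\p{ASCII}$/u.test(character),
    assigned: /^\p{Assigned}$/u.test(character),
  };
}

// Hypothetical helper: true if the engine accepts \p{<name>} as a lone property name.
function isAcceptedName(name) {
  try {
    new RegExp(`\\p{${name}}`, "u");
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(describeCharacter("a"));
console.log(describeCharacter("é"));
console.log(isAcceptedName("ASCII"), isAcceptedName("ascii"));
```

Because matching is strict, the casing that was settled on for these names is as binding as the casing taken from the Unicode data files.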

markusicu · 2021-12-15T21:51:26Z

Right, these are the files to look at:

Formally speaking, the particular use of case/space/underscore is unlikely to change but not (I think) guaranteed to not change.

bakkot · 2021-12-23T18:13:03Z

I've pushed up a commit with the following paragraph just before the table-unicode-general-category-values.html and table-unicode-script-values.html tables:

The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files PropertyAliases.txt and PropertyValueAliases.txt in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.

@mathiasbynens PTAL. If this does not seem correct to you, can you suggest an alternative?
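For illustration, the spellings for one property could be derived from the data file along these lines. This is a sketch only: `parseValueAliases` is a hypothetical helper, and the sample text is a hand-written excerpt that assumes the `property alias ; short value alias ; long value name` layout of PropertyValueAliases.txt rather than reproducing the real file.

```javascript
// Hand-written excerpt in the assumed layout of PropertyValueAliases.txt.
const SAMPLE_VALUE_ALIASES = `
# Script (sc)

sc ; Cpmn                             ; Cypro_Minoan
sc ; Ougr                             ; Old_Uyghur
sc ; Tnsa                             ; Tangsa
sc ; Toto                             ; Toto
sc ; Vith                             ; Vithkuqi

# General_Category (gc)

gc ; Lu                               ; Uppercase_Letter
`;

// Hypothetical helper: returns { name, aliases } entries for one property alias,
// preserving the spelling and casing found in the text.
function parseValueAliases(text, propertyAlias) {
  const entries = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments and padding
    if (line === "") continue;
    const fields = line.split(";").map((field) => field.trim());
    if (fields.length < 3 || fields[0] !== propertyAlias) continue;
    const [, shortAlias, longName, ...otherAliases] = fields;
    entries.push({ name: longName, aliases: [shortAlias, ...otherAliases] });
  }
  return entries;
}

for (const { name, aliases } of parseValueAliases(SAMPLE_VALUE_ALIASES, "sc")) {
  console.log(`${name} (${aliases.join(", ")})`);
}
```

Since the spellings in the upstream files are not guaranteed to be stable, a tool like this could at most help draft a spec update. The tables in the specification remain the authority for implementations.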

mathiasbynens · 2021-12-24T08:06:23Z

spec.html

@@ -35141,6 +35141,9 @@ <h1>
          <emu-note>
            <p>This algorithm differs from <a href="https://unicode.org/reports/tr44/#Matching_Symbolic">the matching rules for symbolic values listed in UAX44</a>: case, <emu-xref href="#sec-white-space">white space</emu-xref>, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the `Is` prefix is not supported.</p>
          </emu-note>
+          <emu-note>
+            <p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files PropertyAliases.txt and PropertyValueAliases.txt in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p>


LGTM 👍 Some nits/thoughts:

- Let’s add links to these files? https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt and https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt are stable URLs.
- Should the file names be wrapped in <code>?

So like this?

Suggested change

<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files PropertyAliases.txt and PropertyValueAliases.txt in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p>

<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> and <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p>

Would that also apply to the other UCD file references (cf. #2594)?

@gibson042 Yes, please do.

…c39#2515) The following new values for the already-supported properties `Script` and `Script_Extensions` are added: - Cypro_Minoan (Cpmn) - Old_Uyghur (Ougr) - Tangsa (Tnsa) - Toto (Toto) - Vithkuqi (Vith) Issue: tc39#2514 Tests: tc39/test262#3199

…Category tables (tc39#2515)

mathiasbynens added the "unicode" label (Relates to upstream Unicode updates.) Sep 14, 2021

michaelficarra approved these changes Sep 14, 2021

ljharb added the "normative change" (Affects behavior required to correctly evaluate some ECMAScript source text) and "needs test262 tests" (The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262) labels Sep 14, 2021

michaelficarra requested a review from a team September 16, 2021 16:56

mathiasbynens added the "has test262 tests" label and removed the "needs test262 tests" label Sep 30, 2021

ljharb requested review from syg and bakkot October 6, 2021 21:47

anba mentioned this pull request Oct 8, 2021: Normative: Add new numbering system "tnsa" tc39/ecma402#614 (Merged)

bakkot added the "editor call" label (to be discussed in the next editor call) Oct 16, 2021

michaelficarra added the "editor call" and "needs consensus" (This needs committee consensus before it can be eligible to be merged.) labels and removed the "editor call" label Nov 10, 2021

syg removed the "editor call" label Nov 10, 2021

bakkot added the "editor call" label Nov 15, 2021

michaelficarra removed the "editor call" label Nov 24, 2021

jridgewell mentioned this pull request Dec 14, 2021: Dec 2021 babel/proposals#78 (Open)

mathiasbynens removed the "needs consensus" label Dec 15, 2021

michaelficarra added the "editor call" label Dec 15, 2021

michaelficarra removed the "editor call" label Dec 22, 2021

bakkot force-pushed the unicode-14-property-escapes branch 2 times, most recently from f13634e to 6f6c232 December 23, 2021 18:11

mathiasbynens commented Dec 24, 2021

michaelficarra added the "editor call" label Dec 30, 2021

bakkot force-pushed the unicode-14-property-escapes branch from 6f6c232 to 63bef6d January 5, 2022 22:58

bakkot removed the "editor call" label Jan 5, 2022

bakkot approved these changes Jan 5, 2022

bakkot added the "ready to merge" label (Editors believe this PR needs no further reviews, and is ready to land.) Jan 5, 2022

mathiasbynens and others added 2 commits January 5, 2022 15:05

Editorial: note provenance of spellings in Script Values and General …

2ee71d8

…Category tables (tc39#2515)

ljharb force-pushed the unicode-14-property-escapes branch from 63bef6d to 2ee71d8 January 5, 2022 23:06

ljharb merged commit 2ee71d8 into tc39:main Jan 5, 2022

mathiasbynens deleted the unicode-14-property-escapes branch January 6, 2022 10:52

michaelficarra mentioned this pull request Jan 19, 2022: Editorial: Use <code> for values from other specifications #2594 (Merged)

ota-meshi mentioned this pull request Apr 10, 2022: Add support for Unicode properties Script values added in ES2022 eslint/eslint#15772 (Closed)
