support Unicode 15.1 via utf8proc 2.9.0 #51799

Merged (1 commit) on Oct 23, 2023

stevengj (Member) commented Oct 20, 2023

Similar to #47392, support Unicode 15.1 by bumping utf8proc to 2.9.0 (JuliaStrings/utf8proc#253).

This allows us to use 118 exciting new emoji characters as identifiers, including "edible mushroom" "\U1f344\u200d\U1f7eb" (but still no superscript "q").

Interestingly, they also updated the Unicode recommendations on programming-language identifiers (UAX#31) to finally "bless" identifiers beginning and/or ending with numeric sub/superscripts. They still don't recommend nearly the range of identifiers accepted by Julia, however.

stevengj added the "unicode" label (Related to unicode characters and encodings) on Oct 20, 2023
stevengj (Member, Author) commented Oct 20, 2023

Actually, even with this PR we won't currently be able to use "edible mushroom" as an identifier, because we don't currently allow U+200D "Zero-width joiner" (ZWJ) as an identifier character, since it's in category Cf (Other, format).
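To make the point above concrete, here is a minimal sketch that inspects the three code points of the sequence quoted in this PR. It assumes the unexported helpers `Base.Unicode.category_abbrev`, `Base.is_id_char`, and `Base.isidentifier`, which are internal and may change between Julia versions. The last two results depend on the Julia version you run, so no output is claimed for them.

```julia
# The ZWJ sequence quoted in this PR: three code points joined by U+200D.
s = "\U1f344\u200d\U1f7eb"

# Print each code point with its Unicode general category.
# category_abbrev is an unexported helper in Base.Unicode (internal API).
for c in s
    println(repr(c), " => ", Base.Unicode.category_abbrev(c))
end

zwj = '\u200d'

# U+200D is in category Cf (Other, format).
@show Base.Unicode.category_abbrev(zwj)

# Internal predicates used for identifier checks; the answers depend on
# the Julia version, so they are shown rather than asserted.
@show Base.is_id_char(zwj)
@show Base.isidentifier(s)
```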

It might actually be a good idea to allow ZWJ in identifiers (but not as the first character), since that is used to "glue" together emojis to make new emoji. In particular, we should look at the whole "Emoji Profile" in UAX#31. I'll look into creating a separate PR for this.
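As a rough illustration of the rule proposed here (ZWJ allowed, but not as the first character), the sketch below wraps the existing check. The function name `isidentifier_with_zwj` is hypothetical, and this is not how Julia's parser implements identifiers. It is also deliberately cruder than the UAX#31 Emoji Profile: it accepts a ZWJ between any two identifier characters, including trailing or repeated ZWJs.

```julia
const ZWJ = '\u200d'

# Hypothetical relaxed check, for illustration only:
# reject a leading ZWJ, then ignore every other ZWJ and defer to
# the internal Base.isidentifier for the remaining characters.
function isidentifier_with_zwj(s::AbstractString)
    isempty(s) && return false
    first(s) == ZWJ && return false
    stripped = replace(s, ZWJ => "")
    return Base.isidentifier(stripped)
end

@show isidentifier_with_zwj("x\u200dy")   # ZWJ in the middle
@show isidentifier_with_zwj("\u200dx")    # ZWJ as first character
```

A real implementation would presumably restrict ZWJ to the emoji sequences that the Emoji Profile lists, rather than stripping it everywhere.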

giordano (Contributor) commented

For reference, that's issue #40071
