fix isequal_normalized for combining-char reordering #52447

Merged
StefanKarpinski merged 9 commits into master from sgj/isequal_normalized_fix on Dec 19, 2023

Conversation

stevengj
Member

@stevengj stevengj commented Dec 8, 2023

Fixes #52408.

(Note that this function was added in Julia 1.8, in #42493.)
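For context, a minimal sketch of the kind of input the PR title refers to. It assumes that the failure involves two combining marks with different canonical combining classes stored in opposite orders; the Latin strings below are illustrative stand-ins, not the Hebrew example from the linked issue.

```julia
using Unicode

# Illustrative inputs (an assumption, not taken from the issue):
# U+0323 COMBINING DOT BELOW has combining class 220 and
# U+0307 COMBINING DOT ABOVE has class 230, so the two orders
# below are canonically equivalent.
a = "a\u0323\u0307"
b = "a\u0307\u0323"

# Full normalization maps both strings to the same sequence.
same_nfc = Unicode.normalize(a, :NFC) == Unicode.normalize(b, :NFC)

# isequal_normalized should agree with that comparison without
# building normalized copies; on a build that includes this fix
# it is expected to return true here.
same_fast = Unicode.isequal_normalized(a, b)

println((same_nfc, same_fast))
```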

In the future it would be good to further optimize this function by adding a fast path for the common case of strings that are mostly ASCII characters. Perhaps simply skip ahead to the first byte that doesn't match before we begin doing decomposition etcetera.
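A rough sketch of what such a fast path might look like, written as a wrapper rather than a change to the stdlib function. The name `isequal_normalized_fastpath` is hypothetical, and the sketch only covers the default keyword options. It skips the shared prefix of identical ASCII bytes, then backs up one character so that a trailing ASCII base letter is still compared together with any combining marks that follow it.

```julia
using Unicode

# Hypothetical helper, not part of Julia's Unicode stdlib.
function isequal_normalized_fastpath(s1::String, s2::String)
    b1, b2 = codeunits(s1), codeunits(s2)
    n = min(length(b1), length(b2))
    i = 1
    # Advance over the shared prefix of identical ASCII bytes.
    while i <= n && b1[i] == b2[i] && b1[i] < 0x80
        i += 1
    end
    # Back up one character: the last matching ASCII character may be a
    # base letter whose combining marks differ in order or encoding.
    # Every byte before i is ASCII, so i is a valid index in both strings.
    i = max(i - 1, 1)
    # Fall back to the general comparison on the remaining tails.
    return Unicode.isequal_normalized(SubString(s1, i), SubString(s2, i))
end

# Usage: precomposed vs. decomposed "é" after an ASCII prefix.
println(isequal_normalized_fastpath("cafe\u0301", "caf\u00e9"))
println(isequal_normalized_fastpath("abc", "abd"))
```

Backing up is safe here because ASCII characters have no decomposition and never combine with a preceding character, so splitting just before one does not change the result of the comparison.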

@stevengj added the unicode, bugfix, backport 1.8, backport 1.9, and backport 1.10 labels Dec 8, 2023
@stevengj
Member Author

stevengj commented Dec 9, 2023

CI failures seem unrelated.

@KristofferC KristofferC mentioned this pull request Dec 12, 2023
17 tasks
@stevengj
Member Author

Should be good to merge?

@StefanKarpinski StefanKarpinski merged commit 3b250c7 into master Dec 19, 2023
7 checks passed
@StefanKarpinski StefanKarpinski deleted the sgj/isequal_normalized_fix branch December 19, 2023 12:55
Member

@StefanKarpinski StefanKarpinski left a comment

LGTM (already merged)

KristofferC pushed a commit that referenced this pull request Dec 23, 2023
(cherry picked from commit 3b250c7)
@aviatesk removed the backport 1.10 label Dec 24, 2023
Labels
backport 1.8, backport 1.9, bugfix, unicode
Development

Successfully merging this pull request may close these issues.

isequal_normalized("בְּ", Unicode.normalize("בְּ")) == false
3 participants