Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU: OrdinalIgnoreCase comparison functions convert to upper instead of using case folding #26961

Closed
GrabYourPitchforks opened this issue Jan 31, 2020 · 3 comments

Comments

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jan 31, 2020

The CompareInfo.IndexOf(..., CompareOptions.OrdinalIgnoreCase) functions on ICU use u_toupper, though they should really use u_caseFold. Case mapping (e.g., u_toupper and u_tolower) are used when converting strings to a standard casing. Case folding (u_caseFold) should be used when comparing strings for ordinal / non-linguistic equality. In particular, we should use simple case folding instead of full case folding.

Example line that exhibits the problem:

return u_toupper(one) == u_toupper(two);

This means that, for instance, the strings "ß" (U+00DF LATIN SMALL LETTER SHARP S) and "ẞ" (U+1E9E LATIN CAPITAL LETTER SHARP S) will be treated as unequal under an OrdinalIgnoreCase comparer.

To be fair, the current behavior of performing an uppercase mapping does match the Windows NLS behavior, but the Windows NLS behavior is a legacy behavior that for compatibility reasons cannot be updated to match Unicode best practices as described in https://unicode.org/faq/casemap_charprop.html. In the ICU code paths we should follow the Unicode recommendations as often as we can.

@tarekgh
Copy link
Member

tarekgh commented Aug 12, 2020

I am working on optimizing ordinal cases in general but I am not going to change the way we do the toupper. the reason is it is easier to describe that the ordinal casing behavior is what UnicodeDatat.txt is providing. It is also case folding can introduce some cases that could be not expected for ordinal. I would not risk doing that and we can look in the future if we need to do it. closing it but feel free to let me know if you disagree.

@tarekgh tarekgh closed this as completed Aug 12, 2020
@GrabYourPitchforks
Copy link
Member Author

@tarekgh Sure, that sounds reasonable. Do you think there's still interest in a new dedicated "perform case folding" API? You've previously pointed out the one in corefxlab, but I'm trying to figure out when (if ever) we'd port it to runtime.

@tarekgh
Copy link
Member

tarekgh commented Aug 14, 2020

We should look at the case folding support APIs in the next releases. I am still seeing it is useful to support and we already have requests for it.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.