.Net 5: For Thai, IndexOf(string) returns wrong index for some strings #59120

Closed
vsfeedback opened this issue Sep 14, 2021 · 6 comments

Comments

@vsfeedback

This issue has been moved from a ticket on Developer Community.


[regression] [worked-in:.net core 3.1]
In .NET 5, IndexOf(string) returns the wrong index.
This is reproducible only for some input strings, and only when the current culture is set to Thai.

Example:
CultureInfo.CurrentCulture = new CultureInfo("th");
var idxSeparator = "27".IndexOf("."); // returns 0 instead of -1
idxSeparator = "0".IndexOf("."); // returns -1 as expected
idxSeparator = "3".IndexOf("."); // returns -1 as expected
idxSeparator = "3345".IndexOf("."); // returns 0 instead of -1

Notes

  • In .NET Core 3.1 it returned -1 for all of the examples above (as I would expect)
  • With other cultures this is not reproducible, even in .NET 5
  • IndexOf(char) does not have this problem

Original Comments

Feedback Bot on 9/7/2021, 01:21 AM:

We have directed your feedback to the appropriate engineering team for further evaluation. The team will review the feedback and notify you about the next steps.


Original Solutions

(no solutions)

dotnet-issue-labeler bot added the area-System.Globalization and untriaged labels Sep 14, 2021
@ghost

ghost commented Sep 14, 2021

Tagging subscribers to this area: @tarekgh, @safern
See info in area-owners.md if you want to be subscribed.


@GrabYourPitchforks
Member

I added this to the big list at #43956.

maryamariyan removed the untriaged label Sep 16, 2021
maryamariyan added this to the 7.0.0 milestone Sep 16, 2021
@maryamariyan
Member

maryamariyan commented Sep 16, 2021

This regressed in 5.0; it would be good to test it in 6.0 as well.

@GrabYourPitchforks @tarekgh is this something we'd need to look at in 6.0?

@tarekgh
Member

tarekgh commented Sep 20, 2021

Calling someString.IndexOf(".") under the Thai culture should always return 0, regardless of the contents of someString. The reason is that in the Thai culture the character . is an ignorable character for sorting purposes: it is treated as if it doesn't exist at all, so searching for it behaves like searching for an empty string, which matches at index 0. So the behavior you are seeing is correct.

var idxSeparator = "27".IndexOf("."); // returns 0 instead of -1
idxSeparator = "3345".IndexOf("."); // returns 0 instead of -1

That is the expected result, as explained above.

idxSeparator = "0".IndexOf("."); // returns -1 as expected
idxSeparator = "3".IndexOf("."); // returns -1 as expected

I do not get this result with either .NET 5.0 or .NET 6.0. Are you sure the th culture was set as the current culture before calling IndexOf in these cases? Please try these cases one more time and confirm what you are getting.
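To make the comparison easier to confirm, here is a minimal sketch that prints the linguistic and the ordinal result side by side for the four inputs from the report. The class and method names (ThaiIndexOfRepro, Search) are made up for illustration. It calls CompareInfo.IndexOf on the th culture directly instead of setting CultureInfo.CurrentCulture, on the assumption that this is the same search IndexOf(string) performs for the current culture. No expected output is shown because the linguistic result depends on the globalization backend (ICU or NLS) and its version.

```csharp
using System;
using System.Globalization;

public static class ThaiIndexOfRepro
{
    // Hypothetical helper: searches for "." in 'input' twice, once
    // linguistically under the named culture and once ordinally.
    public static (int Linguistic, int Ordinal) Search(string input, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);

        // Culture-sensitive search; the result depends on ICU vs. NLS.
        int linguistic = culture.CompareInfo.IndexOf(input, ".", CompareOptions.None);

        // Ordinal search; compares UTF-16 code units and ignores the culture.
        int ordinal = input.IndexOf(".", StringComparison.Ordinal);

        return (linguistic, ordinal);
    }

    public static void Main()
    {
        foreach (string input in new[] { "27", "0", "3", "3345" })
        {
            var (linguistic, ordinal) = Search(input, "th");
            Console.WriteLine($"\"{input}\": linguistic={linguistic}, ordinal={ordinal}");
        }
    }
}
```

If the linguistic column shows 0 for every input, the run matches the behavior described above; any -1 in that column suggests the th culture (or ICU) was not actually in effect.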

In .NET Core 3.1 it returned -1 for all of the examples above (as I would expect)

That is because .NET Core 3.1 used Windows NLS, and starting with .NET 5.0 we use ICU instead. Some differences are expected because of that. You may review the documentation for more info. You can also switch back to the NLS behavior if you need to, as the documentation describes.
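As a rough way to see which backend a given process ended up with, the sketch below follows the check published in the .NET globalization documentation; the class and method names (GlobalizationBackend, IsIcuMode) are placeholders. The switch back to NLS on Windows is, as far as I know, the System.Globalization.UseNls runtime configuration setting (or the DOTNET_SYSTEM_GLOBALIZATION_USENLS environment variable); please verify the exact form against the documentation for your .NET version.

```csharp
using System;
using System.Globalization;

public static class GlobalizationBackend
{
    // Heuristic: under ICU, the sort version's SortId encodes the same
    // value as FullVersion; under NLS (or invariant mode) it does not.
    public static bool IsIcuMode()
    {
        SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
        byte[] bytes = sortVersion.SortId.ToByteArray();

        // Reassemble the first four bytes of the GUID as a little-endian int.
        int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];

        return version != 0 && version == sortVersion.FullVersion;
    }

    public static void Main()
    {
        Console.WriteLine(IsIcuMode() ? "ICU" : "NLS (or invariant mode)");
    }
}
```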

With other cultures this is not reproducible, even in .NET 5

This is normal, as different languages behave differently. Note that Thai is a special case when it comes to word breaking, because the language doesn't use spaces between words. So you can expect to see other behaviors like this one too.

IndexOf(char) does not have this problem

That is expected too. When you search for a char, the search is performed as an ordinal operation, not a linguistic one, which means the Thai culture is not used at all. You can get the same result with a string argument by doing something like:

"0".IndexOf(".", StringComparison.Ordinal);
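For a decimal-separator lookup like the one in the report, a small wrapper keeps the ordinal intent explicit at every call site. This is only a sketch; SeparatorSearch and IndexOfSeparator are hypothetical names, not an existing API.

```csharp
using System;

public static class SeparatorSearch
{
    // Hypothetical helper: always searches ordinally, so the result does
    // not depend on CultureInfo.CurrentCulture or on ICU vs. NLS.
    public static int IndexOfSeparator(string text, string separator = ".")
    {
        return text.IndexOf(separator, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Prints -1 for "27", "0", "3" and "3345", and 2 for "33.45".
        foreach (string input in new[] { "27", "0", "3", "3345", "33.45" })
        {
            Console.WriteLine($"\"{input}\" -> {IndexOfSeparator(input)}");
        }
    }
}
```

Passing StringComparison.Ordinal explicitly also documents the intent for the next reader, which the IndexOf(char) overload does only implicitly.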

@GrabYourPitchforks please review my reply here just in case you need to update anything again in #43956.

@GrabYourPitchforks
Member

@tarekgh I don't have anything to add to your reply. It is very well stated. :)

Given that OP referred to this behavior as a "problem" and called out the clear contrast with the ordinal routine IndexOf(char), I think my original assumption that they wanted an ordinal instead of a linguistic operation is correct. So I'll leave the 43956 link as-is pending any clarification from OP that they meant something different.

@ghost

ghost commented Oct 5, 2021

This issue has been automatically marked no-recent-activity because it has been marked as needs-author-feedback but has not had any activity for 14 days. It will be closed if no further activity occurs within 7 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

tarekgh closed this as completed Oct 5, 2021
ghost removed the no-recent-activity label Oct 5, 2021
ghost locked as resolved and limited conversation to collaborators Nov 4, 2021