string.EndsWith() behaves differently in .net core 3.1 and .net 5.0 #55843

Closed
soonu-kedari opened this issue Jul 5, 2021 · 6 comments
Labels: area-System.Globalization, untriaged (new issue has not been triaged by the area owner)

Comments

@soonu-kedari

I have an API application currently running on .NET 5.0.
I am trying to remove the extra "\0" characters from the end of a string variable with the code below:

string text = "SomeString1234567898765\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
while (text.EndsWith("\0"))
{
    text = text.Substring(0, text.Length - 1);
}

When I execute the above code in .NET Core 3.1, it gives me SomeString1234567898765, which is the expected result, but in .NET 5.0 it gives me "" (an empty string).

When did it begin and how often does it occur?
Ans: Every time

What errors do you see?
Ans: No error or exception; however, the result is different when the framework is changed.

What's the environment and are there recent changes?
Ans: Upgraded from .NET Core 3.1 to .NET 5.0

What have you tried to troubleshoot this?
Ans: Ran the code in both frameworks, i.e. in .NET Core 3.1 and .NET 5.0

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@joeloff transferred this issue from dotnet/sdk Jul 16, 2021
@dotnet-issue-labeler bot added the untriaged label Jul 16, 2021
@ghost

ghost commented Jul 17, 2021

Tagging subscribers to this area: @tarekgh, @safern
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: soonu-kedari
Assignees: joeloff
Labels: area-System.Globalization, untriaged
Milestone: -

@KalleOlaviNiemitalo

KalleOlaviNiemitalo commented Jul 18, 2021

This looks similar to #50521 and #54951. Specifying StringComparison.Ordinal should fix it.
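
A minimal sketch of that suggestion, applied to the loop from the issue. The class and method names here are made up for illustration; only `EndsWith(string, StringComparison)` and `TrimEnd` are real framework calls. With an ordinal comparison, "\0" is matched as a plain character instead of being ignored by culture-sensitive collation:

```csharp
using System;

public static class NullTrimming
{
    // Hypothetical helper: removes trailing '\0' characters using an
    // ordinal comparison, so the result does not depend on ICU vs. NLS.
    public static string TrimTrailingNulls(string text)
    {
        while (text.EndsWith("\0", StringComparison.Ordinal))
        {
            text = text.Substring(0, text.Length - 1);
        }
        return text;
    }

    // Hypothetical helper: same result in one call. TrimEnd compares
    // characters by value, so no culture rules are involved.
    public static string TrimTrailingNullsShort(string text) => text.TrimEnd('\0');

    public static void Main()
    {
        string text = "SomeString1234567898765\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
        Console.WriteLine(TrimTrailingNulls(text));      // SomeString1234567898765
        Console.WriteLine(TrimTrailingNullsShort(text)); // SomeString1234567898765
    }
}
```

If the goal is only to strip trailing nulls, the `TrimEnd('\0')` form is probably the simpler choice, since it also avoids allocating a new string on every loop iteration.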

@KalleOlaviNiemitalo

Although I wonder why "".EndsWith("\0") does not return true like "".EndsWith("") does.

@safern
Member

safern commented Jul 18, 2021

Hello, thanks for the issue. This is a duplicate of: #46569

I'm going to paste the explanation of why this is the new behavior in .NET 5+ on Windows.

This is by design in ICU, where "\0" is a weightless character; it was discussed in this issue: #4673 (comment)

This has been the behavior in .NET Core on Unix systems since .NET Core 2.0, and as of .NET 5.0 we decided to use ICU by default on Windows as well, to bring behavior on par across all OSs.

You can look at the doc https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu to learn more about the change to ICU. The doc explains how you can switch back to NLS behavior if you need to do so (however, this is not recommended, as in the long term NLS will be legacy).

Also, see #43956, which is meant to make this change less painful for .NET 6.0, which is our LTS.

This is also a long thread that might help you understand some of the implications of and motivation for the breaking change: #43736 (comment)
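
For reference, a sketch of what switching back to NLS looks like in `runtimeconfig.json`, based on my reading of the linked doc (check the doc itself for the current options, which also include an MSBuild setting and an environment variable):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

This only applies on Windows, and it changes collation behavior for the whole app, so using ordinal comparisons at the affected call sites is usually the narrower fix.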

I'm going to close this issue, please let us know if you have more questions and thank you for opening the issue.

From @tarekgh:

Unicode collation has some characters that are ignored during cultural collation operations. Think about it as if these characters do not exist in the string at all. The null character \0 is one of these characters. You can consult the Unicode standard for the whole list of ignored characters here: https://www.unicode.org/charts/collation/chart_Ignored.html.

When searching for such control characters, we always recommend using an ordinal operation.
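
As an illustration of that recommendation, here is a small sketch of ordinal searches for the null character. The class and helper names are hypothetical; the framework calls (`IndexOf(char)`, `IndexOf(string, StringComparison)`, `EndsWith(string, StringComparison)`) are the standard `System.String` overloads:

```csharp
using System;

public static class OrdinalSearch
{
    // Hypothetical helper: returns the text before the first '\0',
    // or the whole string if it contains none.
    // IndexOf(char) is always an ordinal search.
    public static string CutAtFirstNull(string text)
    {
        int index = text.IndexOf('\0');
        return index < 0 ? text : text.Substring(0, index);
    }

    // Hypothetical helper: ordinal check for a trailing null character.
    public static bool EndsWithNull(string text) =>
        text.EndsWith("\0", StringComparison.Ordinal);

    public static void Main()
    {
        string padded = "abc\0def\0";

        // String overloads need an explicit StringComparison.Ordinal;
        // without it, IndexOf(string) uses the current culture.
        Console.WriteLine(padded.IndexOf("\0", StringComparison.Ordinal)); // 3
        Console.WriteLine(CutAtFirstNull(padded));                         // abc
        Console.WriteLine(EndsWithNull(padded));                           // True
        Console.WriteLine(EndsWithNull("abc"));                            // False
    }
}
```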

Please let us know if you have further questions. The thread on that issue should help understand. I'm going to close this as by design.

@safern closed this as completed Jul 18, 2021
@ghost locked as resolved and limited conversation to collaborators Aug 17, 2021