From 0fcb9652d91051f5334b56cf2874d47d1fa821fe Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Tue, 3 Nov 2020 17:34:56 -0800 Subject: [PATCH 1/3] nls -> icu behavior changes --- docs/fundamentals/toc.yml | 34 +- .../base-types/best-practices-display-data.md | 65 ++++ .../base-types/best-practices-strings.md | 70 +--- docs/standard/base-types/comparing.md | 221 +++++++------ .../string-comparison-net-5-plus.md | 309 ++++++++++++++++++ .../globalization-icu.md | 10 +- .../globalization-localization/toc.yml | 19 +- 7 files changed, 535 insertions(+), 193 deletions(-) create mode 100644 docs/standard/base-types/best-practices-display-data.md create mode 100644 docs/standard/base-types/string-comparison-net-5-plus.md diff --git a/docs/fundamentals/toc.yml b/docs/fundamentals/toc.yml index 31cf0cf41603e..c10f67711053e 100644 --- a/docs/fundamentals/toc.yml +++ b/docs/fundamentals/toc.yml @@ -1419,14 +1419,20 @@ items: href: ../standard/base-types/how-to-display-milliseconds-in-date-and-time-values.md - name: Display dates in non-Gregorian calendars href: ../standard/base-types/how-to-display-dates-in-non-gregorian-calendars.md - - name: Manipulate strings + - name: Strings items: - name: Character encoding in .NET href: ../standard/base-types/character-encoding-introduction.md - name: How to use character encoding classes href: ../standard/base-types/character-encoding.md - - name: Best practices for using strings - href: ../standard/base-types/best-practices-strings.md + - name: Best practices + items: + - name: Comparing strings + href: ../standard/base-types/best-practices-strings.md + - name: Displaying and persisting formatted data + href: ../standard/base-types/best-practices-display-data.md + - name: Behavior changes in .NET 5+ (Windows) + href: ../standard/base-types/string-comparison-net-5-plus.md - name: Basic string operations items: - name: Overview @@ -1438,7 +1444,7 @@ items: href: ../standard/base-types/trimming.md - name: Pad 
strings href: ../standard/base-types/padding.md - - name: Compare strings + - name: Comparison methods href: ../standard/base-types/comparing.md - name: Change case href: ../standard/base-types/changing-case.md @@ -1503,16 +1509,16 @@ items: href: ../standard/base-types/how-to-strip-invalid-characters-from-a-string.md - name: Verify that strings are in valid email format href: ../standard/base-types/how-to-verify-that-strings-are-in-valid-email-format.md - - name: Parse strings - items: - - name: Overview - href: ../standard/base-types/parsing-strings.md - - name: Parse numeric strings - href: ../standard/base-types/parsing-numeric.md - - name: Parse date and time strings - href: ../standard/base-types/parsing-datetime.md - - name: Parse other strings - href: ../standard/base-types/parsing-other.md + - name: Parse (convert) strings + items: + - name: Overview + href: ../standard/base-types/parsing-strings.md + - name: Parse numeric strings + href: ../standard/base-types/parsing-numeric.md + - name: Parse date and time strings + href: ../standard/base-types/parsing-datetime.md + - name: Parse other strings + href: ../standard/base-types/parsing-other.md - name: Attributes items: - name: Overview diff --git a/docs/standard/base-types/best-practices-display-data.md b/docs/standard/base-types/best-practices-display-data.md new file mode 100644 index 0000000000000..295464070faa0 --- /dev/null +++ b/docs/standard/base-types/best-practices-display-data.md @@ -0,0 +1,65 @@ +--- +title: Best practices for displaying and persisting formatted data in .NET +description: Learn how to display and persist numeric and date data effectively in .NET applications. +ms.date: 05/01/2019 +ms.technology: dotnet-standard +dev_langs: + - "csharp" + - "vb" +--- +# Best practices for displaying and persisting formatted data + +This article examines how formatted data, such as numeric data and date-and-time data, is handled for display and for storage. 
+ +When you develop with .NET, use culture-sensitive formatting to display non-string data, such as numbers and dates, in a user interface. Use formatting with the [invariant culture](xref:System.Globalization.CultureInfo.InvariantCulture) to persist non-string data in string form. Do not use culture-sensitive formatting to persist numeric or date-and-time data in string form. + +## Displaying formatted data + +When you display non-string data such as numbers and dates and times to users, format them by using the user's cultural settings. By default, the following all use the current thread culture in formatting operations: + +- Interpolated strings supported by the [C#](../../csharp/language-reference/tokens/interpolated.md) and [Visual Basic](../../visual-basic/programming-guide/language-features/strings/interpolated-strings.md) compilers. +- String concatenation operations that use the [C#](../../csharp/language-reference/operators/addition-operator.md#string-concatenation) or [Visual Basic](../../visual-basic/programming-guide/language-features/operators-and-expressions/concatenation-operators.md) concatenation operators or that call the method directly. +- The method. +- The `ToString` methods of the numeric types and the date and time types. + +To explicitly specify that a string should be formatted by using the conventions of a designated culture or the [invariant culture](xref:System.Globalization.CultureInfo.InvariantCulture), you can do the following: + +- When using the and `ToString` methods, call an overload that has a `provider` parameter, such as or , and pass it the property, a instance that represents the desired culture, or the property. + +- For string concatenation, do not allow the compiler to perform any implicit conversions. Instead, perform an explicit conversion by calling a `ToString` overload that has a `provider` parameter. 
For example, the compiler implicitly uses the current culture when converting a value to a string in the following code: + + [!code-csharp[Implicit String Conversion](./snippets/best-practices-strings/csharp/tostring/Program.cs#1)] + [!code-vb[Implicit String Conversion](./snippets/best-practices-strings/vb/tostring/Program.vb#1)] + + Instead, you can explicitly specify the culture whose formatting conventions are used in the conversion by calling the method, as the following code does: + + [!code-csharp[Explicit String Conversion](./snippets/best-practices-strings/csharp/tostring/Program.cs#2)] + [!code-vb[Explicit String Conversion](./snippets/best-practices-strings/vb/tostring/Program.vb#2)] + +- For string interpolation, rather than assigning an interpolated string to a instance, assign it to a . You can then call its method to produce a result string that reflects the conventions of the current culture, or you can call the method to produce a result string that reflects the conventions of a specified culture. You can also pass the formattable string to the static method to produce a result string that reflects the conventions of the invariant culture. The following example illustrates this approach. (The output from the example reflects a current culture of en-US.) + + [!code-csharp[String interpolation](./snippets/best-practices-strings/csharp/formattable/Program.cs)] + [!code-vb[String interpolation](./snippets/best-practices-strings/vb/formattable/Program.vb)] + +## Persisting formatted data + +You can persist non-string data either as binary data or as formatted data. If you choose to save it as formatted data, you should call a formatting method overload that includes a `provider` parameter and pass it the property. The invariant culture provides a consistent format for formatted data that is independent of culture and machine. 
In contrast, persisting data that is formatted by using cultures other than the invariant culture has a number of limitations: + +- The data is likely to be unusable if it is retrieved on a system that has a different culture, or if the user of the current system changes the current culture and tries to retrieve the data. +- The properties of a culture on a specific computer can differ from standard values. At any time, a user can customize culture-sensitive display settings. Because of this, formatted data that is saved on a system may not be readable after the user customizes cultural settings. The portability of formatted data across computers is likely to be even more limited. +- International, regional, or national standards that govern the formatting of numbers or dates and times change over time, and these changes are incorporated into Windows operating system updates. When formatting conventions change, data that was formatted by using the previous conventions may become unreadable. + +The following example illustrates the limited portability that results from using culture-sensitive formatting to persist data. The example saves an array of date and time values to a file. These are formatted by using the conventions of the English (United States) culture. After the application changes the current thread culture to French (Switzerland), it tries to read the saved values by using the formatting conventions of the current culture. The attempt to read two of the data items throws a exception, and the array of dates now contains two incorrect elements that are equal to . 
+ +[!code-csharp[Conceptual.Strings.BestPractices#21](~/samples/snippets/csharp/VS_Snippets_CLR/conceptual.strings.bestpractices/cs/persistence.cs#21)] +[!code-vb[Conceptual.Strings.BestPractices#21](~/samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.strings.bestpractices/vb/persistence.vb#21)] + +However, if you replace the property with in the calls to and , the persisted date and time data is successfully restored, as the following output shows: + +```console +06.05.1758 21:26 +05.05.1818 07:19 +22.04.1870 23:54 +08.09.1890 06:47 +18.02.1905 15:12 +``` diff --git a/docs/standard/base-types/best-practices-strings.md b/docs/standard/base-types/best-practices-strings.md index 0ff522249f7d6..5384d6709af25 100644 --- a/docs/standard/base-types/best-practices-strings.md +++ b/docs/standard/base-types/best-practices-strings.md @@ -1,12 +1,12 @@ --- -title: "Best Practices for Using Strings in .NET" -description: Learn how to use strings effectively in .NET applications. +title: "Best Practices for Comparing Strings in .NET" +description: Learn how to compare strings effectively in .NET applications. ms.date: "05/01/2019" ms.technology: dotnet-standard -dev_langs: +dev_langs: - "csharp" - "vb" -helpviewer_keywords: +helpviewer_keywords: - "strings [.NET],searching" - "best practices,string comparison and sorting" - "strings [.NET],best practices" @@ -19,15 +19,15 @@ helpviewer_keywords: - "strings [.NET],comparing" ms.assetid: b9f0bf53-e2de-4116-8ce9-d4f91a1df4f7 --- -# Best Practices for Using Strings in .NET +# Best practices for comparing strings in .NET .NET provides extensive support for developing localized and globalized applications, and makes it easy to apply the conventions of either the current culture or a specific culture when performing common operations such as sorting and displaying strings. But sorting or comparing strings is not always a culture-sensitive operation. 
For example, strings that are used internally by an application typically should be handled identically across all cultures. When culturally independent string data, such as XML tags, HTML tags, user names, file paths, and the names of system objects, are interpreted as if they were culture-sensitive, application code can be subject to subtle bugs, poor performance, and, in some cases, security issues. -This topic examines the string sorting, comparison, and casing methods in .NET, presents recommendations for selecting an appropriate string-handling method, and provides additional information about string-handling methods. It also examines how formatted data, such as numeric data and date and time data, is handled for display and for storage. +This article examines the string sorting, comparison, and casing methods in .NET, presents recommendations for selecting an appropriate string-handling method, and provides additional information about string-handling methods. ## Recommendations for string usage -When you develop with .NET, follow these simple recommendations when you use strings: +When you develop with .NET, follow these simple recommendations when you compare strings: - Use overloads that explicitly specify the string comparison rules for string operations. Typically, this involves calling a method overload that has a parameter of type . - Use or for comparisons as your safe default for culture-agnostic string matching. @@ -39,12 +39,11 @@ When you develop with .NET, follow these simple recommendations when you use str - Use the and methods to sort strings, not to check for equality. - Use culture-sensitive formatting to display non-string data, such as numbers and dates, in a user interface. Use formatting with the [invariant culture](xref:System.Globalization.CultureInfo.InvariantCulture) to persist non-string data in string form. 
-Avoid the following practices when you use strings: +Avoid the following practices when you compare strings: - Do not use overloads that do not explicitly or implicitly specify the string comparison rules for string operations. - Do not use string operations based on in most cases. One of the few exceptions is when you are persisting linguistically meaningful but culturally agnostic data. - Do not use an overload of the or method and test for a return value of zero to determine whether two strings are equal. -- Do not use culture-sensitive formatting to persist numeric data or date and time data in string form. ## Specifying string comparisons explicitly @@ -147,7 +146,7 @@ The following example performs a culture-sensitive comparison of the string "Aa" [!code-vb[Conceptual.Strings.BestPractices#19](~/samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.strings.bestpractices/vb/embeddednulls1.vb#19)] However, the strings are not considered equal when you use ordinal comparison, as the following example shows: - + [!code-csharp[Conceptual.Strings.BestPractices#20](~/samples/snippets/csharp/VS_Snippets_CLR/conceptual.strings.bestpractices/cs/embeddednulls2.cs#20)] [!code-vb[Conceptual.Strings.BestPractices#20](~/samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.strings.bestpractices/vb/embeddednulls2.vb#20)] @@ -170,7 +169,7 @@ Both and overloads that do not include a argument (including the equality operator). In any case, we recommend that you call an overload that has a parameter. -### string operations that use the invariant culture +### String operations that use the invariant culture Comparisons with the invariant culture use the property returned by the static property. This behavior is the same on all systems; it translates any characters outside its range into what it believes are equivalent invariant characters. This policy can be useful for maintaining one set of string behavior across cultures, but it often provides unexpected results. 
@@ -298,51 +297,6 @@ The following example instantiates a object [!code-csharp[Conceptual.Strings.BestPractices#10](~/samples/snippets/csharp/VS_Snippets_CLR/conceptual.strings.bestpractices/cs/indirect2.cs#10)] [!code-vb[Conceptual.Strings.BestPractices#10](~/samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.strings.bestpractices/vb/indirect2.vb#10)] -## Displaying and persisting formatted data - -When you display non-string data such as numbers and dates and times to users, format them by using the user's cultural settings. By default, the following all use the current thread culture in formatting operations: - -- Interpolated strings supported by the [C#](../../csharp/language-reference/tokens/interpolated.md) and [Visual Basic](../../visual-basic/programming-guide/language-features/strings/interpolated-strings.md) compilers. -- String concatenation operations that use the [C#](../../csharp/language-reference/operators/addition-operator.md#string-concatenation) or [Visual Basic](../../visual-basic/programming-guide/language-features/operators-and-expressions/concatenation-operators.md) concatenation operators or that call the method directly. -- The method. -- The `ToString` methods of the numeric types and the date and time types. - -To explicitly specify that a string should be formatted by using the conventions of a designated culture or the [invariant culture](xref:System.Globalization.CultureInfo.InvariantCulture), you can do the following: - -- When using the and `ToString` methods, call an overload that has a `provider` parameter, such as or , and pass it the property, a instance that represents the desired culture, or the property. - -- For string concatenation, do not allow the compiler to perform any implicit conversions. Instead, perform an explicit conversion by calling a `ToString` overload that has a `provider` parameter. 
For example, the compiler implicitly uses the current culture when converting a value to a string in the following code: - - [!code-csharp[Implicit String Conversion](./snippets/best-practices-strings/csharp/tostring/Program.cs#1)] - [!code-vb[Implicit String Conversion](./snippets/best-practices-strings/vb/tostring/Program.vb#1)] - - Instead, you can explicitly specify the culture whose formatting conventions are used in the conversion by calling the method, as the following code does: - - [!code-csharp[Explicit String Conversion](./snippets/best-practices-strings/csharp/tostring/Program.cs#2)] - [!code-vb[Implicit String Conversion](./snippets/best-practices-strings/vb/tostring/Program.vb#2)] - -- For string interpolation, rather than assigning an interpolated string to a instance, assign it to a . You can then call its method produce a result string that reflects the conventions of the current culture, or you can call the method to produce a result string that reflects the conventions of a specified culture. You can also pass the formattable string to the static method to produce a result string that reflects the conventions of the invariant culture. The following example illustrates this approach. (The output from the example reflects a current culture of en-US.) - - [!code-csharp[String interpolation](./snippets/best-practices-strings/csharp/formattable/Program.cs)] - [!code-vb[String interpolation](./snippets/best-practices-strings/vb/formattable/Program.vb)] - -You can persist non-string data either as binary data or as formatted data. If you choose to save it as formatted data, you should call a formatting method overload that includes a `provider` parameter and pass it the property. The invariant culture provides a consistent format for formatted data that is independent of culture and machine. 
In contrast, persisting data that is formatted by using cultures other than the invariant culture has a number of limitations: - -- The data is likely to be unusable if it is retrieved on a system that has a different culture, or if the user of the current system changes the current culture and tries to retrieve the data. -- The properties of a culture on a specific computer can differ from standard values. At any time, a user can customize culture-sensitive display settings. Because of this, formatted data that is saved on a system may not be readable after the user customizes cultural settings. The portability of formatted data across computers is likely to be even more limited. -- International, regional, or national standards that govern the formatting of numbers or dates and times change over time, and these changes are incorporated into Windows operating system updates. When formatting conventions change, data that was formatted by using the previous conventions may become unreadable. - -The following example illustrates the limited portability that results from using culture-sensitive formatting to persist data. The example saves an array of date and time values to a file. These are formatted by using the conventions of the English (United States) culture. After the application changes the current thread culture to French (Switzerland), it tries to read the saved values by using the formatting conventions of the current culture. The attempt to read two of the data items throws a exception, and the array of dates now contains two incorrect elements that are equal to . 
- -[!code-csharp[Conceptual.Strings.BestPractices#21](~/samples/snippets/csharp/VS_Snippets_CLR/conceptual.strings.bestpractices/cs/persistence.cs#21)] -[!code-vb[Conceptual.Strings.BestPractices#21](~/samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.strings.bestpractices/vb/persistence.vb#21)] - -However, if you replace the property with in the calls to and , the persisted date and time data is successfully restored, as the following output shows: +## See also -```console -06.05.1758 21:26 -05.05.1818 07:19 -22.04.1870 23:54 -08.09.1890 06:47 -18.02.1905 15:12 -``` +- [Globalization in .NET apps](../globalization-localization/globalization.md) diff --git a/docs/standard/base-types/comparing.md b/docs/standard/base-types/comparing.md index 0391ebc73f059..7cb49c1e67c74 100644 --- a/docs/standard/base-types/comparing.md +++ b/docs/standard/base-types/comparing.md @@ -3,11 +3,11 @@ title: "Comparing Strings in .NET" description: Read about methods to compare strings in .NET. Learn about the Compare, CompareOrdinal, CompareTo, StartsWith, EndsWith, Equals, IndexOf, & LastIndexOf methods. ms.date: "03/30/2017" ms.technology: dotnet-standard -dev_langs: +dev_langs: - "csharp" - "vb" - "cpp" -helpviewer_keywords: +helpviewer_keywords: - "value comparisons of strings" - "LastIndexOf method" - "CompareTo method" @@ -20,132 +20,141 @@ helpviewer_keywords: - "StartsWith method" ms.assetid: 977dc094-fe19-4955-98ec-d2294d04a4ba --- -# Comparing Strings in .NET -.NET provides several methods to compare the values of strings. The following table lists and describes the value-comparison methods. - -|Method name|Use| -|-----------------|---------| -||Compares the values of two strings. Returns an integer value.| -||Compares two strings without regard to local culture. Returns an integer value.| -||Compares the current string object to another string. Returns an integer value.| -||Determines whether a string begins with the string passed. 
Returns a Boolean value.| -||Determines whether a string ends with the string passed. Returns a Boolean value.| -||Determines whether two strings are the same. Returns a Boolean value.| -||Returns the index position of a character or string, starting from the beginning of the string you are examining. Returns an integer value.| -||Returns the index position of a character or string, starting from the end of the string you are examining. Returns an integer value.| - -## Compare - The static method provides a thorough way of comparing two strings. This method is culturally aware. You can use this function to compare two strings or substrings of two strings. Additionally, overloads are provided that regard or disregard case and cultural variance. The following table shows the three integer values that this method might return. - -|Return value|Condition| -|------------------|---------------| -|A negative integer|The first string precedes the second string in the sort order.<br /><br /> -or-<br /><br /> The first string is `null`.| -|0|The first string and the second string are equal.<br /><br /> -or-<br /><br /> Both strings are `null`.| -|A positive integer<br /><br /> -or-<br /><br /> 1|The first string follows the second string in the sort order.<br /><br /> -or-<br /><br /> The second string is `null`.| - +# Compare strings in .NET + +.NET provides several methods to compare the values of strings. The following table lists and describes the value-comparison methods. + +|Method name|Use| +|-----------------|---------| +||Compares the values of two strings. Returns an integer value.| +||Compares two strings without regard to local culture. Returns an integer value.| +||Compares the current string object to another string. Returns an integer value.| +||Determines whether a string begins with the string passed. Returns a Boolean value.| +||Determines whether a string ends with the string passed. Returns a Boolean value.| +||Determines whether a character or string occurs within another string. Returns a Boolean value.| +||Determines whether two strings are the same. Returns a Boolean value.| +||Returns the index position of a character or string, starting from the beginning of the string you are examining. Returns an integer value.| +||Returns the index position of a character or string, starting from the end of the string you are examining. Returns an integer value.| + +## Compare method + +The static method provides a thorough way of comparing two strings. This method is culturally aware. You can use this function to compare two strings or substrings of two strings. Additionally, overloads are provided that regard or disregard case and cultural variance. The following table shows the three integer values that this method might return. + +|Return value|Condition| +|------------------|---------------| +|A negative integer|The first string precedes the second string in the sort order.<br /><br /> -or-<br /><br /> The first string is `null`.| +|0|The first string and the second string are equal.<br /><br /> -or-<br /><br /> Both strings are `null`.| +|A positive integer<br /><br /> -or-<br /><br /> 1|The first string follows the second string in the sort order.<br /><br /> -or-<br /><br /> The second string is `null`.| + > [!IMPORTANT] -> The method is primarily intended for use when ordering or sorting strings. You should not use the method to test for equality (that is, to explicitly look for a return value of 0 with no regard for whether one string is less than or greater than the other). Instead, to determine whether two strings are equal, use the method. - - The following example uses the method to determine the relative values of two strings. - +> The method is primarily intended for use when ordering or sorting strings. You should not use the method to test for equality (that is, to explicitly look for a return value of 0 with no regard for whether one string is less than or greater than the other). Instead, to determine whether two strings are equal, use the method. + + The following example uses the method to determine the relative values of two strings. + [!code-cpp[Conceptual.String.BasicOps#6](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#6)] [!code-csharp[Conceptual.String.BasicOps#6](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#6)] - [!code-vb[Conceptual.String.BasicOps#6](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#6)] - - This example displays `-1` to the console. - - The preceding example is culture-sensitive by default. To perform a culture-insensitive string comparison, use an overload of the method that allows you to specify the culture to use by supplying a *culture* parameter. For an example that demonstrates how to use the method to perform a culture-insensitive comparison, see [Performing Culture-Insensitive String Comparisons](../globalization-localization/performing-culture-insensitive-string-comparisons.md). - -## CompareOrdinal - The method compares two string objects without considering the local culture. 
The return values of this method are identical to the values returned by the **Compare** method in the previous table. - + [!code-vb[Conceptual.String.BasicOps#6](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#6)] + + This example displays `-1` to the console. + + The preceding example is culture-sensitive by default. To perform a culture-insensitive string comparison, use an overload of the method that allows you to specify the culture to use by supplying a *culture* parameter. For an example that demonstrates how to use the method to perform a culture-insensitive comparison, see [Culture-insensitive string comparisons](../globalization-localization/performing-culture-insensitive-string-comparisons.md). + +## CompareOrdinal method + +The method compares two string objects without considering the local culture. The return values of this method are identical to the values returned by the `Compare` method in the previous table. + > [!IMPORTANT] -> The method is primarily intended for use when ordering or sorting strings. You should not use the method to test for equality (that is, to explicitly look for a return value of 0 with no regard for whether one string is less than or greater than the other). Instead, to determine whether two strings are equal, use the method. - - The following example uses the **CompareOrdinal** method to compare the values of two strings. - +> The method is primarily intended for use when ordering or sorting strings. You should not use the method to test for equality (that is, to explicitly look for a return value of 0 with no regard for whether one string is less than or greater than the other). Instead, to determine whether two strings are equal, use the method. + + The following example uses the `CompareOrdinal` method to compare the values of two strings. 
+ [!code-cpp[Conceptual.String.BasicOps#7](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#7)] [!code-csharp[Conceptual.String.BasicOps#7](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#7)] - [!code-vb[Conceptual.String.BasicOps#7](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#7)] - - This example displays `-32` to the console. - -## CompareTo - The method compares the string that the current string object encapsulates to another string or object. The return values of this method are identical to the values returned by the method in the previous table. - + [!code-vb[Conceptual.String.BasicOps#7](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#7)] + + This example displays `-32` to the console. + +## CompareTo method + +The method compares the string that the current string object encapsulates to another string or object. The return values of this method are identical to the values returned by the method in the previous table. + > [!IMPORTANT] -> The method is primarily intended for use when ordering or sorting strings. You should not use the method to test for equality (that is, to explicitly look for a return value of 0 with no regard for whether one string is less than or greater than the other). Instead, to determine whether two strings are equal, use the method. - - The following example uses the method to compare the `string1` object to the `string2` object. - +> The method is primarily intended for use when ordering or sorting strings. You should not use the method to test for equality (that is, to explicitly look for a return value of 0 with no regard for whether one string is less than or greater than the other). Instead, to determine whether two strings are equal, use the method. + + The following example uses the method to compare the `string1` object to the `string2` object. 
+ [!code-cpp[Conceptual.String.BasicOps#8](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#8)] [!code-csharp[Conceptual.String.BasicOps#8](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#8)] - [!code-vb[Conceptual.String.BasicOps#8](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#8)] - - This example displays `-1` to the console. - - All overloads of the method perform culture-sensitive and case-sensitive comparisons by default. No overloads of this method are provided that allow you to perform a culture-insensitive comparison. For code clarity, we recommend that you use the **String.Compare** method instead, specifying for culture-sensitive operations or for culture-insensitive operations. For examples that demonstrate how to use the **String.Compare** method to perform both culture-sensitive and culture-insensitive comparisons, see [Performing Culture-Insensitive String Comparisons](../globalization-localization/performing-culture-insensitive-string-comparisons.md). - -## Equals - The **String.Equals** method can easily determine if two strings are the same. This case-sensitive method returns a **true** or **false** Boolean value. It can be used from an existing class, as illustrated in the next example. The following example uses the **Equals** method to determine whether a string object contains the phrase "Hello World". - + [!code-vb[Conceptual.String.BasicOps#8](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#8)] + + This example displays `-1` to the console. + + All overloads of the method perform culture-sensitive and case-sensitive comparisons by default. No overloads of this method are provided that allow you to perform a culture-insensitive comparison. 
For code clarity, we recommend that you use the `String.Compare` method instead, specifying `CultureInfo.CurrentCulture` for culture-sensitive operations or `CultureInfo.InvariantCulture` for culture-insensitive operations. For examples that demonstrate how to use the `String.Compare` method to perform both culture-sensitive and culture-insensitive comparisons, see [Performing Culture-Insensitive String Comparisons](../globalization-localization/performing-culture-insensitive-string-comparisons.md).
+
+## Equals method
+
+The `String.Equals` method can easily determine if two strings are the same. This case-sensitive method returns a `true` or `false` Boolean value. It can be used from an existing class, as illustrated in the next example. The following example uses the `Equals` method to determine whether a string object contains the phrase "Hello World".
+
  [!code-cpp[Conceptual.String.BasicOps#9](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#9)]
  [!code-csharp[Conceptual.String.BasicOps#9](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#9)]
- [!code-vb[Conceptual.String.BasicOps#9](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#9)]
- 
- This example displays `True` to the console.
- 
- This method can also be used as a static method. The following example compares two string objects using a static method.
- 
+ [!code-vb[Conceptual.String.BasicOps#9](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#9)]
+
+ This example displays `True` to the console.
+
+ This method can also be used as a static method. The following example compares two string objects using a static method.
+
  [!code-cpp[Conceptual.String.BasicOps#10](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#10)]
  [!code-csharp[Conceptual.String.BasicOps#10](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#10)]
- [!code-vb[Conceptual.String.BasicOps#10](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#10)]
- 
- This example displays `True` to the console.
- 
-## StartsWith and EndsWith
- You can use the **String.StartsWith** method to determine whether a string object begins with the same characters that encompass another string. This case-sensitive method returns **true** if the current string object begins with the passed string and **false** if it does not. The following example uses this method to determine if a string object begins with "Hello".
- 
+ [!code-vb[Conceptual.String.BasicOps#10](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#10)]
+
+ This example displays `True` to the console.
+
+## StartsWith and EndsWith methods
+
+You can use the `String.StartsWith` method to determine whether a string object begins with the same characters that encompass another string. This case-sensitive method returns `true` if the current string object begins with the passed string and `false` if it does not. The following example uses this method to determine if a string object begins with "Hello".
+
  [!code-cpp[Conceptual.String.BasicOps#11](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#11)]
  [!code-csharp[Conceptual.String.BasicOps#11](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#11)]
- [!code-vb[Conceptual.String.BasicOps#11](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#11)]
- 
- This example displays `True` to the console.
- 
- The **String.EndsWith** method compares a passed string to the characters that exist at the end of the current string object. It also returns a Boolean value. The following example checks the end of a string using the **EndsWith** method.
- 
+ [!code-vb[Conceptual.String.BasicOps#11](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#11)]
+
+ This example displays `True` to the console.
+
+ The `String.EndsWith` method compares a passed string to the characters that exist at the end of the current string object. It also returns a Boolean value. The following example checks the end of a string using the `EndsWith` method.
+
  [!code-cpp[Conceptual.String.BasicOps#12](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#12)]
  [!code-csharp[Conceptual.String.BasicOps#12](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#12)]
- [!code-vb[Conceptual.String.BasicOps#12](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#12)]
- 
- This example displays `False` to the console.
- 
-## IndexOf and LastIndexOf
- You can use the **String.IndexOf** method to determine the position of the first occurrence of a particular character within a string. This case-sensitive method starts counting from the beginning of a string and returns the position of a passed character using a zero-based index. If the character cannot be found, a value of –1 is returned.
- 
- The following example uses the **IndexOf** method to search for the first occurrence of the '`l`' character in a string.
- 
+ [!code-vb[Conceptual.String.BasicOps#12](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#12)]
+
+ This example displays `False` to the console.
+
+## IndexOf and LastIndexOf methods
+
+You can use the `String.IndexOf` method to determine the position of the first occurrence of a particular character within a string.
This case-sensitive method starts counting from the beginning of a string and returns the position of a passed character using a zero-based index. If the character cannot be found, a value of –1 is returned.
+
+The following example uses the `IndexOf` method to search for the first occurrence of the '`l`' character in a string.
+
  [!code-cpp[Conceptual.String.BasicOps#13](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#13)]
  [!code-csharp[Conceptual.String.BasicOps#13](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#13)]
- [!code-vb[Conceptual.String.BasicOps#13](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#13)]
- 
- This example displays `2` to the console.
- 
- The **String.LastIndexOf** method is similar to the **String.IndexOf** method except that it returns the position of the last occurrence of a particular character within a string. It is case-sensitive and uses a zero-based index.
- 
- The following example uses the **LastIndexOf** method to search for the last occurrence of the '`l`' character in a string.
- 
+ [!code-vb[Conceptual.String.BasicOps#13](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#13)]
+
+ This example displays `2` to the console.
+
+ The `String.LastIndexOf` method is similar to the `String.IndexOf` method except that it returns the position of the last occurrence of a particular character within a string. It is case-sensitive and uses a zero-based index.
+
+ The following example uses the `LastIndexOf` method to search for the last occurrence of the '`l`' character in a string.
+
  [!code-cpp[Conceptual.String.BasicOps#14](../../../samples/snippets/cpp/VS_Snippets_CLR/conceptual.string.basicops/cpp/compare.cpp#14)]
  [!code-csharp[Conceptual.String.BasicOps#14](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.string.basicops/cs/compare.cs#14)]
- [!code-vb[Conceptual.String.BasicOps#14](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#14)]
- 
- This example displays `9` to the console.
- 
- Both methods are useful when used in conjunction with the **String.Remove** method. You can use either the **IndexOf** or **LastIndexOf** methods to retrieve the position of a character, and then supply that position to the **Remove** method in order to remove a character or a word that begins with that character.
- 
+ [!code-vb[Conceptual.String.BasicOps#14](../../../samples/snippets/visualbasic/VS_Snippets_CLR/conceptual.string.basicops/vb/compare.vb#14)]
+
+ This example displays `9` to the console.
+
+ Both methods are useful when used in conjunction with the `String.Remove` method. You can use either the `IndexOf` or `LastIndexOf` methods to retrieve the position of a character, and then supply that position to the `Remove` method in order to remove a character or a word that begins with that character.
+
 ## See also
 
+- [Best Practices for Using Strings in .NET](best-practices-strings.md)
 - [Basic String Operations](basic-string-operations.md)
 - [Performing Culture-Insensitive String Operations](../globalization-localization/performing-culture-insensitive-string-operations.md)
-- [Sorting Weight Tables (for .NET on Windows)](https://www.microsoft.com/download/details.aspx?id=10921)
-- [Default Unicode Collation Element Table (for .NET Core on Linux and macOS)](https://www.unicode.org/Public/UCA/latest/allkeys.txt)
+- [Sorting Weight Tables](https://www.microsoft.com/download/details.aspx?id=10921) - used by .NET Framework and .NET Core 1.0-3.1 on Windows
+- [Default Unicode Collation Element Table](https://www.unicode.org/Public/UCA/latest/allkeys.txt) - used by .NET 5 on all platforms, and by .NET Core on Linux and macOS
diff --git a/docs/standard/base-types/string-comparison-net-5-plus.md b/docs/standard/base-types/string-comparison-net-5-plus.md
new file mode 100644
index 0000000000000..bfa7673c76394
--- /dev/null
+++ b/docs/standard/base-types/string-comparison-net-5-plus.md
@@ -0,0 +1,309 @@
+---
+title: Behavior changes when comparing strings on .NET 5+
+description: Learn about string-comparison behavior changes in .NET 5 and later versions on Windows.
+ms.date: 11/04/2020
+---
+# Behavior changes when comparing strings on .NET 5+
+
+.NET 5.0 introduces a runtime behavioral change where globalization APIs [now use ICU by default](../../core/compatibility/3.1-5.0.md#globalization-apis-use-icu-libraries-on-windows) across all supported platforms. This is a departure from earlier versions of .NET Core and from .NET Framework, which utilize the operating system's national language support (NLS) functionality when running on Windows. For more information on these changes, including compatibility switches that can revert the behavior change, see [.NET globalization and ICU](../globalization-localization/globalization-icu.md).
+
+## Reason for change
+
+This change was introduced to unify .NET's globalization behavior across all supported operating systems. It also provides the ability for applications to bundle their own globalization libraries rather than depend on the OS's built-in libraries. For more information, see [the breaking change notification](../../core/compatibility/3.1-5.0.md#globalization-apis-use-icu-libraries-on-windows).
+
+## Behavioral differences
+
+If you use functions like `string.IndexOf(string)` without calling the overload that takes a `StringComparison` argument, you might intend to perform an *ordinal* search, but instead you inadvertently take a dependency on culture-specific behavior. Since NLS and ICU implement different logic in their linguistic comparers, methods like `string.IndexOf(string)` can return unexpected values.
+
+This can manifest itself even in places where you aren't always expecting globalization facilities to be active. For example, the following code can produce a different answer depending on the current runtime.
+
+```cs
+string s = "Hello\r\nworld!";
+int idx = s.IndexOf("\n");
+Console.WriteLine(idx);
+
+// The above snippet prints:
+// '6' when running on .NET Framework (Windows)
+// '6' when running on .NET Core 2.x - 3.x (Windows)
+// '-1' when running on .NET 5 (Windows)
+// '-1' when running on .NET Core 2.x - 3.x or .NET 5 (non-Windows)
+// '6' when running on .NET Core 2.x - 3.x or .NET 5 (in invariant mode)
+```
+
+## Guard against unexpected behavior
+
+### Enable code analyzers
+
+Code analyzers can detect possibly buggy call sites. To help guard against any surprising behaviors, we recommend installing [the __Microsoft.CodeAnalysis.FxCopAnalyzers__ NuGet package](https://www.nuget.org/packages/Microsoft.CodeAnalysis.FxCopAnalyzers/) into your project.
This package includes the code analysis rules __CA1307__ and __CA1309__, which help flag code that might inadvertently be using a linguistic comparer when an ordinal comparer was likely intended.
+
+For example:
+
+```cs
+//
+// Potentially incorrect code - answer might vary based on locale.
+//
+string s = GetString();
+// Produces analyzer warning CA1307.
+int idx = s.IndexOf(",");
+Console.WriteLine(idx);
+
+//
+// Corrected code - matches the literal substring ",".
+//
+string s = GetString();
+int idx = s.IndexOf(",", StringComparison.Ordinal);
+Console.WriteLine(idx);
+
+//
+// Corrected code (alternative) - searches for the literal ',' character.
+//
+string s = GetString();
+int idx = s.IndexOf(',');
+Console.WriteLine(idx);
+```
+
+Similarly, when instantiating a sorted collection of strings or sorting an existing string-based collection, specify an explicit comparer.
+
+```cs
+//
+// Potentially incorrect code - behavior might vary based on locale.
+//
+SortedSet<string> mySet = new SortedSet<string>();
+List<string> list = GetListOfStrings();
+list.Sort();
+
+//
+// Corrected code - uses ordinal sorting; doesn't vary by locale.
+//
+SortedSet<string> mySet = new SortedSet<string>(StringComparer.Ordinal);
+List<string> list = GetListOfStrings();
+list.Sort(StringComparer.Ordinal);
+```
+
+For more information about these code analyzer rules, including when it might be appropriate to suppress these rules in your own code base, see the following articles:
+
+* [CA1307: Specify StringComparison for clarity](../../fundamentals/code-analysis/quality-rules/ca1307.md)
+* [CA1309: Use ordinal StringComparison](../../fundamentals/code-analysis/quality-rules/ca1309.md)
+
+### Revert back to NLS behaviors
+
+To revert .NET 5 applications back to older NLS behaviors when running on Windows, follow the steps in [.NET Globalization and ICU](../globalization-localization/globalization-icu.md). This application-wide compatibility switch must be set at the application level.
Individual libraries cannot opt in or opt out of this behavior.
+
+> [!TIP]
+> We strongly recommend you use the __CA1307__ and __CA1309__ analyzer rules that were mentioned previously to help improve code hygiene and discover any existing latent bugs.
+
+### Affected APIs
+
+Most .NET applications won't encounter any unexpected behaviors due to the changes in .NET 5.0. However, due to the number of affected APIs and how foundational these APIs are to the wider .NET ecosystem, you should be aware of the potential for .NET 5.0 to introduce unwanted behaviors or to expose latent bugs that already exist in your application.
+
+The affected APIs include:
+
+* [`System.String.Compare`](https://docs.microsoft.com/dotnet/api/system.string.compare)
+* [`System.String.EndsWith`](https://docs.microsoft.com/dotnet/api/system.string.endswith)
+* [`System.String.IndexOf`](https://docs.microsoft.com/dotnet/api/system.string.indexof)
+* [`System.String.StartsWith`](https://docs.microsoft.com/dotnet/api/system.string.startswith)
+* [`System.String.ToLower`](https://docs.microsoft.com/dotnet/api/system.string.tolower)
+* [`System.String.ToLowerInvariant`](https://docs.microsoft.com/dotnet/api/system.string.tolowerinvariant)
+* [`System.String.ToUpper`](https://docs.microsoft.com/dotnet/api/system.string.toupper)
+* [`System.String.ToUpperInvariant`](https://docs.microsoft.com/dotnet/api/system.string.toupperinvariant)
+* [`System.Globalization.TextInfo`](https://docs.microsoft.com/dotnet/api/system.globalization.textinfo) (most members)
+* [`System.Globalization.CompareInfo`](https://docs.microsoft.com/dotnet/api/system.globalization.compareinfo) (most members)
+* [`System.Array.Sort`](https://docs.microsoft.com/dotnet/api/system.array.sort) (when sorting arrays of strings)
+* [`System.Collections.Generic.List<T>.Sort`](https://docs.microsoft.com/dotnet/api/system.collections.generic.list-1.sort) (when the list elements are strings)
+*
[`System.Collections.Generic.SortedDictionary<TKey,TValue>`](https://docs.microsoft.com/dotnet/api/system.collections.generic.sorteddictionary-2) (when the keys are strings)
+* [`System.Collections.Generic.SortedList<TKey,TValue>`](https://docs.microsoft.com/dotnet/api/system.collections.generic.sortedlist-2) (when the keys are strings)
+* [`System.Collections.Generic.SortedSet<T>`](https://docs.microsoft.com/dotnet/api/system.collections.generic.sortedset-1) (when the set contains strings)
+
+> [!NOTE]
+> This is not an exhaustive list of affected APIs.
+
+All of the above APIs use *linguistic* string searching and comparison using the thread's [current culture](xref:System.Threading.Thread.CurrentCulture), by default. The differences between *linguistic* and *ordinal* search and comparison are called out in the [Ordinal vs. linguistic search and comparison](#ordinal-vs-linguistic-search-and-comparison) section.
+
+Because ICU implements linguistic string comparisons differently from NLS, Windows-based applications that upgrade to .NET 5.0 from an earlier version of .NET Core or .NET Framework and that call one of the affected APIs may notice that the APIs begin exhibiting different behaviors.
+
+#### Exceptions
+
+* If an API accepts an explicit `StringComparison` or `CultureInfo` parameter, that parameter overrides the API's default behavior.
+* `System.String` members where the first parameter is of type `char` (for example, `string.IndexOf(char)`) use ordinal searching, unless the caller passes an explicit `StringComparison` argument that specifies `CurrentCulture[IgnoreCase]` or `InvariantCulture[IgnoreCase]`.
+
+For a more detailed analysis of the default behavior of each API, see the [Default search and comparison types](#default-search-and-comparison-types) section.
+
+## Ordinal vs. linguistic search and comparison
+
+*Ordinal* (also known as *non-linguistic*) search and comparison decomposes a string into its individual `char` elements and performs a char-by-char search or comparison.
For example, the strings `"dog"` and `"dog"` compare as *equal* under an `Ordinal` comparer, since the two strings consist of the exact same sequence of chars. However, `"dog"` and `"Dog"` compare as *not equal* under an `Ordinal` comparer, because they don't consist of the exact same sequence of chars. That is, uppercase `'D'`'s code point `U+0044` occurs before lowercase `'d'`'s code point `U+0064`, resulting in `"Dog"` sorting before `"dog"`.
+
+An `OrdinalIgnoreCase` comparer still operates on a char-by-char basis, but it eliminates case differences while performing the operation. Under an `OrdinalIgnoreCase` comparer, the char pairs `'d'` and `'D'` compare as *equal*, as do the char pairs `'á'` and `'Á'`. But the unaccented char `'a'` compares as *not equal* to the accented char `'á'`.
+
+Some examples of this are provided in the following table:
+
+| String 1 | String 2 | `Ordinal` comparison | `OrdinalIgnoreCase` comparison |
+|---|---|---|---|
+| `"dog"` | `"dog"` | equal | equal |
+| `"dog"` | `"Dog"` | not equal | equal |
+| `"resume"` | `"Resume"` | not equal | equal |
+| `"resume"` | `"résumé"` | not equal | not equal |
+
+Unicode also allows strings to have several different in-memory representations. For example, an e-acute (é) can be represented in two possible ways:
+
+* A single literal `'é'` character (also written as `'\u00E9'`).
+* A literal unaccented `'e'` character, followed by a combining accent modifier character `'\u0301'`.
+
+This means that the following _four_ strings all result in `"résumé"` when displayed, even though their constituent pieces are different. The strings use a combination of literal `'é'` characters or literal unaccented `'e'` characters plus the combining accent modifier `'\u0301'`.
+
+* `"r\u00E9sum\u00E9"`
+* `"r\u00E9sume\u0301"`
+* `"re\u0301sum\u00E9"`
+* `"re\u0301sume\u0301"`
+
+Under an ordinal comparer, none of these strings compare as equal to each other.
This is because they all contain different underlying char sequences, even though when they're rendered to the screen, they all look the same.
+
+When performing a `string.IndexOf(..., StringComparison.Ordinal)` operation, the runtime looks for an exact substring match. The results are as follows.
+
+```cs
+Console.WriteLine("resume".IndexOf("e", StringComparison.Ordinal)); // prints '1'
+Console.WriteLine("r\u00E9sum\u00E9".IndexOf("e", StringComparison.Ordinal)); // prints '-1'
+Console.WriteLine("r\u00E9sume\u0301".IndexOf("e", StringComparison.Ordinal)); // prints '5'
+Console.WriteLine("re\u0301sum\u00E9".IndexOf("e", StringComparison.Ordinal)); // prints '1'
+Console.WriteLine("re\u0301sume\u0301".IndexOf("e", StringComparison.Ordinal)); // prints '1'
+Console.WriteLine("resume".IndexOf("E", StringComparison.OrdinalIgnoreCase)); // prints '1'
+Console.WriteLine("r\u00E9sum\u00E9".IndexOf("E", StringComparison.OrdinalIgnoreCase)); // prints '-1'
+Console.WriteLine("r\u00E9sume\u0301".IndexOf("E", StringComparison.OrdinalIgnoreCase)); // prints '5'
+Console.WriteLine("re\u0301sum\u00E9".IndexOf("E", StringComparison.OrdinalIgnoreCase)); // prints '1'
+Console.WriteLine("re\u0301sume\u0301".IndexOf("E", StringComparison.OrdinalIgnoreCase)); // prints '1'
+```
+
+Ordinal search and comparison routines are never affected by the current thread's culture setting.
+
+*Linguistic* search and comparison routines decompose a string into *collation elements* and perform searches or comparisons on these elements. There's not necessarily a 1:1 mapping between a string's characters and its constituent collation elements. For example, a string of length 2 may consist of only a single collation element. When two strings are compared in a linguistic-aware fashion, the comparer checks whether the two strings' collation elements have the same semantic meaning, even if the string's literal characters are different.
+
+Consider again the string `"résumé"` and its four different representations. The following table shows each representation broken down into its collation elements.
+
+| String | As collation elements |
+|---|---|
+| `"r\u00E9sum\u00E9"` | `"r" + "\u00E9" + "s" + "u" + "m" + "\u00E9"` |
+| `"r\u00E9sume\u0301"` | `"r" + "\u00E9" + "s" + "u" + "m" + "e\u0301"` |
+| `"re\u0301sum\u00E9"` | `"r" + "e\u0301" + "s" + "u" + "m" + "\u00E9"` |
+| `"re\u0301sume\u0301"` | `"r" + "e\u0301" + "s" + "u" + "m" + "e\u0301"` |
+
+A collation element corresponds loosely to what readers would think of as a single character or cluster of characters. It's conceptually similar to a [grapheme cluster](character-encoding-introduction.md#grapheme-clusters) but encompasses a somewhat larger umbrella.
+
+Under a linguistic comparer, exact matches aren't necessary. Collation elements are instead compared based on their semantic meaning. For example, a linguistic comparer treats the substrings `"\u00E9"` and `"e\u0301"` as equal since they both semantically mean "a lowercase e with an acute accent modifier." This allows the `IndexOf` method to match the substring `"e\u0301"` within a larger string that contains the semantically equivalent substring `"\u00E9"`, as shown in the following code sample.
+
+```cs
+Console.WriteLine("r\u00E9sum\u00E9".IndexOf("e")); // prints '-1' (not found)
+Console.WriteLine("r\u00E9sum\u00E9".IndexOf("e\u0301")); // prints '1'
+Console.WriteLine("\u00E9".IndexOf("e\u0301")); // prints '0'
+```
+
+As a consequence of this, two strings of different lengths may compare as equal if a linguistic comparison is used. Callers should take care not to special-case logic that deals with string length in such scenarios.
+
+*Culture-aware* search and comparison routines are a special form of linguistic search and comparison routines. Under a culture-aware comparer, the concept of a collation element is extended to include information specific to the specified culture.
+
+For example, [in the Hungarian alphabet](https://en.wikipedia.org/wiki/Hungarian_alphabet), when the two characters \<dz\> appear back-to-back, they are considered their own unique letter distinct from either \<d\> or \<z\>. This means that when \<dz\> is seen in a string, a Hungarian culture-aware comparer treats it as a single collation element.
+
+| String | As collation elements | Remarks |
+|---|---|---|
+| `"endz"` | `"e" + "n" + "d" + "z"` | (using a standard linguistic comparer) |
+| `"endz"` | `"e" + "n" + "dz"` | (using a Hungarian culture-aware comparer) |
+
+When using a Hungarian culture-aware comparer, this means that the string `"endz"` *does not* end with the substring `"z"`, as \<dz\> and \<z\> are considered collation elements with different semantic meaning.
+
+```cs
+// Set thread culture to Hungarian
+CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("hu-HU");
+Console.WriteLine("endz".EndsWith("z")); // Prints 'False'
+
+// Set thread culture to invariant culture
+CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
+Console.WriteLine("endz".EndsWith("z")); // Prints 'True'
+```
+
+> [!NOTE]
+>
+> - Behavior: Linguistic and culture-aware comparers can undergo behavioral adjustments from time to time. Both ICU and the older Windows NLS facility are updated to account for how world languages change. For more information, see the blog post [Locale (culture) data churn](/archive/blogs/shawnste/locale-culture-data-churn). The *Ordinal* comparer's behavior will never change since it performs exact bitwise searching and comparison. However, the *OrdinalIgnoreCase* comparer's behavior may change as Unicode grows to encompass more character sets and corrects omissions in existing casing data.
+> - Usage: The comparers `StringComparison.InvariantCulture` and `StringComparison.InvariantCultureIgnoreCase` are linguistic comparers that are not culture-aware.
That is, these comparers understand concepts such as the accented character é having multiple possible underlying representations, and that all such representations should be treated equal. But non-culture-aware linguistic comparers won't contain special handling for \<dz\> as distinct from \<d\> or \<z\>, as shown above. They also won't special-case characters like the German Eszett (ß).
+
+.NET also offers the *invariant globalization mode*. This opt-in mode disables code paths that deal with linguistic search and comparison routines. In this mode, all operations use *Ordinal* or *OrdinalIgnoreCase* behaviors, regardless of what `CultureInfo` or `StringComparison` argument the caller provides. For more information, see [Run-time configuration options for globalization](../../core/run-time-config/globalization.md) and [.NET Core Globalization Invariant Mode](https://github.com/dotnet/runtime/blob/master/docs/design/features/globalization-invariant-mode.md).
+
+For more information, see [Best practices for comparing strings in .NET](best-practices-strings.md).
+
+## Security implications
+
+If your app uses an affected API for filtering, we recommend enabling the CA1307 and CA1309 code analysis rules to help locate places where a linguistic search may have inadvertently been used instead of an ordinal search. Code patterns like the following may be susceptible to security exploits.
+
+```cs
+//
+// THIS SAMPLE CODE IS INCORRECT.
+// DO NOT USE IT IN PRODUCTION.
+//
+public bool ContainsHtmlSensitiveCharacters(string input)
+{
+    if (input.IndexOf("<") >= 0) { return true; }
+    if (input.IndexOf("&") >= 0) { return true; }
+    return false;
+}
+```
+
+Because the `string.IndexOf(string)` method uses a linguistic search by default, it's possible for a string to contain a literal `'<'` or `'&'` character and for the `string.IndexOf(string)` routine to return `-1`, indicating that the search substring was not found.
Code analysis rules CA1307 and CA1309 flag such call sites and alert the developer that there's a potential problem.
+
+## Default search and comparison types
+
+The following table lists the default search and comparison types for various string and string-like APIs. If the caller provides an explicit `CultureInfo` or `StringComparison` parameter, that parameter will be honored over any default.
+
+| API | Default behavior | Remarks |
+|---|---|---|
+| `string.Compare` | CurrentCulture | |
+| `string.CompareTo` | CurrentCulture | |
+| `string.Contains` | Ordinal | |
+| `string.EndsWith` | Ordinal | (when the first parameter is a `char`) |
+| `string.EndsWith` | CurrentCulture | (when the first parameter is a `string`) |
+| `string.Equals` | Ordinal | |
+| `string.GetHashCode` | Ordinal | |
+| `string.IndexOf` | Ordinal | (when the first parameter is a `char`) |
+| `string.IndexOf` | CurrentCulture | (when the first parameter is a `string`) |
+| `string.IndexOfAny` | Ordinal | |
+| `string.LastIndexOf` | Ordinal | (when the first parameter is a `char`) |
+| `string.LastIndexOf` | CurrentCulture | (when the first parameter is a `string`) |
+| `string.LastIndexOfAny` | Ordinal | |
+| `string.Replace` | Ordinal | |
+| `string.Split` | Ordinal | |
+| `string.StartsWith` | Ordinal | (when the first parameter is a `char`) |
+| `string.StartsWith` | CurrentCulture | (when the first parameter is a `string`) |
+| `string.ToLower` | CurrentCulture | |
+| `string.ToLowerInvariant` | InvariantCulture | |
+| `string.ToUpper` | CurrentCulture | |
+| `string.ToUpperInvariant` | InvariantCulture | |
+| `string.Trim` | Ordinal | |
+| `string.TrimEnd` | Ordinal | |
+| `string.TrimStart` | Ordinal | |
+| `string == string` | Ordinal | |
+| `string != string` | Ordinal | |
+
+Unlike `string` APIs, all `MemoryExtensions` APIs perform *Ordinal* searches and comparisons by default, with the following exceptions.
+
+| API | Default behavior | Remarks |
+|---|---|---|
+| `MemoryExtensions.ToLower` | CurrentCulture | (when passed a null `CultureInfo` argument) |
+| `MemoryExtensions.ToLowerInvariant` | InvariantCulture | |
+| `MemoryExtensions.ToUpper` | CurrentCulture | (when passed a null `CultureInfo` argument) |
+| `MemoryExtensions.ToUpperInvariant` | InvariantCulture | |
+
+A consequence is that when converting code from consuming `string` to consuming `ReadOnlySpan<char>`, behavioral changes may be introduced inadvertently. An example of this follows.
+
+```cs
+string str = GetString();
+if (str.StartsWith("Hello")) { /* do something */ } // this is a CULTURE-AWARE (linguistic) comparison
+
+ReadOnlySpan<char> span = str.AsSpan();
+if (span.StartsWith("Hello")) { /* do something */ } // this is an ORDINAL (non-linguistic) comparison
+```
+
+The recommended way to address this is to pass an explicit `StringComparison` parameter to these APIs. The code analysis rules CA1307 and CA1309 can assist with this.
+
+```cs
+string str = GetString();
+if (str.StartsWith("Hello", StringComparison.Ordinal)) { /* do something */ } // ordinal comparison
+
+ReadOnlySpan<char> span = str.AsSpan();
+if (span.StartsWith("Hello", StringComparison.Ordinal)) { /* do something */ } // ordinal comparison
+```
diff --git a/docs/standard/globalization-localization/globalization-icu.md b/docs/standard/globalization-localization/globalization-icu.md
index 6630b9f476f47..6e6fb90630afe 100644
--- a/docs/standard/globalization-localization/globalization-icu.md
+++ b/docs/standard/globalization-localization/globalization-icu.md
@@ -28,12 +28,12 @@ Starting with .NET 5.0, developers have more control over which underlying libra

 ## ICU on Windows

-Windows 10 May 2019 Update and later versions include [icu.dll](/windows/win32/intl/international-components-for-unicode--icu-) as part of the OS, and .NET 5.0 and later versions use ICU by default.
When running on Windows, .NET 5.0 and later versions try to load `icu.dll` and if it's available, uses it for the globalization implementation. If that library can't be found or loaded, such as when running on older versions of Windows, .NET 5.0 and later versions fall back to the NLS-based implementation.
+Windows 10 May 2019 Update and later versions include [icu.dll](/windows/win32/intl/international-components-for-unicode--icu-) as part of the OS, and .NET 5.0 and later versions use ICU by default. When running on Windows, .NET 5.0 and later versions try to load `icu.dll` and, if it's available, use it for the globalization implementation. If the ICU library can't be found or loaded, such as when running on older versions of Windows, .NET 5.0 and later versions fall back to the NLS-based implementation.

 > [!NOTE]
 > Even when using ICU, the `CurrentCulture`, `CurrentUICulture`, and `CurrentRegion` members still use Windows operating system APIs to honor user settings.

-### Using NLS instead of ICU
+### Use NLS instead of ICU

 Using ICU instead of NLS may result in behavioral differences with some globalization-related operations. To revert back to using NLS, a developer can opt out of the ICU implementation. Applications can enable NLS mode in any of the following ways:

@@ -66,7 +66,7 @@ For more information, see [Run-time config settings](../../core/run-time-config/

 ## App-local ICU

-Each release of ICU may bring with it bug fixes as well as updated Common Locale Data Repository (CLDR) data that describes the world's languages.
Moving between versions of ICU can subtly impact app behavior when it comes to globalization-related operations. To help application developers ensure consistency across all deployments, .NET 5.0 and later versions enable apps on both Windows and Unix to carry and use their own copy of ICU. Applications can opt in to an app-local ICU implementation mode in any of the following ways: @@ -92,7 +92,7 @@ Applications can opt in to an app-local ICU implementation mode in any of the fo - By setting the environment variable `DOTNET_SYSTEM_GLOBALIZATION_APPLOCALICU` to the value `<suffix>:<version>` or `<version>`. - `<suffix>`: Optional suffix of less than 36 characters in length, following the public ICU packaging conventions. When building a custom ICU, you can customize it to produce the lib names and exported symbol names to contain a suffix, for example, `libicuucmyapp`, where `myapp` is the suffix. + `<suffix>`: Optional suffix of fewer than 36 characters in length, following the public ICU packaging conventions. When building a custom ICU, you can customize it to produce the lib names and exported symbol names to contain a suffix, for example, `libicuucmyapp`, where `myapp` is the suffix. `<version>`: A valid ICU version, for example, 67.1. This version is used to load the binaries and to get the exported symbols. @@ -115,7 +115,7 @@ This must be done for all the ICU binaries for the supported runtimes. Also, the ### macOS behavior -`macOS` has a different behavior for resolving dependent dynamic libraries from the load commands specified in the `match-o` file than the Linux loader. In the Linux loader, .NET can try `libicudata`, `libicuuc`, and `libicui18n` (in that order) to satisfy ICU dependency graph. However, on macOS, this doesn't work. When building ICU on macOS, you, by default, get a dynamic library with these load commands in `libicuuc`. For example.: +`macOS` has a different behavior for resolving dependent dynamic libraries from the load commands specified in the `Mach-O` file than the Linux loader.
In the Linux loader, .NET can try `libicudata`, `libicuuc`, and `libicui18n` (in that order) to satisfy the ICU dependency graph. However, on macOS, this doesn't work. When building ICU on macOS, you get, by default, a dynamic library with these load commands in `libicuuc`. The following snippet shows an example. ```sh ~/ % otool -L /Users/santifdezm/repos/icu-build/icu/install/lib/libicuuc.67.1.dylib diff --git a/docs/standard/globalization-localization/toc.yml b/docs/standard/globalization-localization/toc.yml index 0c61641c3e430..7d4ea0b50bf6d 100644 --- a/docs/standard/globalization-localization/toc.yml +++ b/docs/standard/globalization-localization/toc.yml @@ -12,16 +12,15 @@ - name: Culture-insensitive string operations href: culture-insensitive-string-operations.md items: - - name: Perform culture-insensitive string operations + - name: Overview href: performing-culture-insensitive-string-operations.md - items: - - name: String comparisons - href: performing-culture-insensitive-string-comparisons.md - - name: Case changes - href: performing-culture-insensitive-case-changes.md - - name: String operations in collections - href: performing-culture-insensitive-string-operations-in-collections.md - - name: String operations in arrays - href: performing-culture-insensitive-string-operations-in-arrays.md + - name: String comparisons + href: performing-culture-insensitive-string-comparisons.md + - name: Case changes + href: performing-culture-insensitive-case-changes.md + - name: String operations in collections + href: performing-culture-insensitive-string-operations-in-collections.md + - name: String operations in arrays + href: performing-culture-insensitive-string-operations-in-arrays.md - name: Best practices for developing world-ready apps href: best-practices-for-developing-world-ready-apps.md From 2ff50ccc111bdc843a0806021cce308d6a4a2a1b Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Tue, 3 Nov 2020 20:26:19
-0800 Subject: [PATCH 2/3] xref links --- .../string-comparison-net-5-plus.md | 52 ++++++++++++------- 1 file changed, 32 insertions(+), 20 deletions(-) diff --git a/docs/standard/base-types/string-comparison-net-5-plus.md b/docs/standard/base-types/string-comparison-net-5-plus.md index bfa7673c76394..795ced84f094e 100644 --- a/docs/standard/base-types/string-comparison-net-5-plus.md +++ b/docs/standard/base-types/string-comparison-net-5-plus.md @@ -22,7 +22,8 @@ string s = "Hello\r\nworld!"; int idx = s.IndexOf("\n"); Console.WriteLine(idx); -// The above snippet prints: +// The snippet prints: +// // '6' when running on .NET Framework (Windows) // '6' when running on .NET Core 2.x - 3.x (Windows) // '-1' when running on .NET 5 (Windows) @@ -32,9 +33,11 @@ Console.WriteLine(idx); ## Guard against unexpected behavior +This section provides two options for dealing with unexpected behavior changes in .NET 5.0. + ### Enable code analyzers -Code analyzers can detect possibly buggy call sites. To help guard against any surprising behaviors, we recommend installing [the __Microsoft.CodeAnalysis.FxCopAnalyzers__ NuGet package](https://www.nuget.org/packages/Microsoft.CodeAnalysis.FxCopAnalyzers/) into your project. This package includes the code analysis rules __CA1307__ and __CA1309__, which help flag code that might inadvertently be using a linguistic comparer when an ordinal comparer was likely intended. +[Code analyzers](../../fundamentals/code-analysis/overview.md) can detect possibly buggy call sites. To help guard against any surprising behaviors, we recommend installing [the __Microsoft.CodeAnalysis.FxCopAnalyzers__ NuGet package](https://www.nuget.org/packages/Microsoft.CodeAnalysis.FxCopAnalyzers/) into your project. 
This package includes the code analysis rules [CA1307](../../fundamentals/code-analysis/quality-rules/ca1307.md) and [CA1309](../../fundamentals/code-analysis/quality-rules/ca1309.md), which help flag code that might inadvertently be using a linguistic comparer when an ordinal comparer was likely intended. For example: @@ -90,29 +93,29 @@ For more information about these code analyzer rules, including when it might be To revert .NET 5 applications back to older NLS behaviors when running on Windows, follow the steps in [.NET Globalization and ICU](../globalization-localization/globalization-icu.md). This application-wide compatibility switch must be set at the application level. Individual libraries cannot opt-in or opt-out of this behavior. > [!TIP] -> We strongly recommend you use the __CA1307__ and __CA1309__ analyzer rules that were mentioned previously to help improve code hygiene and discover any existing latent bugs. +> We strongly recommend you enable the [CA1307](../../fundamentals/code-analysis/quality-rules/ca1307.md) and [CA1309](../../fundamentals/code-analysis/quality-rules/ca1309.md) code analysis rules to help improve code hygiene and discover any existing latent bugs. For more information, see [Enable code analyzers](#enable-code-analyzers). -### Affected APIs +## Affected APIs Most .NET applications won't encounter any unexpected behaviors due to the changes in .NET 5.0. However, due to the number of affected APIs and how foundational these APIs are to the wider .NET ecosystem, you should be aware of the potential for .NET 5.0 to introduce unwanted behaviors or to expose latent bugs that already exist in your application. 
The affected APIs include: -* [`System.String.Compare`](https://docs.microsoft.com/dotnet/api/system.string.compare) -* [`System.String.EndsWith`](https://docs.microsoft.com/dotnet/api/system.string.endswith) -* [`System.String.IndexOf`](https://docs.microsoft.com/dotnet/api/system.string.indexof) -* [`System.String.StartsWith`](https://docs.microsoft.com/dotnet/api/system.string.startswith) -* [`System.String.ToLower`](https://docs.microsoft.com/dotnet/api/system.string.tolower) -* [`System.String.ToLowerInvariant`](https://docs.microsoft.com/dotnet/api/system.string.tolowerinvariant) -* [`System.String.ToUpper`](https://docs.microsoft.com/dotnet/api/system.string.toupper) -* [`System.String.ToUpperInvariant`](https://docs.microsoft.com/dotnet/api/system.string.toupperinvariant) -* [`System.Globalization.TextInfo`](https://docs.microsoft.com/dotnet/api/system.globalization.textinfo) (most members) -* [`System.Globalization.CompareInfo`](https://docs.microsoft.com/dotnet/api/system.globalization.compareinfo) (most members) -* [`System.Array.Sort`](https://docs.microsoft.com/dotnet/api/system.array.sort) (when sorting arrays of strings) -* [`System.Collections.Generic.List.Sort`](https://docs.microsoft.com/dotnet/api/system.collections.generic.list-1.sort) (when the list elements are strings) -* [`System.Collections.Generic.SortedDictionary`](https://docs.microsoft.com/dotnet/api/system.collections.generic.sorteddictionary-2) (when the keys are strings) -* [`System.Collections.Generic.SortedList`](https://docs.microsoft.com/dotnet/api/system.collections.generic.sortedlist-2) (when the keys are strings) -* [`System.Collections.Generic.SortedSet`](https://docs.microsoft.com/dotnet/api/system.collections.generic.sortedset-1) (when the set contains strings) +- <xref:System.String.Compare%2A> +- <xref:System.String.EndsWith%2A> +- <xref:System.String.IndexOf%2A> +- <xref:System.String.StartsWith%2A> +- <xref:System.String.ToLower%2A> +- <xref:System.String.ToLowerInvariant%2A> +- <xref:System.String.ToUpper%2A> +- <xref:System.String.ToUpperInvariant%2A> +- <xref:System.Globalization.TextInfo> (most members) +- <xref:System.Globalization.CompareInfo> (most members) +- <xref:System.Array.Sort%2A> (when sorting arrays of strings) +- <xref:System.Collections.Generic.List%601.Sort%2A> (when the list elements are strings) +- <xref:System.Collections.Generic.SortedDictionary%602> (when the keys are strings) +- <xref:System.Collections.Generic.SortedList%602> (when the keys are strings) +- <xref:System.Collections.Generic.SortedSet%601> (when the set
contains strings) > [!NOTE] > This is not an exhaustive list of affected APIs. @@ -121,7 +124,7 @@ All of the above APIs use *linguistic* string searching and comparison using the Because ICU implements linguistic string comparisons differently from NLS, Windows-based applications that upgrade to .NET 5.0 from an earlier version of .NET Core or .NET Framework and that call one of the affected APIs may notice that the APIs begin exhibiting different behaviors. -#### Exceptions +### Exceptions * If an API accepts an explicit `StringComparison` or `CultureInfo` parameter, that parameter overrides the API's default behavior. * `System.String` members where the first parameter is of type `char` (for example, ) use ordinal searching, unless the caller passes an explicit `StringComparison` argument that specifies `CurrentCulture[IgnoreCase]` or `InvariantCulture[IgnoreCase]`. @@ -307,3 +310,12 @@ if (str.StartsWith("Hello", StringComparison.Ordinal)) { /* do something */ } // ReadOnlySpan<char> span = str.AsSpan(); if (span.StartsWith("Hello", StringComparison.Ordinal)) { /* do something */ } // ordinal comparison ``` + +## See also + +- [Globalization breaking changes](../../core/compatibility/globalization.md) +- [Best practices for comparing strings in .NET](best-practices-strings.md) +- [How to compare strings in C#](../../csharp/how-to/compare-strings.md) +- [.NET globalization and ICU](../globalization-localization/globalization-icu.md) +- [Ordinal vs.
culture-sensitive string operations](/dotnet/api/system.string#ordinal-vs-culture-sensitive-operations) +- [Overview of .NET source code analysis](../../fundamentals/code-analysis/overview.md) From 39c0a9a7344b4386f5132ddbf325bc4b5333591f Mon Sep 17 00:00:00 2001 From: Genevieve Warren <24882762+gewarren@users.noreply.github.com> Date: Tue, 3 Nov 2020 20:43:47 -0800 Subject: [PATCH 3/3] fix toc link --- docs/fundamentals/toc.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/fundamentals/toc.yml b/docs/fundamentals/toc.yml index c10f67711053e..7033a7345c17d 100644 --- a/docs/fundamentals/toc.yml +++ b/docs/fundamentals/toc.yml @@ -1432,7 +1432,7 @@ items: - name: Displaying and persisting formatted data href: ../standard/base-types/best-practices-display-data.md - name: Behavior changes in .NET 5+ (Windows) - href: string-comparison-net-5-plus.md + href: ../standard/base-types/string-comparison-net-5-plus.md - name: Basic string operations items: - name: Overview