Comparer<string>.Default sort order different on Linux/OSX than Windows or Mono #20109

Closed
akoeplinger opened this issue Feb 5, 2017 · 20 comments
Labels
area-System.Globalization os-linux Linux OS (any supported distro)
Milestone

Comments

@akoeplinger
Member

Found this while investigating #20007 (edit: see https://github.com/dotnet/corefx/issues/15825#issuecomment-277564885 where it looks like Comparer<string>.Default is the underlying culprit):

using System;
using System.Linq;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var keys = new string[]
            {
                "M:System.UriBuilder.#ctor",
                "M:System.UriBuilder.#ctor(System.String)",
                "M:System.UriBuilder.#ctor(System.Uri)",
                "M:System.UriBuilder.#ctor(System.String,System.String)",
                "M:System.UriBuilder.#ctor(System.String,System.String,System.Int32)",
                "M:System.UriBuilder.#ctor(System.String,System.String,System.Int32,System.String)",
                "M:System.UriBuilder.#ctor(System.String,System.String,System.Int32,System.String,System.String)"
            };

            var items = new int[7] { 0, 1, 2, 3, 4, 5, 6 };

            var ordered = items.OrderBy(s => keys[s]).ToList();

            Console.WriteLine(String.Join(Environment.NewLine, ordered));
        }
    }
}

This prints the following on OSX/Linux:

0
6
5
4
3
1
2

However, on Windows .NET Core or .NET Framework, as well as Mono on OSX/Linux it prints:

0
1
3
4
5
6
2

I suspected a difference due to culture/language, but forcing CurrentCulture to "en-US" didn't change the output. Maybe I'm not setting it correctly?


$ dotnet --info
.NET Command Line Tools (2.0.0-alpha-004775)

Product Information:
 Version:            2.0.0-alpha-004775
 Commit SHA-1 hash:  796ebe1f1e

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.11
 OS Platform: Darwin
 RID:         osx.10.11-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.0.0-alpha-004775/
@JonHanna
Contributor

JonHanna commented Feb 5, 2017

I suspect the difference is with Comparer<string>.Default. Can you see if it behaves differently with the values involved?

@akoeplinger
Member Author

@JonHanna looks like you're right:

foreach (string k in keys)
{
    foreach (string i in keys) Console.Write(Comparer<string>.Default.Compare(k, i));
    Console.WriteLine();
}

prints this on Windows/Mono:

0-1-1-1-1-1-1
10-1-1-1-1-1
1101111
11-10-1-1-1
11-110-1-1
11-1110-1
11-11110

and this on .NET Core OSX:

0-1-1-1-1-1-1
10-11111
1101111
1-1-10111
1-1-1-1011
1-1-1-1-101
1-1-1-1-1-10

@JonHanna
Contributor

JonHanna commented Feb 6, 2017

OrderBy is doing what it should given that. The bug (or defined difference in behaviour between OSs) is in the comparer, so the issue title should probably be changed to get the right people's attention.

@akoeplinger akoeplinger changed the title from "Enumerable.OrderBy sort order different on Linux/OSX than Windows or Mono" to "Comparer<string>.Default sort order different on Linux/OSX than Windows or Mono" Feb 6, 2017
@akoeplinger
Member Author

Changed the title. Looking at it, though, Comparer<string>.Default is just GenericComparer<string>, which simply calls stringA.CompareTo(stringB), and indeed if I change my sample above to use that, I get the same output. So it seems to me this really boils down to some core sorting data or algorithm that differs between the OSes.
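
To make the distinction concrete, here is a minimal sketch contrasting the culture-sensitive comparison that Comparer<string>.Default performs with string.CompareOrdinal. The names CompareDemo, CultureSensitive and Ordinal are illustrative only and do not appear anywhere in the runtime. The sign of the culture-sensitive result depends on the platform's collation data, while the ordinal result depends only on the UTF-16 code units.

```csharp
using System;
using System.Collections.Generic;

static class CompareDemo
{
    // Culture-sensitive: Comparer<string>.Default ends up in string.CompareTo,
    // which uses the current culture's collation rules, so the sign may
    // differ between platforms.
    public static int CultureSensitive(string a, string b) =>
        Math.Sign(Comparer<string>.Default.Compare(a, b));

    // Ordinal: compares UTF-16 code units, independent of culture and OS.
    public static int Ordinal(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));
}

class Program
{
    static void Main()
    {
        // Platform dependent (the thread reports 1 on Windows, -1 with ICU).
        Console.WriteLine(CompareDemo.CultureSensitive(",", ")"));

        // Always 1, because ',' (U+002C) sorts after ')' (U+0029) by code unit.
        Console.WriteLine(CompareDemo.Ordinal(",", ")"));
    }
}
```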

@jkotas
Member

jkotas commented Feb 6, 2017

Are both OSes using the same culture? What does Console.WriteLine(System.Globalization.CultureInfo.CurrentCulture); print?

@akoeplinger
Member Author

@jkotas it's empty. But I get the same result if I force English with the LANG=en_US env var or by setting CultureInfo.CurrentCulture to en-US, as I said in the first post.

@stephentoub
Member

stephentoub commented Feb 6, 2017

A more concise repro:

using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        Console.WriteLine(CultureInfo.CurrentCulture.CompareInfo.Name);
        Console.WriteLine(CultureInfo.CurrentCulture.CompareInfo.Compare(",", ")"));
    }
}

On my Windows machine this outputs:

en-US
1

On my Linux machine this outputs:

en-US
-1

@stephentoub
Member

cc: @tarekgh, @ellismg, @eerhardt

@eerhardt
Member

eerhardt commented Feb 6, 2017

On Linux, .NET Core uses ICU for string collation.

See http://demo.icu-project.org/icu-bin/collation.html

And type in , ) in the Input box:
[screenshot: ICU collation demo output for the input above]

So unless we are choosing the wrong culture, whatever ICU says should be the order is what is returned.

Note there is never a guarantee of culture information being consistent between OSs, or even different versions of the same OS. If you are looking for consistent ordering, Ordinal is your friend.
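
As a sketch of what "use Ordinal" could look like for the repro at the top of this issue, the OrderBy call can be given StringComparer.Ordinal explicitly. The class name OrdinalSortDemo and the method SortedIndices are made up for this illustration. For this particular input the ordinal order happens to coincide with the Windows output reported above, but that is a property of these strings, not a general guarantee.

```csharp
using System;
using System.Linq;

static class OrdinalSortDemo
{
    static readonly string[] Keys =
    {
        "M:System.UriBuilder.#ctor",
        "M:System.UriBuilder.#ctor(System.String)",
        "M:System.UriBuilder.#ctor(System.Uri)",
        "M:System.UriBuilder.#ctor(System.String,System.String)",
        "M:System.UriBuilder.#ctor(System.String,System.String,System.Int32)",
        "M:System.UriBuilder.#ctor(System.String,System.String,System.Int32,System.String)",
        "M:System.UriBuilder.#ctor(System.String,System.String,System.Int32,System.String,System.String)"
    };

    // Returns the indices of Keys ordered by ordinal (code-unit) comparison,
    // which does not consult the current culture or the OS collation data.
    public static int[] SortedIndices() =>
        Enumerable.Range(0, Keys.Length)
                  .OrderBy(i => Keys[i], StringComparer.Ordinal)
                  .ToArray();
}

class Program
{
    static void Main()
    {
        // Prints 0, 1, 3, 4, 5, 6, 2 (one per line).
        Console.WriteLine(string.Join(Environment.NewLine, OrdinalSortDemo.SortedIndices()));
    }
}
```

The same comparer can be passed to Array.Sort, List<T>.Sort, SortedDictionary and similar APIs that accept an IComparer<string>.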

@akoeplinger
Member Author

@eerhardt Mono uses ICU as well, so there shouldn't be a difference AFAIK.

If you select en-US-u-va-posix (type=standard): American English (Computer, Standard Sort Order) in the demo link you posted, then it reorders the input:

[screenshot: ICU collation demo output with en-US-u-va-posix selected]

@eerhardt
Member

eerhardt commented Feb 6, 2017

If you want to use the en-us-posix culture, you can:

            Console.WriteLine(CultureInfo.CurrentCulture.CompareInfo.Name);
            Console.WriteLine(CultureInfo.CurrentCulture.CompareInfo.Compare(",", ")"));

            CultureInfo posix = new CultureInfo("en-us-posix");
            Console.WriteLine(posix.CompareInfo.Name);
            Console.WriteLine(posix.CompareInfo.Compare(",", ")"));

bash-3.2$ dotnet run
en-US
-1
en-US-POSIX
1

@tarekgh
Member

tarekgh commented Feb 7, 2017

@eerhardt so you are saying this is by design? just want to know if there is anything we need to do here.

@eerhardt
Member

eerhardt commented Feb 7, 2017

> @eerhardt so you are saying this is by design?

We just use whatever ICU says. So yes, I kind of think this is "by design", unless we are saying we are using ICU incorrectly. We (.NET Core) don't actually control culture-based string comparisons.

@stephentoub
Member

> unless we are saying we are using ICU incorrectly

Or, it sounds like unless we're saying we think we should change the default culture. If Mono is also using ICU but is getting different results, is it using a different culture by default?

@eerhardt
Member

eerhardt commented Feb 7, 2017

> If Mono is also using ICU but is getting different results,

I did some searching last night in mono, and I couldn't find where ICU was actually being used.

I see this code: https://github.com/mono/mono/blob/7ca51073c638becc090e2c974ecaa739af62e4a3/mcs/class/referencesource/mscorlib/system/globalization/compareinfo.cs#L451-L456

#if MONO
            return internal_compare_switch (string1, 0, string1.Length, string2, 0, string2.Length, options);
#else
            return InternalCompareString(m_dataHandle, m_handleOrigin, m_sortName, string1, 0, string1.Length, string2, 0, string2.Length, GetNativeCompareFlags(options));
#endif

Where internal_compare_switch looks like:

https://github.com/mono/mono/blob/0bcbe39b148bb498742fc68416f8293ccd350fb6/mcs/class/corlib/ReferenceSources/CompareInfo.cs#L111-L118

			return UseManagedCollation ?
				internal_compare_managed (str1, offset1, length1,
				str2, offset2, length2, options) :
				internal_compare (str1, offset1, length1,
				str2, offset2, length2, options);

And "ManagedCollation" looks like it is on by default:

		static bool UseManagedCollation {
			get {
				if (!managedCollationChecked) {
					managedCollation = Environment.internalGetEnvironmentVariable ("MONO_DISABLE_MANAGED_COLLATION") != "yes" && MSCompatUnicodeTable.IsReady;
					managedCollationChecked = true;
				}

				return managedCollation;
			}
		}

That MSCompatUnicodeTable looks interesting though. Maybe Mono is trying to match Windows culture information?

Anyway, if it doesn't use "managed collation", it just does invariant:

https://github.com/mono/mono/blob/7f1b9c9e86a2544dbc7fcd2089899b99177101e2/mono/metadata/locales.c#L713

int ves_icall_System_Globalization_CompareInfo_internal_compare (MonoCompareInfo *this_obj, MonoString *str1, gint32 off1, gint32 len1, MonoString *str2, gint32 off2, gint32 len2, gint32 options)
{
	/* Do a normal ascii string compare, as we only know the
	 * invariant locale if we dont have ICU
	 */
	return(string_invariant_compare (str1, off1, len1, str2, off2, len2,
					 options));
}

@akoeplinger
Member Author

Seems like I misremembered, we do use the CLDR data (which ICU uses too) for certain things like date/time formats, but not ICU directly. String comparison is done differently too, so this seems like it's working as expected.

Sorry about that, it makes sense now why .NET Core behaves differently in this case. Thanks 👍

@wickdninja

wickdninja commented Sep 6, 2018

We are in the process of migrating some legacy applications to .NET Core to run on Linux. Does anyone have a recommended workaround for this (I need the sort order to be identical across platforms)? I would rather not change the internal behavior of the projects during the port. I would prefer to preserve the current behavior and possibly refactor out any needed workarounds during a future iteration. Thanks!

@tarekgh
Member

tarekgh commented Sep 6, 2018

> (I need the sort order to be identical across platforms)?

There is no good way to do that other than to build your own sorting component and use it. That is similar to what SQL is doing.
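
The thread does not spell out what such a component would look like. As a rough sketch only, assuming that code-unit order with ASCII letters folded to upper case is an acceptable ordering, a hand-written IComparer<string> avoids both the OS collation data and ICU. PortableComparer and Fold are hypothetical names, and this does not reproduce the Windows or ICU culture ordering; it only gives the same result on every platform.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: deterministic across platforms because it never
// consults culture data. It is NOT equivalent to any culture's collation.
public sealed class PortableComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        // First pass: compare code units with ASCII a-z folded to A-Z.
        int n = Math.Min(x.Length, y.Length);
        for (int i = 0; i < n; i++)
        {
            int c = Fold(x[i]).CompareTo(Fold(y[i]));
            if (c != 0) return c;
        }

        // A shorter string that is a prefix of the other sorts first.
        if (x.Length != y.Length) return x.Length.CompareTo(y.Length);

        // Tie-break on exact code units so the order is total.
        return string.CompareOrdinal(x, y);
    }

    private static char Fold(char c) =>
        (c >= 'a' && c <= 'z') ? (char)(c - 32) : c;
}

class Program
{
    static void Main()
    {
        var data = new[] { "b", "A", "a", "B" };
        Array.Sort(data, new PortableComparer());
        Console.WriteLine(string.Join(" ", data)); // A a B b
    }
}
```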

@wickdninja

@tarekgh thanks for your quick response.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@xentrax

xentrax commented Feb 24, 2020

I cannot find any way to make XslCompiledTransform sort according to good old ASCII both on Linux and Windows. On Linux I can use CultureInfo("en-us-posix") but on Windows I don't know what to do.

9 participants