Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary indirect use of managed wcslen from AppContext setup #101318

Merged
merged 7 commits into from
May 7, 2024

Conversation

kg
Copy link
Member

@kg kg commented Apr 19, 2024

At least on mono, during AppContext setup we use the string::.ctor(char*) constructor, which calls a managed, vectorized implementation of wcslen - this pulls in a bunch of vector code to optimize a codepath that is cold:

Generating (wrapper managed-to-managed) string:.ctor (char*) (7 byte(s)) dotnet.native.js:1345:16
Generating string:Ctor (char*) (57 byte(s)) dotnet.native.js:1345:16
Generating string:wcslen (char*) (7 byte(s)) dotnet.native.js:1345:16
Generating System.SpanHelpers:IndexOfNullCharacter (char*) (339 byte(s)) dotnet.native.js:1345:16
Generating System.SpanHelpers:UnalignedCountVector128 (char*) (16 byte(s)) dotnet.native.js:1345:16
Generating System.SpanHelpers:GetCharVector128SpanLength (intptr,intptr) (14 byte(s)) dotnet.native.js:1345:16
Generating System.Runtime.Intrinsics.Vector128:AsByte<uint16> (System.Runtime.Intrinsics.Vector128`1<uint16>) (7 byte(s)) dotnet.native.js:1345:16
Generating System.Runtime.Intrinsics.Vector128:As<uint16, byte> (System.Runtime.Intrinsics.Vector128`1<uint16>) (23 byte(s)) dotnet.native.js:1345:16
Generating System.ThrowHelper:ThrowForUnsupportedIntrinsicsVector128BaseType<uint16> () (15 byte(s)) dotnet.native.js:1345:16
Generating System.Runtime.Intrinsics.Vector128`1<uint16>:get_IsSupported () (346 byte(s)) dotnet.native.js:1345:16
Generating System.ThrowHelper:ThrowForUnsupportedIntrinsicsVector128BaseType<byte> () (15 byte(s)) dotnet.native.js:1345:16
Generating System.Runtime.Intrinsics.Vector128`1<byte>:get_IsSupported () (346 byte(s)) dotnet.native.js:1345:16
Generating System.Buffer:Memmove<char> (char&,char&,uintptr) (81 byte(s)) dotnet.native.js:1345:16

We also use the default comparer when creating an instance of Dictionary<string, object>, which pulls in GenericEqualityComparer and NonRandomizedStringEqualityComparer stuff we don't need when we could just use an OrdinalComparer directly:

Generating System.Collections.Generic.EqualityComparer`1<string>:get_Default () (38 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.EqualityComparer`1<string>:CreateComparer () (191 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.GenericEqualityComparer`1<string>:.ctor () (7 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.EqualityComparer`1<string>:.ctor () (7 byte(s)) dotnet.native.js:1345:16
Generating System.Threading.Interlocked:CompareExchange<System.Collections.Generic.EqualityComparer`1<string>> (System.Collections.Generic.EqualityComparer`1<string>&,System.Collections.Generic.EqualityComparer`1<string>,System.Collections.Generic.EqualityComparer`1<string>) (56 byte(s)) dotnet.native.js:1345:16
Generating System.Runtime.CompilerServices.Unsafe:IsNullRef<System.Collections.Generic.EqualityComparer`1<string>> (System.Collections.Generic.EqualityComparer`1<string>&) (16 byte(s)) dotnet.native.js:1345:16
Generating (wrapper managed-to-native) System.Threading.Interlocked:CompareExchange (object&,object&,object&,object&) (16 byte(s)) dotnet.native.js:1345:16
Generating (wrapper runtime-invoke) object:runtime_invoke_direct_void (object,intptr,intptr,intptr) (112 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.NonRandomizedStringEqualityComparer:.cctor () (46 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.NonRandomizedStringEqualityComparer/OrdinalComparer:.ctor (System.Collections.Generic.IEqualityComparer`1<string>) (8 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.NonRandomizedStringEqualityComparer:.ctor (System.Collections.Generic.IEqualityComparer`1<string>) (14 byte(s)) dotnet.native.js:1345:16
Generating System.StringComparer:get_Ordinal () (6 byte(s)) dotnet.native.js:1345:16
Generating (wrapper runtime-invoke) object:runtime_invoke_direct_void (object,intptr,intptr,intptr) (112 byte(s)) dotnet.native.js:1345:16
Generating System.OrdinalCaseSensitiveComparer:.cctor () (11 byte(s)) dotnet.native.js:1345:16
Generating System.OrdinalCaseSensitiveComparer:.ctor () (8 byte(s)) dotnet.native.js:1345:16
Generating System.OrdinalComparer:.ctor (bool) (14 byte(s)) dotnet.native.js:1345:16
Generating System.StringComparer:.ctor () (7 byte(s)) dotnet.native.js:1345:16
Generating System.StringComparer:get_OrdinalIgnoreCase () (6 byte(s)) dotnet.native.js:1345:16
Generating (wrapper runtime-invoke) object:runtime_invoke_direct_void (object,intptr,intptr,intptr) (112 byte(s)) dotnet.native.js:1345:16
Generating System.OrdinalIgnoreCaseComparer:.cctor () (11 byte(s)) dotnet.native.js:1345:16
Generating System.OrdinalIgnoreCaseComparer:.ctor () (8 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.NonRandomizedStringEqualityComparer/OrdinalIgnoreCaseComparer:.ctor (System.Collections.Generic.IEqualityComparer`1<string>) (8 byte(s)) dotnet.native.js:1345:16
Generating System.Collections.Generic.NonRandomizedStringEqualityComparer:GetStringComparer (object) (44 byte(s)) dotnet.native.js:1345:16

We already compute the length of the appcontext arguments when converting them to UTF-16, at least in mono, so this PR switches to a 5-argument variant of Setup that accepts an array of lengths, so it can use a simpler/faster string constructor overload.

A lot of this code may get pulled in later by user code or library code in i.e. Blazor, but this ideally lets us get much further into app startup before that happens, at which point important network requests or background operations may have started instead of being blocked on the creation of AppContext.

@kg kg marked this pull request as ready for review April 19, 2024 23:39
@kg kg requested review from lambdageek and thaystg as code owners April 19, 2024 23:39
@kg kg requested a review from stephentoub April 19, 2024 23:39
@kg
Copy link
Member Author

kg commented Apr 19, 2024

Pinging stephen since github suggested him. I'd love to get your thoughts or thoughts from one of Dictionary's other maintainers on the right way to perform this startup optimization without lowering our code quality. I tried a few things before arriving at this current solution.

@jkotas
Copy link
Member

jkotas commented Apr 20, 2024

A lot of this code may get pulled in later by user code or library code in i.e. Blazor, but this ideally lets us get much further into app startup before that happens, at which point important network requests or background operations may have started instead of being blocked on the creation of AppContext.

How many methods execute in a typical Blazor app before network or background operations start?

@kg
Copy link
Member Author

kg commented Apr 20, 2024

A lot of this code may get pulled in later by user code or library code in i.e. Blazor, but this ideally lets us get much further into app startup before that happens, at which point important network requests or background operations may have started instead of being blocked on the creation of AppContext.

How many methods execute in a typical Blazor app before network or background operations start?

I don't have a way to measure that right now, I'm still working on getting a blazor app to start with a modified runtime build. For a simple blazor-less mono-wasm application, before I started working on this we were looking at around 850 different methods having code generated, and it's around 710 with my current fix set, and I have a theory for how to prune another 50 or so from that by making static constructors cheaper.

From what I've seen in profiles of aot'd blazor my guess is it's probably in the neighborhood of 5000 methods or more. The question is whether we can move enough stuff later in startup for apps to realistically kick off network traffic, I'll have a better sense of that once I get blazor working with my local runtime binaries so I can get dense logs of method compile order.

The impact of this class of optimization is probably much smaller for AOT startup.

@kg kg force-pushed the appcontext-setup-wcslen branch from 5fb43c4 to ec93425 Compare April 23, 2024 02:52
@kg kg changed the title Remove unnecessary indirect use of managed wcslen/EqualityComparer.Default from AppContext setup Remove unnecessary indirect use of managed wcslen from AppContext setup Apr 23, 2024
@jkotas jkotas self-requested a review April 23, 2024 03:05
@kg
Copy link
Member Author

kg commented Apr 24, 2024

Should I change this in coreclr too? It looks like corhost.cpp calls this Setup method as well.

@kg kg force-pushed the appcontext-setup-wcslen branch from ab96e68 to 0844d15 Compare April 24, 2024 21:28
@jkotas jkotas dismissed their stale review April 24, 2024 21:41

Dictionary.cs change was reverted

@jkotas
Copy link
Member

jkotas commented Apr 24, 2024

Should I change this in coreclr too? It looks like corhost.cpp calls this Setup method as well.

I do not think that the managed wcslen is a big deal on CoreCLR (it is expected to be r2r precompiled). Also, the string sizes are not readily available in CoreCLR.

For CoreCLR, I would pass the UTF8 key/value pairs to the managed side directly to create the managed strings without creating the intermediate copies. It would require more significant refactoring.

@kg
Copy link
Member Author

kg commented Apr 24, 2024

Should I change this in coreclr too? It looks like corhost.cpp calls this Setup method as well.

I do not think that the managed wcslen is a big deal on CoreCLR (it is expected to be r2r precompiled). Also, the string sizes are not readily available in CoreCLR.

For CoreCLR, I would pass the UTF8 key/value pairs to the managed side directly to create the managed strings without creating the intermediate copies. It would require more significant refactoring.

OK, if it's already there then we aren't improving things by allowing it to be trimmed. Thanks for the context.

@lambdageek lambdageek requested a review from fanyang-mono April 26, 2024 18:29
@kg kg force-pushed the appcontext-setup-wcslen branch from a3391e6 to 1ea8882 Compare April 30, 2024 17:48
@kg kg merged commit 256f320 into dotnet:main May 7, 2024
153 of 158 checks passed
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
…ext setup (dotnet#101318)

Remove unnecessary indirect use of managed wcslen from AppContext setup since it pulls in vector dependencies and we already know the length of each string
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
…ext setup (dotnet#101318)

Remove unnecessary indirect use of managed wcslen from AppContext setup since it pulls in vector dependencies and we already know the length of each string
@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants