Cache most popular string objects in String.InternalSubString #73540
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @dotnet/area-system-runtime
Prototype: EgorBo@1c73b3e
String substring does not feel like the right place to do this optimization. Have you looked at the top callers that create these strings? Would it make more sense to fix the callers instead? We do caller-specific de-duplication in a number of places, for example in integer formatting.
In case of I assume it's here: https://github.com/dotnet/msbuild/blob/3ade6423189769545ddff2ffeeed37010ec57f4d/src/Build/Evaluation/Conditionals/Scanner.cs#L561-L641 |
Still, I think at least these two lines of code are worth adding: https://github.com/EgorBo/runtime-1/blob/c487556f0a1131833ce22dfe40cb89fca88b5dcf/src/libraries/System.Private.CoreLib/src/System/String.Manipulation.cs#L1850-L1854 (cheap handling for '0'..'9', right?)
runtime/src/libraries/System.Private.CoreLib/src/System/Number.Formatting.cs Lines 1594 to 1603 in 7f9ed8f
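For readers without the linked lines at hand, a minimal sketch of what cheap handling for '0'..'9' could look like follows. It is not a copy of the linked code or of `Number.Formatting.cs`; the `DigitCache` class and `TryGetCachedDigit` method are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;

static class DigitCache
{
    // One shared string instance per decimal digit, created once.
    private static readonly string[] s_digits =
        { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };

    // Hypothetical helper: returns the shared instance when the span is a
    // single decimal digit, or null so the caller falls back to allocating.
    public static string? TryGetCachedDigit(ReadOnlySpan<char> span)
    {
        if (span.Length == 1)
        {
            char c = span[0];
            if (c >= '0' && c <= '9')
            {
                return s_digits[c - '0'];
            }
        }
        return null;
    }
}
```

The cost on the non-matching path is one length check, which is the trade-off being debated in this thread.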
I am not sure. Substrings of this shape are likely going to be produced by allocation-heavy parsers. They are unlikely to survive a gen0 GC, so getting rid of them is unlikely to make a difference in the app's high-water mark, which tends to be dominated by gen2. This optimization is trying to improve certain inefficient allocation-heavy parsers a tiny bit, but makes all substring calls pay for it.
We used to intern some hard-coded strings like "true" and "false" in MSBuild, including in that codepath ("OpportunisticIntern"). I can't look at the code at the moment but I believe it was removed a little while ago. @Forgind @rainersigwald
Thanks for the feedback, I'll check if it's worth special-casing on msbuild's side
From my understanding, string interning doesn't help with String.Concat/String.Substring, which are the main sources of string allocations.
MSBuild has a custom string deduplication engine that is able to take spans (which covers substrings) and string builders (which covers concatenated strings) without materializing the string. Check the description of dotnet/msbuild#5663 for details. Roslyn and S.R.Metadata have custom string interning optimized for their domain-specific patterns as well.
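To illustrate the general idea of de-duplicating from a span without allocating a string first, here is a minimal sketch. It is not MSBuild's, Roslyn's, or S.R.Metadata's implementation: `SpanStringPool` and `GetOrAdd` are hypothetical names, and the pool holds strong references and is not thread-safe, which a real engine would likely handle differently.

```csharp
using System;
using System.Collections.Generic;

sealed class SpanStringPool
{
    // Buckets keyed by the hash of the characters; each bucket holds the
    // distinct strings seen so far that share that hash.
    private readonly Dictionary<int, List<string>> _buckets =
        new Dictionary<int, List<string>>();

    // Returns an existing string with the same characters as the span,
    // or allocates one only when the pool has not seen it yet.
    public string GetOrAdd(ReadOnlySpan<char> span)
    {
        int hash = string.GetHashCode(span);

        if (!_buckets.TryGetValue(hash, out var bucket))
        {
            bucket = new List<string>();
            _buckets[hash] = bucket;
        }

        foreach (string candidate in bucket)
        {
            if (span.SequenceEqual(candidate.AsSpan()))
            {
                return candidate; // hit: no allocation
            }
        }

        string created = span.ToString(); // miss: the only allocation
        bucket.Add(created);
        return created;
    }
}
```

A caller parsing `"abc=true"` would pass `text.AsSpan(4, 4)` instead of calling `text.Substring(4, 4)`, so repeated occurrences of the same value resolve to one instance.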
For this callsite we definitely have some work to do spanifying the parsing, but the strings themselves should be extremely short-lived because of the deduplication engine @jkotas mentioned.
I've analyzed a couple of memory snapshots using dotMemory and it feels to me that `String.InternalSubString` is the main source of duplicated strings on the heap, e.g. total heap statistics for `dotnet publish -c Release ...` (msbuild+roslyn+nuget+illink+crossgen) over a medium-size project: 500K string objects, mostly from Substring.

And here is a memory snapshot taken in a random place during that `dotnet publish`:

dotMemory gives hints where these duplicated strings come from, and in case of "true" all are from

Also, checked VS2022's snapshot and AvaloniaILSpy. If you have a good example to look at, please share.

My impression is that it's worth special-casing "0", "1", "true", "false" in `String.InternalSubString`, and those paths should not hurt perf much since string compare operations are unrolled, e.g.:

Here it polls the span's length and, if it's 1 or 4, it runs a single compare with a constant to check if it's a known string.
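The length-then-compare check described above can be sketched roughly as follows. This is an illustrative sketch and not the prototype's code: `KnownStrings` and `SubstringOrCached` are hypothetical names, and the length-5 branch for "false" is an assumption added to cover all four strings listed.

```csharp
using System;

static class KnownStrings
{
    // Hypothetical helper: returns a shared literal for "0", "1", "true"
    // and "false"; every other substring is allocated as usual.
    public static string SubstringOrCached(string source, int start, int length)
    {
        ReadOnlySpan<char> span = source.AsSpan(start, length);

        if (length == 1)
        {
            // A single char compare decides the one-character cases.
            if (span[0] == '0') return "0";
            if (span[0] == '1') return "1";
        }
        else if (length == 4 && span.SequenceEqual("true".AsSpan()))
        {
            return "true";
        }
        else if (length == 5 && span.SequenceEqual("false".AsSpan()))
        {
            return "false";
        }

        // Fallback: the normal allocating path.
        return span.ToString();
    }
}
```

Substrings of any other length pay only for the length checks before reaching the allocating path.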
Related: "Improve Density of GC heap by String Interning (de-duping) on Gen2 GC" #9022