Inconsistent .NET GC process memory usage reporting #49817
Comments
Tagging subscribers to this area: @dotnet/gc
I can suggest a few potential issues to look out for + some other tools that might help track down the mystery memory if it isn't one of my guesses.
If neither of these accounts for it, here are a few other tools that might shed more light:
One more useful resource is the GC memdoc. Hopefully some of this will help things add up, but if the memory remains a mystery, let us know what you found and we'll figure out where to look next.
I've attached a demo running on .NET 5.
Results on a R5950X 128GB:
Server GC
Workstation GC
TotalCommittedBytes seems to be the closest to what I was looking for, as it does account for the "missing" memory. This property was not available in .NET 3.1. The Workstation GC seems to be fairly good at releasing the working set pages back to the OS, but the Server GC hogs 10GB even though only 73MB are actually used by the GC. Is this intentional? Is there a way to force the GC to release the memory that is unused?
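For anyone landing here later, a minimal sketch of how these values can be read side by side on .NET 5 (illustrative only, not the attached demo):

```csharp
using System;
using System.Diagnostics;

class GcNumbers
{
    static void Main()
    {
        GC.Collect(); // measure after a collection, as in the original report

        GCMemoryInfo info = GC.GetGCMemoryInfo();
        using Process proc = Process.GetCurrentProcess();

        Console.WriteLine($"HeapSizeBytes:       {info.HeapSizeBytes:N0}");        // managed heap (live objects + fragmentation)
        Console.WriteLine($"FragmentedBytes:     {info.FragmentedBytes:N0}");      // free space inside that heap
        Console.WriteLine($"TotalCommittedBytes: {info.TotalCommittedBytes:N0}");  // memory committed by the GC (.NET 5+)
        Console.WriteLine($"WorkingSet64:        {proc.WorkingSet64:N0}");         // roughly what Task Manager tracks
        Console.WriteLine($"PagedMemorySize64:   {proc.PagedMemorySize64:N0}");
    }
}
```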
Check this out: #48601
this is most likely due to how we handle the ephemeral segments which is described here. committed bytes can be roughly approximated to the
I think I'm happy to close this ticket. The API to report the committed memory has been added in .NET 5, so it's not like I'm left wondering whether we have a native memory leak in our code anymore. Thanks for the link @Maoni0, that's very useful GC documentation.
TL;DR
GC / dotnet-gcdump say process uses 400-500MB. Task Manager says 2.0-2.5GB. "Missing" >1GB memory is clearly filled with garbage managed strings. Whisky Tango Foxtrot ?
Context
We have a C# service that reads in the metadata of ~150K parquet files using ParquetSharp and creates an in-memory index of all this data. The reading is multithreaded and spread across 14/28 cores (physical/logical), with the data coming from a file share. Since there is a lot of data repetition, I have spent some time analyzing and improving the general memory usage (i.e. mostly by sharing immutable instances of arrays instead of maintaining separate copies when the data turns out to be identical).
This is an ASP.NET Core 3.1 application, running as x64 on Windows 10 via JetBrains Rider (as I'm developing and testing it).
All memory measurements and analysis are done after calling:
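```csharp
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
```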
The Issue
The reason why I'm raising this ticket is that I've never been able to match the memory usage reported by Task Manager (Details Tab) and the GC memory stats.
GC.GetTotalMemory(forceFullCollection: true) returns about ~400MB.
GC.GetGCMemoryInfo().HeapSizeBytes returns about ~400MB as well (usually slightly larger than the previous value).
dotnet-gcdump + eeheap gives GC Heap Size as ~500MB (spread across 28 heaps - which seems to match the number of logical cores).
process.PagedMemorySize64 reports around 2GB.
dotnet-dump (not gcdump) creates a ~2.5GB file.
Investigation
At first I suspected a memory leak in the native components of ParquetSharp, but couldn't find anything there. Also switching to non-server GC reduces the process size to ~1.5GB (I reverted back to server GC afterwards).
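As a side note for anyone reproducing this, it is worth confirming at runtime which GC flavour is actually active; a tiny sketch (mine, not the service's code) follows. Server GC creates one heap per logical core by default, which is what the 28 heaps mentioned above correspond to.

```csharp
using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // ServerGarbageCollection is set in the project file / runtimeconfig;
        // this only verifies which mode the process actually ended up with.
        Console.WriteLine($"Server GC:     {GCSettings.IsServerGC}");
        Console.WriteLine($"Logical cores: {Environment.ProcessorCount}");
    }
}
```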
In desperation I just opened in Rider the 2.5GB memory dump produced by dotnet-dump and viewed it as a text file. A good chunk of the file happened to be C# strings. In fact, they were clearly garbage strings from string.Split() operations; our custom Parquet metadata contains a lot of semi-colon-separated lists.
Doing the string splitting and parsing using ReadOnlySpan views to avoid creating loads of temporary strings reduced the total process memory usage from 2.0-2.5GB to 1.0GB.
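For illustration, a minimal sketch of the difference (the field layout here is invented; the real metadata format isn't shown in this issue). string.Split(';') allocates a new string per element plus an array, all of which becomes garbage, while the span version reads the same characters in place:

```csharp
using System;
using System.Collections.Generic;

static class MetadataParsing
{
    // Hypothetical example: parse a semicolon-separated list of integers.
    public static List<int> ParseIds(ReadOnlySpan<char> field)
    {
        var ids = new List<int>();
        while (!field.IsEmpty)
        {
            int sep = field.IndexOf(';');
            ReadOnlySpan<char> token = sep >= 0 ? field.Slice(0, sep) : field;
            ids.Add(int.Parse(token)); // int.Parse has a ReadOnlySpan<char> overload on .NET Core
            field = sep >= 0 ? field.Slice(sep + 1) : ReadOnlySpan<char>.Empty;
        }
        return ids;
    }

    // The original approach, for comparison: one temporary string per element.
    public static List<int> ParseIdsWithSplit(string field)
    {
        var ids = new List<int>();
        foreach (string part in field.Split(';'))
            ids.Add(int.Parse(part));
        return ids;
    }
}
```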
The values returned by GC.GetTotalMemory() and GC.GetGCMemoryInfo().HeapSizeBytes remain virtually unchanged. I think I can conclude at that point that:
Issue and Questions
This leaves me with a lot of questions. The top three being:
dotnet-gcdump?
TODO
I'll see if I can reproduce this behaviour in a small demo application. Worth testing on .NET 5.0 as well.
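A rough sketch of such a repro (an illustration, not the demo attached in the comments): generate lots of short-lived string.Split() garbage on many threads, force a compacting collection, then compare what the GC reports with what the OS reports for the process.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime;
using System.Threading.Tasks;

class SplitGarbageRepro
{
    static void Main()
    {
        string line = string.Join(";", Enumerable.Range(0, 1_000));

        // Many threads producing short-lived strings, mimicking the metadata parsing.
        Parallel.For(0, 100_000, i =>
        {
            long sum = line.Split(';').Sum(long.Parse);
        });

        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();

        using Process proc = Process.GetCurrentProcess();
        Console.WriteLine($"GC.GetTotalMemory: {GC.GetTotalMemory(true):N0}");
        Console.WriteLine($"HeapSizeBytes:     {GC.GetGCMemoryInfo().HeapSizeBytes:N0}");
        Console.WriteLine($"WorkingSet64:      {proc.WorkingSet64:N0}");
    }
}
```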