How to debug memory leak that is not clear on dotnet-dump? #4139
Comments
@Leonardo-Ferreira In the Insights tab, or using PerfView or WinDbg + SOS, you should check what your fragmentation looks like. Also, what does checking the "show dead objects" checkbox show?
Dead objects show a bit more data here... but not enough to account for the 300 MB. The Insights tab shows a bit of waste on duplicated strings (26k objects totaling 603 KB) and some sparse arrays (24k objects totaling 1.36 MB). I'm trying to install SOS on my pod, but I must say I'm getting my ass kicked...
Tagging subscribers to this area: @tommcdon
Issue Details
I have an ASP.NET 7.0.9 API running on AKS and no one is using it... and yet, every minute, Application Insights reports that the memory usage grew a bit... it grows until the pod crashes and starts again. I got a dump at a moment when Application Insights was reporting that the app was using barely above 300 MB, and when I opened it in VS, the first object was smaller than 1.5 MB and the total was 53k objects and 4.8 MB. I think it's clear that I won't be able to track my leak here, so what should I do now?
You can open that dump in WinDbg on Windows; there is no need to install anything within the container. You may need a copy of the binaries/PDBs for some metadata/commands. Alternatively, dotnet-dump also works.
Moving this issue to dotnet-diagnostics as this is a tooling question.
@hoyosjs I have been unable to reproduce this issue on Windows, or even on Docker... and the dump I've taken in the AKS pods, I took using dotnet-dump...
@Leonardo-Ferreira we support Linux x64 cross-plat dumps in Visual Studio and WinDbg, so you can copy the Linux dump to Windows and open it in VS/WinDbg, for example.
I'm really sorry @tommcdon, I did open it in VS and posted reference screenshots. I'm just lost here, kind of going a bit crazy... when I try to use WinDbg, I get an error when using commands such as [...]. On a side question: if the memory leaked, it is not supposed to be trackable via managed memory, right?
Also, I'd collect a Full dump - in case this is native memory somewhere. |
@hoyosjs in all cases (full dump or heap dump) the answer to |
You need to install sos (https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos, and make sure to run dotnet-sos install). After that it tells you a command that starts with |
And it's not reporting any fragmentation? What does !eeheap show?
Also, what container flavor is this?
@hoyosjs The container is the default image for ASP.NET 7 from Microsoft... I think it is Debian... !eeheap is:
I'm a bit new to this; if you wish, please educate me on the highlights.
The third-to-last line tells you what you already knew: 12 MB of objects are there. The next line tells you the GC had 21 MB reported to the OS as memory it manages. The very last line tells you that the bookkeeping structures in the runtime plus the managed heap take ~60 MB. This means one of two things might be happening: 1) there's some native memory that we don't know of, for example memory used by some library; 2) the allocators of the C library may be holding on to memory and not returning it fast enough. The runtime thinks the memory is relinquished, but the OS is unaware of it.
Not sure how much it will help, but can you please run |
OK, let's say you are correct... there's still at least 250 MB of "unaccounted" memory! The memory set is over 350 MB and we can barely account for 100 MB of it. I'll run maddress and post the output here.
There's definitely more paged memory in the dump than in the process (might be a symptom of dotnet/runtime#71472 (comment)). But let's ignore the first line for a minute. (cc @leculver)
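Editor's sketch: the gap being discussed is simple subtraction, and writing it down makes clear which figure each person is comparing against. The numbers below are the approximate ones quoted in the comments above; the function name `unaccounted_mb` is hypothetical and only for illustration.

```python
def unaccounted_mb(working_set_mb, accounted_mb):
    """Memory held by the process that the runtime's own report does not explain."""
    return working_set_mb - accounted_mb


# Approximate figures quoted in the thread, in MB.
live_objects = 12    # managed objects reported by !eeheap
gc_committed = 21    # memory the GC reported to the OS as managed by it
runtime_total = 60   # runtime bookkeeping structures + managed heap
working_set = 350    # process memory reported for the pod

# Against the ~60 MB the runtime can explain.
print(unaccounted_mb(working_set, runtime_total))  # 290

# Against the more generous 100 MB estimate used in the reply.
print(unaccounted_mb(working_set, 100))  # 250
```

Either way, most of the process memory lies outside what the managed heap report covers, which is why the discussion turns to native memory next.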
If your pod is still alive, a copy of the |
Ambient allocations do happen; for example, running dotnet-counters can itself cause the runtime to allocate :) Gen2 garbage is not a concern. 10% fragmentation isn't terribly concerning either.
Ok, for the |
Yes, that's what I meant |
Here is the smap after the 1st request with the new config. I also had to set |
@janvorli we are able to reproduce the problem with a simple .NET 7 web API plus the MongoDB driver connecting to a Cosmos DB. Looking into it, I found DataDog/dd-trace-dotnet#2168, which fits both our environment and our issue...
I am sorry for confusing you, I meant either setting the [...]. Anyway, it is great to have a repro. I'd suggest holding off on further investigation until you get a response on that issue.
Here are 3 smaps of this minimal app: one right after start, one after the first request, and one when the app was at the 150 MB mark.
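Editor's sketch: one way to compare smaps snapshots like these is to sum the `Rss` field per mapping name and diff two snapshots, which shows whether growth sits in anonymous memory or in a named file mapping. This is a minimal sketch that assumes the standard Linux `/proc/<pid>/smaps` text layout; the function names `rss_by_mapping` and `growth` are hypothetical helpers, not part of any .NET tooling.

```python
import re
from collections import defaultdict

# Mapping header: "start-end perms offset dev inode [path]".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")
# Resident set size of the current mapping, reported in kB.
RSS = re.compile(r"^Rss:\s+(\d+)\s+kB")


def rss_by_mapping(smaps_text):
    """Return {mapping name: total Rss in kB} for one smaps snapshot."""
    totals = defaultdict(int)
    name = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Mappings without a path are anonymous memory.
            name = header.group(1).strip() or "[anon]"
            continue
        rss = RSS.match(line)
        if rss and name is not None:
            totals[name] += int(rss.group(1))
    return dict(totals)


def growth(before, after):
    """Return (name, delta in kB) pairs for mappings whose Rss changed, largest growth first."""
    names = set(before) | set(after)
    deltas = {n: after.get(n, 0) - before.get(n, 0) for n in names}
    changed = [(n, d) for n, d in deltas.items() if d != 0]
    return sorted(changed, key=lambda item: (-item[1], item[0]))
```

To use it, read two saved snapshot files, pass each file's text to `rss_by_mapping`, and pass the two results to `growth`. Steady growth under `[anon]` or `[heap]` would be consistent with a native allocation leak, though it does not identify the caller.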
yes I have the |
@Leonardo-Ferreira it looks like the 3rd dump is actually from a different run of the app. The mappings of the dotnet executable are different and I don't have any other explanation for that.
Is there a reasonable way for me to run the minimal app locally?
after a couple of hours running i'm confident to report that rolling back the code to |
@janvorli if you could join the support request 2308080040008266001, I can share the mini app and the credentials for the database |
I have an update on this issue. I was debugging a very similar issue for an internal customer and I found a very nice tool that allowed me to find the source of the leak. It is called heaptrack; you can find the sources and docs here: https://github.com/KDE/heaptrack. It is also available as a package in the standard Ubuntu repo.
Hi @janvorli, thanks for the findings. Is dotnet/runtime#74695 related to this issue?
@westfin the team that owns that code is currently looking into it. But yes, that PR modified the code that gets the raw certificate from OpenSSL and wraps it in a managed object, so it is related.
Good, thanks again. How can we follow the status of the work? Is there a link, or can you update the status here?
I will keep updating you here.
Which versions of the runtime are affected by this? Also, does anyone know why |
Interesting that I found this issue today while looking into other methods of using dotnet-dump to analyse my core dumps, which show the exact same issue as described here. According to the PR linked by @westfin, the issue should be resolved by updating to .NET 8.0. Has anyone been able to confirm this? The Sysinternals team released ProcDump for Linux in December, which I started using today to try to identify the cause. Running [...]. I have not looked at using heaptrack, but if other investigations fail, I will have a look at it and report back what I find.
As an update from my side, we managed to migrate our application to .NET 8.0 and deployed it to one of the production servers. Since the fix hasn't been backported, it seems that the only fix for this issue would be to upgrade to .NET 8.0.
@Leonardo-Ferreira I have found and fixed the leak in [...]. To test these, publish your app as self-contained and overwrite the [...]. If you'd rather go through the trouble of compiling the lib yourself, they come from these branches:
To build them, running |
@Leonardo-Ferreira do you think you will have time to check the private builds from @rzikm this week?
Yeah, we do not need the validation anymore.
While we have documentation and tutorials on dotnet-dump memory analysis, e.g. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak#analyze-the-core-dump, we do not have tooling for native memory leaks, especially on Linux. Since the request is tracked in #2906, closing this issue.