How to debug memory leak that is not clear on dotnet-dump? #4139

Closed
Leonardo-Ferreira opened this issue Aug 8, 2023 · 76 comments
Labels
dotnet-dump question Further information is requested

@Leonardo-Ferreira

I have an ASP.NET 7.0.9 API running on AKS and no one is using it... and yet, every minute, Application Insights reports that the memory usage grew a bit... it grows until the pod crashes and restarts... take a look:

I took a dump at a moment when Application Insights was reporting that the app was using barely above 300MB, and when I opened it in VS I got this:

[screenshot: the dump opened in Visual Studio]

If you look closely, the first object is smaller than 1.5MB and the total sum is 53k objects and 4.8MB...

I think it's clear that I won't be able to track my leak here, so what should I do now?

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

@Leonardo-Ferreira In the insights tab or using perfview or WinDBG + SOS you should check what your fragmentation looks like. Also, what does checking the "show dead objects" checkbox show?

@Leonardo-Ferreira
Author

Showing dead objects reveals a bit more data here... but not enough to account for the 300MB...

[screenshot: the same Visual Studio view with dead objects shown]

The insights tab shows a bit of waste on duplicated strings (26k objects totaling 603KB) and some sparse arrays (24k objects totaling 1.36MB).

I'm trying to install SOS on my pod, but I must say I'm getting my ass kicked...

@ghost

ghost commented Aug 8, 2023

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

@hoyosjs hoyosjs added the question Further information is requested label Aug 8, 2023
@hoyosjs
Member

hoyosjs commented Aug 8, 2023

You can open that dump in WinDbg on Windows; there's no need to install anything within the container. You may need a copy of the binaries/PDBs for some metadata/commands. Alternatively, dotnet-dump also works.

@tommcdon
Member

tommcdon commented Aug 8, 2023

Moving this issue to dotnet-diagnostics as this is a tooling question

@tommcdon tommcdon transferred this issue from dotnet/runtime Aug 8, 2023
@tommcdon tommcdon added this to the 8.0.0 milestone Aug 8, 2023
@Leonardo-Ferreira
Author

@hoyosjs I have been unable to reproduce this issue on Windows, or even on Docker... and the dump I took in the AKS pod was collected with dotnet-dump...

@tommcdon
Member

tommcdon commented Aug 8, 2023

@Leonardo-Ferreira we support Linux x64 cross-plat dumps in Visual Studio and Windbg, so you can copy the Linux dump to Windows and open it in VS/Windbg, for example.

@Leonardo-Ferreira
Author

I'm really sorry @tommcdon, I did open it in VS and posted reference screenshots. I'm just lost here and kind of going a bit crazy...

When I try to use WinDbg, I get an error from commands such as !heap -? saying that no heap was exported... and this dump was collected with ./dotnet-dump collect -p 1, which was supposed to produce a full dump, right? And now that I mention it, VS only shows the "Debug Managed Memory" option...

As a side question: if the memory leaked, it is not supposed to be trackable via managed memory, right?

@Leonardo-Ferreira
Author

using dotnet counters monitor I was able to get this:
image

the working set for the process is 363MB but I still have no clue "where" that memory is going...

@Leonardo-Ferreira
Author

Update: I just ran dotnet-dump collect --type Heap, and in WinDbg, when I run !address -f:Heap, I get absolutely nothing, as if the heap were empty. Also, !heap -? still errors out with "No export heap found"... When I open this dump in VS, it says "Manage Memory Debugging unavailable" and "Process heap information not present"!!!:
image

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

!heap might not work. But what does !dumpheap -stat tell you?

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

Also, I'd collect a Full dump - in case this is native memory somewhere.

@Leonardo-Ferreira
Author

Leonardo-Ferreira commented Aug 8, 2023

@hoyosjs in all cases (full dump or heap dump) the answer to !dumpheap -stat is the same "No export dumpheap found"

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

You need to install SOS (https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos) and make sure to run dotnet-sos install. After that it prints a command that starts with .load; run that in the debugger and then all the commands should be there.

@Leonardo-Ferreira
Author

thanks for that advice! I did as you told and loaded... now it is working, but it's showing basically the same thing VS does. take a look:
image

So, the dump is accounting for 12MB while the process was reporting a working set larger than 350MB...

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

And it's not reporting any fragmentation? What does !eeheap show?

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

Also, what container flavor is this?

@Leonardo-Ferreira
Author

Leonardo-Ferreira commented Aug 8, 2023

@hoyosjs The container is the default image for ASP.NET 7 from Microsoft... I think it's Debian...

!eeheap is:

0:000> !eeheap
Loader Heap:
----------------------------------------
System Domain:        7fd5dc8b1fe0
LowFrequencyHeap:     7fd566230000(10000:6000) 7fd566220000(10000:10000) 7fd566200000(10000:10000) 7fd5661e0000(10000:10000) 7fd566170000(10000:10000) 7fd566150000(10000:10000) 7fd566130000(10000:10000) 7fd5660f0000(10000:10000) 7fd5660d0000(10000:10000) 7fd566090000(10000:10000) 7fd566050000(10000:10000) 7fd566010000(10000:10000) 7fd565fc0000(10000:10000) 7fd565f90000(10000:10000) 7fd565f80000(10000:10000) 7fd565f60000(10000:10000) 7fd565f30000(10000:10000) 7fd565ee0000(10000:10000) 7fd565ea0000(10000:10000) 7fd565e80000(10000:10000) 7fd565de0000(10000:10000) 7fd565c90000(10000:10000) 7fd565c80000(10000:10000) 7fd565a80000(10000:f000) 7fd5659e0000(10000:10000) 7fd5659a0000(10000:10000) 7fd565910000(10000:10000) 7fd565860000(10000:10000) 7fd565820000(10000:10000) 7fd5657d0000(10000:10000) 7fd5657a0000(10000:10000) 7fd565780000(10000:10000) 7fd565550000(10000:c000) 7fd5654c0000(10000:10000) 7fd565480000(10000:10000) 7fd565430000(10000:10000) 7fd565420000(10000:f000) 7fd565400000(10000:9000) 7fd565340000(10000:10000) 7fd565320000(10000:10000) 7fd5652f0000(10000:10000) 7fd5652e0000(10000:10000) 7fd5652c0000(10000:10000) 7fd5652a0000(10000:10000) 7fd565280000(10000:10000) 7fd565260000(10000:10000) 7fd5651a0000(10000:10000) 7fd565180000(10000:10000) 7fd565170000(10000:10000) 7fd565140000(10000:10000) 7fd565130000(10000:10000) 7fd565110000(10000:10000) 7fd5650f0000(10000:10000) 7fd5650d0000(10000:10000) 7fd565030000(10000:10000) 7fd564ff0000(10000:10000) 7fd564fb0000(10000:10000) 7fd564f80000(10000:10000) 7fd564f40000(10000:10000) 7fd564f10000(10000:10000) 7fd564ed0000(10000:10000) 7fd564ea0000(10000:10000) 7fd564e70000(10000:10000) 7fd564c50000(10000:10000) 7fd564c20000(10000:10000) 7fd564bd0000(10000:10000) 7fd564bb0000(10000:10000) 7fd564b80000(10000:10000) 7fd564b50000(10000:10000) 7fd564b10000(10000:10000) 7fd564ad0000(10000:10000) 7fd564a10000(20000:20000) 7fd5649d0000(20000:1c000) 7fd564650000(10000:f000) 7fd564580000(10000:e000) 7fd564560000(20000:20000) 
7fd564510000(10000:f000) 7fd5644d0000(10000:10000) 7fd5644b0000(10000:10000) 7fd564450000(10000:10000) 7fd564430000(10000:10000) 7fd5643f0000(10000:10000) 7fd5643b0000(10000:10000) 7fd564230000(10000:10000) 7fd5641d0000(10000:10000) 7fd5641a0000(10000:10000) 7fd564100000(10000:10000) 7fd5640e0000(20000:1d000) 7fd5640c0000(10000:9000) 7fd5640a0000(20000:20000) 7fd564070000(10000:10000) 7fd563f80000(10000:c000) 7fd563ef0000(10000:10000) 7fd563ec0000(10000:10000) 7fd563e80000(10000:10000) 7fd563dc0000(10000:10000) 7fd563d70000(10000:10000) 7fd563d50000(10000:10000) 7fd563d00000(10000:10000) 7fd563c30000(10000:10000) 7fd563bf0000(10000:10000) 7fd563b30000(10000:10000) 7fd563ac0000(10000:10000) 7fd563a80000(10000:10000) 7fd563a50000(10000:10000) 7fd563a40000(10000:b000) 7fd563970000(10000:f000) 7fd563950000(10000:10000) 7fd5637f0000(10000:10000) 7fd563740000(10000:f000) 7fd5636f0000(10000:10000) 7fd5635c0000(10000:10000) 7fd563520000(10000:10000) 7fd563430000(10000:10000) 7fd5633e0000(10000:10000) 7fd563390000(10000:10000) 7fd5632c0000(10000:10000) 7fd563200000(20000:20000) 7fd562f60000(10000:7000) 7fd562e20000(10000:10000) 7fd562de0000(10000:10000) 7fd562dd0000(10000:10000) 7fd562d80000(10000:c000) 7fd562d60000(10000:10000) 7fd562b00000(10000:10000) 7fd562a90000(10000:10000) 7fd5629c0000(10000:10000) 7fd5628c0000(10000:10000) 7fd5627f0000(10000:10000) 7fd5627b0000(10000:10000) 7fd5626e0000(10000:10000) 7fd562690000(40000:40000) 7fd561df0000(10000:1000) 7fd561c90000(10000:10000) 7fd561b50000(10000:10000) 7fd561aa0000(10000:10000) 7fd561a40000(10000:10000) 7fd561870000(10000:f000) 7fd561820000(10000:10000) 7fd561740000(10000:e000) 7fd561700000(10000:10000) 7fd5616c0000(20000:1f000) 7fd561280000(10000:9000) 7fd561270000(10000:f000) 7fd561230000(20000:13000) 7fd560fb0000(10000:2000) 7fd560fa0000(10000:10000) 7fd560f50000(10000:b000) 7fd560f40000(10000:10000) 7fd560f30000(10000:c000) 7fd560ef0000(40000:40000) 7fd560ee0000(10000:3000) 7fd560e60000(10000:10000) 
7fd560c30000(10000:e000) 7fd560be0000(40000:40000) 7fd560bd0000(10000:5000) 7fd560bc0000(10000:10000) 7fd560ba0000(10000:10000) 7fd560b80000(10000:10000) 7fd560b70000(10000:10000) 7fd560b60000(10000:e000) 7fd560980000(10000:10000) 7fd560660000(10000:f000) 7fd5605b0000(10000:e000) 7fd5603a0000(10000:f000) 7fd560100000(10000:f000) 7fd560000000(10000:d000) 7fd55fd40000(20000:1f000) 7fd55f9f0000(10000:c000) 7fd55f7c0000(10000:10000) 7fd55f7b0000(10000:10000) 7fd55f5c0000(10000:5000) 7fd55f4a0000(10000:10000) 7fd55f300000(10000:10000) 7fd55f2b0000(10000:10000) 7fd55f290000(10000:10000) 7fd55eff0000(10000:e000) 7fd55ed70000(10000:10000) 7fd55eae0000(10000:e000) 7fd55e900000(10000:10000) 7fd55e8f0000(10000:10000) 7fd55e8e0000(10000:10000) 7fd55e7d0000(20000:1f000) 7fd55e4b0000(10000:3000) 7fd55e230000(10000:10000) 7fd55dff0000(10000:10000) 7fd55dfd0000(10000:10000) 7fd55def0000(10000:10000) 7fd55de00000(70000:70000) 7fd55d210000(3000:1000) Size: 0xc9b000 (13217792) bytes total, 0xbe000 (778240) bytes wasted.
HighFrequencyHeap:    7fd5661f0000(10000:10000) 7fd566180000(10000:10000) 7fd566160000(10000:10000) 7fd566120000(10000:10000) 7fd566110000(10000:10000) 7fd5660e0000(10000:10000) 7fd5660b0000(10000:10000) 7fd566080000(10000:10000) 7fd566060000(10000:10000) 7fd566040000(10000:10000) 7fd566020000(10000:10000) 7fd566000000(10000:a000) 7fd565ff0000(10000:d000) 7fd565fd0000(10000:10000) 7fd565fb0000(10000:10000) 7fd565f70000(10000:10000) 7fd565f40000(10000:10000) 7fd565f20000(10000:10000) 7fd565f00000(10000:10000) 7fd565ed0000(10000:10000) 7fd565eb0000(10000:10000) 7fd565e50000(10000:10000) 7fd565dc0000(10000:10000) 7fd565da0000(10000:10000) 7fd565b10000(10000:10000) 7fd565aa0000(10000:10000) 7fd565a70000(10000:10000) 7fd565a50000(10000:10000) 7fd5659d0000(10000:10000) 7fd5659b0000(10000:10000) 7fd565920000(10000:10000) 7fd565890000(10000:10000) 7fd565880000(10000:10000) 7fd565850000(10000:10000) 7fd565830000(10000:10000) 7fd5657e0000(10000:10000) 7fd5657c0000(10000:10000) 7fd565790000(10000:10000) 7fd565570000(10000:10000) 7fd5654d0000(10000:10000) 7fd5654a0000(10000:10000) 7fd565490000(10000:10000) 7fd565460000(10000:10000) 7fd565440000(10000:10000) 7fd565410000(10000:10000) 7fd565380000(10000:10000) 7fd565350000(10000:10000) 7fd565330000(10000:10000) 7fd565300000(10000:10000) 7fd5652b0000(10000:10000) 7fd565270000(10000:10000) 7fd565160000(10000:10000) 7fd565120000(10000:10000) 7fd5650e0000(10000:10000) 7fd5650b0000(10000:10000) 7fd565010000(10000:10000) 7fd565000000(10000:10000) 7fd564fe0000(10000:10000) 7fd564fc0000(10000:10000) 7fd564fa0000(10000:10000) 7fd564f70000(10000:10000) 7fd564f50000(10000:10000) 7fd564f30000(10000:10000) 7fd564ef0000(10000:10000) 7fd564ee0000(10000:10000) 7fd564eb0000(10000:10000) 7fd564e90000(10000:10000) 7fd564c60000(10000:10000) 7fd564c30000(10000:10000) 7fd564c10000(10000:10000) 7fd564bf0000(10000:10000) 7fd564bc0000(10000:10000) 7fd564b90000(10000:10000) 7fd564b70000(10000:10000) 7fd564b40000(10000:10000) 7fd564b30000(10000:10000) 
7fd564b00000(10000:10000) 7fd564ae0000(10000:10000) 7fd564a30000(10000:10000) 7fd5649f0000(10000:10000) 7fd564670000(10000:10000) 7fd5645a0000(10000:10000) 7fd564550000(10000:10000) 7fd564530000(10000:10000) 7fd564500000(10000:10000) 7fd5644f0000(10000:10000) 7fd5644c0000(10000:10000) 7fd564490000(10000:10000) 7fd564480000(10000:10000) 7fd564440000(10000:10000) 7fd564410000(10000:10000) 7fd564400000(10000:10000) 7fd5643d0000(10000:10000) 7fd5643c0000(10000:10000) 7fd564240000(10000:10000) 7fd564220000(10000:10000) 7fd564200000(10000:10000) 7fd5641e0000(10000:10000) 7fd5641b0000(10000:10000) 7fd564120000(10000:10000) 7fd5640d0000(10000:10000) 7fd564080000(10000:10000) 7fd563fa0000(10000:10000) 7fd563f70000(10000:10000) 7fd563ed0000(10000:10000) 7fd563e90000(10000:10000) 7fd563e60000(10000:10000) 7fd563e50000(10000:10000) 7fd563da0000(10000:10000) 7fd563d90000(10000:10000) 7fd563d30000(10000:10000) 7fd563d20000(10000:10000) 7fd563c60000(10000:10000) 7fd563c40000(10000:10000) 7fd563c20000(10000:10000) 7fd563c00000(10000:10000) 7fd563be0000(10000:10000) 7fd563bc0000(10000:10000) 7fd563b10000(10000:10000) 7fd563af0000(10000:10000) 7fd563ad0000(10000:10000) 7fd563aa0000(10000:10000) 7fd563a60000(10000:10000) 7fd5639a0000(10000:10000) 7fd563980000(10000:10000) 7fd563940000(10000:10000) 7fd563800000(10000:10000) 7fd563750000(10000:10000) 7fd563730000(10000:10000) 7fd563710000(10000:10000) 7fd563700000(10000:10000) 7fd5636d0000(10000:10000) 7fd563660000(10000:10000) 7fd563530000(10000:10000) 7fd563500000(10000:10000) 7fd5634f0000(10000:10000) 7fd5634d0000(10000:10000) 7fd563450000(10000:10000) 7fd563410000(10000:10000) 7fd563400000(10000:10000) 7fd5633d0000(10000:10000) 7fd5633b0000(10000:10000) 7fd5633a0000(10000:10000) 7fd563300000(10000:10000) 7fd5632f0000(10000:10000) 7fd5632d0000(10000:10000) 7fd563250000(10000:10000) 7fd563230000(10000:10000) 7fd562f70000(10000:10000) 7fd562e50000(10000:10000) 7fd562e30000(10000:10000) 7fd562e10000(10000:10000) 
7fd562df0000(10000:10000) 7fd562db0000(10000:10000) 7fd562da0000(10000:10000) 7fd562d70000(10000:10000) 7fd562ca0000(10000:10000) 7fd562c20000(10000:10000) 7fd562b70000(10000:10000) 7fd562af0000(10000:10000) 7fd562ad0000(10000:10000) 7fd562ab0000(10000:10000) 7fd562a80000(10000:10000) 7fd562a60000(10000:10000) 7fd5629d0000(10000:10000) 7fd5629a0000(10000:10000) 7fd5628d0000(10000:10000) 7fd562820000(10000:10000) 7fd562810000(10000:10000) 7fd5627e0000(10000:10000) 7fd5627c0000(10000:10000) 7fd5627a0000(10000:10000) 7fd5626d0000(10000:10000) 7fd561d20000(10000:f000) 7fd561c80000(10000:10000) 7fd561b60000(10000:10000) 7fd561ab0000(10000:10000) 7fd561a90000(10000:10000) 7fd561a70000(10000:10000) 7fd561a30000(10000:10000) 7fd561850000(10000:10000) 7fd561840000(10000:10000) 7fd561780000(10000:10000) 7fd561760000(10000:10000) 7fd561750000(10000:10000) 7fd561730000(10000:10000) 7fd561710000(10000:10000) 7fd5616e0000(10000:10000) 7fd561290000(10000:f000) 7fd561250000(10000:10000) 7fd560f80000(10000:10000) 7fd560f60000(10000:10000) 7fd560c50000(10000:10000) 7fd560c20000(10000:10000) 7fd560bb0000(10000:f000) 7fd560b50000(10000:10000) 7fd560aa0000(10000:10000) 7fd560680000(10000:10000) 7fd5605c0000(10000:10000) 7fd560590000(10000:10000) 7fd5601c0000(10000:10000) 7fd55fef0000(10000:10000) 7fd55fa30000(10000:10000) 7fd55fa10000(10000:10000) 7fd55f920000(10000:10000) 7fd55f7d0000(10000:10000) 7fd55f5d0000(10000:10000) 7fd55f540000(10000:10000) 7fd55f400000(10000:10000) 7fd55f320000(10000:10000) 7fd55f2f0000(10000:10000) 7fd55f2d0000(10000:10000) 7fd55f2a0000(10000:10000) 7fd55f0d0000(10000:10000) 7fd55efd0000(10000:10000) 7fd55ee70000(10000:10000) 7fd55ed90000(10000:10000) 7fd55e910000(10000:10000) 7fd55e1c0000(10000:10000) 7fd55e000000(10000:10000) 7fd55dfc0000(10000:10000) 7fd55dfa0000(10000:10000) 7fd55dee0000(10000:10000) 7fd55dec0000(10000:10000) 7fd55dea0000(10000:10000) 7fd55de70000(10000:10000) 7fd55d214000(9000:6000) Size: 0xe1a000 (14786560) bytes total, 0xf000 (61440) 
bytes wasted.
StubHeap:             7fd564680000(10000:5000) 7fd55d21d000(3000:3000) Size: 0x8000 (32768) bytes total.
IndirectionCellHeap:  7fd564f20000(10000:8000) 7fd55d220000(6000:6000) Size: 0xe000 (57344) bytes total.
LookupHeap:           7fd563990000(10000:b000) 7fd55d22f000(4000:4000) Size: 0xf000 (61440) bytes total.
ResolveHeap:          7fd55d264000(57000:2f000) Size: 0x2f000 (192512) bytes total.
DispatchHeap:         7fd55d233000(31000:15000) Size: 0x15000 (86016) bytes total.
CacheEntryHeap:       7fd565190000(10000:4000) 7fd55d226000(9000:9000) Size: 0xd000 (53248) bytes total.
Total size:           Size: 0x1b2b000 (28487680) bytes total, 0xcd000 (839680) bytes wasted.
----------------------------------------
Domain 1:             55e084656040
No unique loader heaps found.
----------------------------------------
JIT Manager:          55e084659010
LoaderCodeHeap:       7fd55df20000(80000:68000) Size: 0x68000 (425984) bytes total.
LoaderCodeHeap:       7fd562e60000(80000:72000) Size: 0x72000 (466944) bytes total.
LoaderCodeHeap:       7fd5639c0000(80000:78000) Size: 0x78000 (491520) bytes total.
LoaderCodeHeap:       7fd563dd0000(80000:7d000) Size: 0x7d000 (512000) bytes total.
LoaderCodeHeap:       7fd564250000(80000:7e000) Size: 0x7e000 (516096) bytes total.
LoaderCodeHeap:       7fd564a50000(80000:7e000) Size: 0x7e000 (516096) bytes total.
LoaderCodeHeap:       7fd564c70000(200000:1f6000) Size: 0x1f6000 (2056192) bytes total.
LoaderCodeHeap:       7fd565580000(200000:1cb000) Size: 0x1cb000 (1880064) bytes total.
HostCodeHeap:         7fd55fa40000(10000:10000) Size: 0x10000 (65536) bytes total.
HostCodeHeap:         7fd566190000(2000:2000) Size: 0x2000 (8192) bytes total.
Total size:           Size: 0x69e000 (6938624) bytes total.
----------------------------------------

========================================
Number of GC Heaps: 1
----------------------------------------
Small object heap
         segment            begin        allocated        committed allocated size     committed size    
generation 0:
    7fd5d927bef0     7fd4d4c00020     7fd4d4ffffd0     7fd4d5000000 0x3fffb0 (4194224) 0x400000 (4194304)
    7fd5d927c050     7fd4d5400020     7fd4d55fd188     7fd4d57d1000 0x1fd168 (2085224) 0x3d1000 (4001792)
generation 1:
    7fd5d927b8c0     7fd4d2800020     7fd4d284bca0     7fd4d2c00000 0x4bc80 (310400)   0x400000 (4194304)
generation 2:
    7fd5d927b760     7fd4d2000020     7fd4d23fffd8     7fd4d2400000 0x3fffb8 (4194232) 0x400000 (4194304)
    7fd5d927b810     7fd4d2400020     7fd4d2538b70     7fd4d27c2000 0x138b50 (1280848) 0x3c2000 (3940352)
Large object heap
         segment            begin        allocated        committed allocated size     committed size    
    7fd5d927b970     7fd4d2c00020     7fd4d2c54718     7fd4d2c55000 0x546f8 (345848)   0x55000 (348160)  
Pinned object heap
         segment            begin        allocated        committed allocated size     committed size    
    7fd5d927b1e0     7fd4d0000020     7fd4d004dae0     7fd4d0051000 0x4dac0 (318144)   0x51000 (331776)  
------------------------------
GC Allocated Heap Size:    Size: 0xc23a58 (12728920) bytes.
GC Committed Heap Size:    Size: 0x1439000 (21204992) bytes.

Total bytes consumed by CLR: 0x3602000 (56631296)

I'm a bit new to this; if you wish, please educate me on the highlights.

@hoyosjs
Member

hoyosjs commented Aug 8, 2023

The third-to-last line (GC Allocated Heap Size) tells you what you already knew: about 12 MB of objects are there. The next line (GC Committed Heap Size) tells you the GC has reported about 21 MB to the OS as memory it manages. The very last line tells you that the bookkeeping structures in the runtime plus the managed heap take roughly 57 MB (56,631,296 bytes). This means one of two things might be happening: 1) there's some native memory that we don't know of, for example memory used by a native library; 2) the allocators of the C library may be holding on to memory and not returning it fast enough. The runtime thinks the memory is relinquished, but the OS is unaware of it.
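
To make that accounting concrete, here is a minimal sketch that subtracts what the runtime reports from the working set. The byte counts are copied from the !eeheap output above; the 363 MB working set is the dotnet-counters figure mentioned earlier in the thread, and treating it as decimal megabytes is an assumption.

```python
# Figures copied from the !eeheap output and the dotnet-counters reading in this thread.
GC_ALLOCATED = 12_728_920        # "GC Allocated Heap Size"
GC_COMMITTED = 21_204_992        # "GC Committed Heap Size"
CLR_TOTAL = 56_631_296           # "Total bytes consumed by CLR"
WORKING_SET = 363 * 1_000_000    # ~363 MB working set (assumed to be decimal MB)


def unaccounted(working_set: int, clr_total: int) -> int:
    """Bytes of working set that the runtime's own totals do not explain."""
    return working_set - clr_total


def to_mb(n: int) -> float:
    """Convert bytes to decimal megabytes for display."""
    return n / 1_000_000


print(f"live managed objects : {to_mb(GC_ALLOCATED):7.1f} MB")
print(f"committed, not in use: {to_mb(GC_COMMITTED - GC_ALLOCATED):7.1f} MB")
print(f"all CLR bookkeeping  : {to_mb(CLR_TOTAL):7.1f} MB")
print(f"not explained by CLR : {to_mb(unaccounted(WORKING_SET, CLR_TOTAL)):7.1f} MB")
```

Whatever lands in the last bucket has to be found with native tooling, since managed heap commands cannot see it.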

One problem I have also seen in the past (no idea how common it is or whether it is affecting your scenario), is that in some situations the default glibc memory allocator does not do a good job returning virtual memory to the OS even though the application was correctly calling free() on all memory it had allocated with malloc(). We diagnosed that particular issue by changing the environment variable MALLOC_ARENA_MAX and observing the memory usage was now staying much lower. This developer encountered a similar problem with a better writeup:
Consider lowering MALLOC_ARENA_MAX to prevent native memory OOM · Issue #8993 · prestodb/presto (github.com)
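
As a rough illustration of the experiment described above, the sketch below starts a process with MALLOC_ARENA_MAX set so that memory usage can be compared with and without the cap. The value 2 and the command line in the comment are placeholders, not recommendations from this thread; in a pod you would more likely set the variable in the container spec.

```python
import os
import subprocess


def env_with_arena_cap(arenas, base=None):
    """Return a copy of the environment with MALLOC_ARENA_MAX set."""
    env = dict(os.environ if base is None else base)
    env["MALLOC_ARENA_MAX"] = str(arenas)
    return env


def run_with_arena_cap(cmd, arenas=2):
    """Run cmd with the glibc arena count capped (the default of 2 is an assumption)."""
    return subprocess.run(cmd, env=env_with_arena_cap(arenas), check=False)


# Hypothetical usage; "MyApi.dll" is a placeholder application name:
# run_with_arena_cap(["dotnet", "MyApi.dll"])
```

If the working set stays much lower with the cap in place, allocator behavior is the likely cause rather than a true leak.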

Not sure how much it will help, but can you please run !sos maddress -summary? It will try to get some heuristics on memory usage.

@Leonardo-Ferreira
Author

OK, let's say you are correct... there's still at least 250MB of "unaccounted" memory! The working set is over 350MB and we can barely account for 100MB of it...

I'll run maddress and post the result here.

@Leonardo-Ferreira
Author

Leonardo-Ferreira commented Aug 8, 2023

Here's the result:

Enumerating and tagging the entire address space and caching the result...
Subsequent runs of this command should be faster.
Warning:  Could not find a memory range for 7fd565410000 - HighFrequencyHeap.
This crash dump may not be a full dump!

 +----------------------------------------------------------------------+ 
 | Memory Type         |          Count |         Size |   Size (bytes) | 
 +----------------------------------------------------------------------+ 
 | PAGE_READWRITE      |          1,489 |     201.41mb |    211,197,952 | 
 | Stack               |             21 |     151.62mb |    158,980,096 | 
 | Image               |            811 |     149.86mb |    157,143,552 | 
 | PAGE_EXECUTE_READ   |          1,572 |      29.43mb |     30,863,360 | 
 | PAGE_READONLY       |            167 |      25.74mb |     26,992,640 | 
 | HighFrequencyHeap   |            348 |      14.63mb |     15,339,520 | 
 | LowFrequencyHeap    |            210 |      13.46mb |     14,114,816 | 
 | GCHeap              |              6 |       8.23mb |      8,626,176 | 
 | LoaderCodeHeap      |              8 |       7.00mb |      7,340,032 | 
 | ResolveHeap         |              2 |     348.00kb |        356,352 | 
 | DispatchHeap        |              2 |     196.00kb |        200,704 | 
 | CacheEntryHeap      |              2 |     100.00kb |        102,400 | 
 | IndirectionCellHeap |              2 |      88.00kb |         90,112 | 
 | LookupHeap          |              2 |      80.00kb |         81,920 | 
 | StubHeap            |              2 |      76.00kb |         77,824 | 
 | HostCodeHeap        |              2 |      72.00kb |         73,728 | 
 +----------------------------------------------------------------------+ 
 | [TOTAL]             |          4,646 |     602.32mb |    631,581,184 | 
 +----------------------------------------------------------------------+ 

despite the warning above, that is a full dump... anyway that's more what I was expecting to see!!!

edit 1:
After seeing so many images, I ran !address -f:Image and I'm seeing a lot of duplication, e.g.:
image
From my point of view, line 3 is a duplicate of line 1... you could argue that all of those are duplicates, but 1 and 3 definitely are... and I see a lot more like those.

@Leonardo-Ferreira
Author

> using dotnet counters monitor I was able to get this: image
>
> the working set for the process is 363MB but I still have no clue "where" that memory is going...

Looking back at this: I've had it running for quite a bit longer now and I can see one clear, consistent thing: the Allocation Rate is always at least 20KB/s (even though no requests are reaching this API) and, of course, the GC Heap Size grows at about the same rate.

So I had an idea: let's run a trace for about 1 minute, because then I should be able to see a function being called very frequently... since this is an API not being used by anyone, you should only see "background" activity... but that didn't work out as expected and there was no clear indicator of what was going on.

@hoyosjs
Member

hoyosjs commented Aug 9, 2023

There's definitely more paged memory in the dump than in the process (might be a symptom of dotnet/runtime#71472 (comment)). But let's ignore the first line for a minute. (cc @leculver)

  • The second line is interesting: it says this process is running 21 threads, which works out to ~7-8 MB per stack. That's about our limit, so it would mean very deep stacks. Does the dump support that?
  • I don't see the dupes you mean. Those are different memory ranges. If you are talking about the fact that there are multiple entries from the same file, that's normal: they are different sections of the file.
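
The per-stack estimate can be checked directly from the maddress summary. The following is a small sketch, not part of SOS, that parses the table rows as they were pasted in this thread and averages the size per region; the function names are made up for illustration.

```python
def parse_maddress_summary(text):
    """Map each memory type to (region count, size in bytes)."""
    rows = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != 6:
            continue  # border lines and free text have no table cells
        name, count, _size, size_bytes = cells[1:5]
        count = count.replace(",", "")
        if not count.isdigit():
            continue  # header row
        rows[name] = (int(count), int(size_bytes.replace(",", "")))
    return rows


def average_mib(rows, name):
    """Average region size in MiB for one memory type."""
    count, size_bytes = rows[name]
    return size_bytes / count / (1024 * 1024)


# Two rows copied from the output earlier in this thread.
SAMPLE = """
 | Memory Type         |          Count |         Size |   Size (bytes) |
 | PAGE_READWRITE      |          1,489 |     201.41mb |    211,197,952 |
 | Stack               |             21 |     151.62mb |    158,980,096 |
"""

rows = parse_maddress_summary(SAMPLE)
print(f"average stack region: {average_mib(rows, 'Stack'):.2f} MiB")
```

Note that this averages reserved address ranges, which is not necessarily the same as memory that is actually resident.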

If your pod is still alive, a copy of the /proc/<pid>/maps file would help see what's paged in according to the kernel (preferably before collecting the dump).
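
For reading the kernel's view, /proc/<pid>/smaps carries the same ranges as maps plus a resident size (Rss) per mapping. Here is a minimal sketch that totals Rss by mapping name; the sample text, including the library path, is invented for illustration and is not taken from the attached files.

```python
import re
from collections import defaultdict

# A mapping header looks like: "start-end perms offset dev inode [pathname]".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$")


def rss_by_mapping(smaps_text):
    """Sum the Rss values (in kB) for each mapping name; unnamed mappings become [anon]."""
    totals = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1).strip() or "[anon]"
        elif line.startswith("Rss:") and current is not None:
            totals[current] += int(line.split()[1])
    return dict(totals)


# Hypothetical sample in smaps format.
SAMPLE_SMAPS = """\
7fd4d0000000-7fd4d0051000 rw-p 00000000 00:00 0
Size:                324 kB
Rss:                 300 kB
7fd5dc000000-7fd5dc100000 r-xp 00000000 08:01 1234 /usr/lib/libssl.so.3
Size:               1024 kB
Rss:                 512 kB
7fd5dd000000-7fd5dd010000 rw-p 00000000 00:00 0
Rss:                  64 kB
"""

for name, kb in sorted(rss_by_mapping(SAMPLE_SMAPS).items(), key=lambda kv: -kv[1]):
    print(f"{kb:>8} kB  {name}")
```

Comparing the totals from two snapshots taken some time apart shows which kind of mapping is growing; a growing [anon] total points at native heap allocations.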

@hoyosjs
Member

hoyosjs commented Aug 9, 2023

Ambient allocations do happen; for example, running dotnet-counters itself can cause the runtime to allocate :) Gen2 garbage is not a concern. 10% fragmentation isn't terribly concerning either.

@Leonardo-Ferreira
Author

Ok, for the MALLOC config, set it up, run the app and get a smaps?
I'll set it up, be back with results shortly

@janvorli
Member

Yes, that's what I meant

@Leonardo-Ferreira
Author

Here is the smap after the 1st request with the new config. I also had to set COMPlus_GCHeapHardLimit to C800000, otherwise Valgrind wouldn't start...
smapAfter1stRequestWithNewConfig.txt

@Leonardo-Ferreira
Author

@janvorli we are able to reproduce the problem with a simple .NET 7 Web API plus the MongoDB driver connecting to a Cosmos DB. Looking into it, I found DataDog/dd-trace-dotnet#2168, which fits both our environment and our issue...

@janvorli
Member

> otherwise Valgrind wouldn't start...

I am sorry for confusing you, I meant either setting the MALLOC_MMAP_THRESHOLD_ or using valgrind, not doing that together (although that should not hurt).
Anyways, it seems that the MALLOC_MMAP_THRESHOLD_ had no effect. Just to double check, have you used the exact name for the env variable (including the _ at the end - I've missed that once in the past)?
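
Since a dropped or extra trailing underscore is easy to miss, a quick check like the sketch below can flag near-miss names in the environment. It only knows the two variable names spelled out in this thread, and the helper name is made up for illustration.

```python
import os

# Names exactly as written in this thread; note the trailing underscore on the second.
KNOWN = ("MALLOC_ARENA_MAX", "MALLOC_MMAP_THRESHOLD_")


def suspicious_names(env):
    """Return names that differ from a known name only by trailing underscores."""
    bad = []
    for name in env:
        if name in KNOWN:
            continue  # spelled correctly
        for known in KNOWN:
            if name.rstrip("_") == known.rstrip("_"):
                bad.append(name)
    return sorted(bad)


for name in suspicious_names(os.environ):
    print(f"warning: {name} looks like a misspelled allocator variable")
```

Running it inside the container before starting the app confirms that the variable the allocator reads is the one that was actually set.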

Anyways, it is great to have a repro. I'd suggest waiting with further investigations until you get a response on that issue.

@Leonardo-Ferreira
Author

Here are 3 smaps of this minimal app, one right after start, one after the first request and one when the app was at the mark of 150mb
smapCleanStart.txt
smapNew1stRequest.txt
smap150mb.txt

@Leonardo-Ferreira
Author

> > otherwise Valgrind wouldn't start...
>
> I am sorry for confusing you, I meant either setting the MALLOC_MMAP_THRESHOLD_ or using valgrind, not doing that together (although that should not hurt). Anyways, it seems that the MALLOC_MMAP_THRESHOLD_ had no effect. Just to double check, have you used the exact name for the env variable (including the _ at the end - I've missed that once in the past)?
>
> Anyways, it is great to have a repro. I'd suggest waiting with further investigations until you get a response on that issue.

Yes, I have the _ at the end... I thought that perhaps you had made a typo, so I checked the command and saw that there is an extra _.

@janvorli
Member

@Leonardo-Ferreira it looks like the 3rd dump is actually from a different run of the app. The mappings of the dotnet executable are different and I don't have any other explanation for that.

@janvorli
Member

Is there a reasonable way for me to run the minimal app locally?

@Leonardo-Ferreira
Author

After a couple of hours of running, I'm confident in reporting that rolling back to asp.net:6.0-alpine fixed the issue. Other than a couple of log messages, the code remains the same, and memory is stable at 135MB.

@Leonardo-Ferreira
Author

@janvorli if you could join the support request 2308080040008266001, I can share the mini app and the credentials for the database

@janvorli
Member

I have an update on this issue. I was debugging a very similar issue of an internal customer and I've found a very nice tool that has allowed me to find out the source of the leak. It is called heaptrack, you can find sources / doc here: https://github.com/KDE/heaptrack. It is also available as a package in standard Ubuntu repo.
The leak is coming from OpenSSL code invoked by the crypto / ssl code in the runtime (the CryptoNative_SslGetPeerCertificate function), so I guess we somewhere incorrectly maintain a refcount on a certificate. OpenSSL has APIs to bump / release the refcount that our runtime invokes.

@westfin

westfin commented Oct 20, 2023

Hi @janvorli, thanks for the findings. Is dotnet/runtime#74695 related to this issue?

@janvorli
Member

@westfin the team that owns that code is currently looking into it. But yes, that PR modified the code that gets the raw certificate from openssl and wraps it in a managed object, so it is related.

@westfin

westfin commented Oct 20, 2023

Good, thanks again. How can we follow the status of the work? Is there a link, or can you post updates here?

@janvorli
Member

I will keep updating you here.

@ptasev

ptasev commented Oct 20, 2023

Which versions of the runtime are affected by this?

Also, does anyone know why !heap gives me No export heap found? I'm using WinDbg Preview from the MS Store and that extension should be loaded from ext.dll. The file was recorded using createdump -u on linux.

@Dragonsangel

Interesting that I found this issue today while looking into other methods of using dotnet-dump to analyse my core dumps, which show the exact same issue as described here.
I have had a Support Case open for this issue for a couple of months now under 2305080050001970 (@janvorli is it possible to verify if that is the same as the issue here?)

According to the PR linked by @westfin, the issue should be resolved by updating to .NET 8.0. Has anyone been able to confirm this?
While investigating my dumps, I did find many references to SafeX509Handle, but ignored them due to the application performing many SSL operations.

The SysInternals team released procdump for Linux in December, which I started using today to try to identify the cause. Running procdump with the -s 600 -n 3 -restrack options (capture 3 dumps with leak traces, one every 600 seconds) reports an ever-increasing number of small leaks in the last two leak traces.
Currently, I am trying to see what the memory contains at the reported leaks, but I stumbled on this issue while looking for instructions on how to do that.

I have not looked at using heaptrack, but if other investigations fail, I will have a look at it and report back what I find.

@Dragonsangel

As an update from my side, we managed to migrate our application to .Net 8.0 and deployed it to one of the production servers.
The application is no longer leaking memory.

Since the fix hasn't been backported, it seems that the only fix for this issue would be to upgrade to .Net 8.0

@rzikm
Member

rzikm commented Jan 8, 2024

@Leonardo-Ferreira I have found and fixed the leak in CryptoNative_SslGetPeerCertificate @janvorli mentioned. Would you be able to verify the fix by running your application with these binaries?

ocsp-fix.zip

To test these, publish your app as self-contained and overwrite the libSystem.Security.Cryptography.Native.OpenSsl.so library in the publish output.

If you'd rather go through the trouble and compile the lib yourself, they are coming from these branches

To build them, running ./build.sh -s Libs.Native -lc release should be enough.

@karelz
Member

karelz commented Jan 11, 2024

@Leonardo-Ferreira do you think you will have time to check privates from @rzikm this week?
Monday is cut off for February servicing and we would like to include the fix there. Having a verification from you would be highly desirable. Thank you!

@Leonardo-Ferreira
Author

@karelz sorry, but I didn't have time to check the privates from @rzikm... is this still needed? I saw that the PR was merged, so I guess we are good to go for the March update?

@karelz
Member

karelz commented Feb 9, 2024

Yeah, we do not need the validation anymore.
If you try the privates, they can still confirm for you whether the fix resolves the problem you have.

@tommcdon
Member

While we have documentation and tutorials on dotnet-dump memory analysis, e.g. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak#analyze-the-core-dump, we do not have tooling for native memory leaks, esp. on Linux. Since the request is tracked on #2906, closing this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 13, 2024