GC does not release memory easily on Kubernetes cluster in workstation mode #49317
Tagging subscribers to this area: @dotnet/gc
Hey @Kiechlus, do you observe that the memory eventually gets collected, or does it stay until there is memory pressure on the K8s cluster?
@mangod9 When I issue another download of a 1 GB file, the memory does not rise, so it must have been collected. But without such pressure it just stays as is.
Do any of the options listed here work for you?
@Maoni0 will those traces help to analyse the issue? If so, we can try to get them.
@Kiechlus yes, this is always the first step for diagnosing memory perf problems.
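For anyone following along: a trace like this is usually collected with the dotnet-trace global tool. A minimal sketch, where the PID and output file name are placeholders and the gc-verbose profile is the one requested later in this thread:

```sh
# install once
dotnet tool install --global dotnet-trace
# attach to the running app and record verbose GC events
dotnet-trace collect --process-id <pid> --profile gc-verbose --output gc-trace.nettrace
```

The resulting .nettrace file can then be opened in PerfView, as described below.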
Hi @Maoni0

Setup

Observations

Expected outcome
Hi, my team is struggling with a similar issue; we're currently working on gathering some traces from our app. The premise, however, is exactly the same: we're uploading a large file to Azure Blob Storage, and while everything works fine on our local dev environment (a full GC is eventually invoked), on our k8s cluster we get frequent OOMs.
Can somebody provide some sample code for what the download or upload looks like? There are some known issues with ASP.NET Core's memory pool not releasing memory that might be the case here, but it's possible that the code could be tweaked to avoid the memory bloat in the first place.
Hi @L-Dogg, we are still facing this issue.
Thanks for your replies. We're trying to prepare a minimal example; I just hope such an example will be enough to reproduce this behaviour.
@Kiechlus can you collect a gc-verbose trace to see where your allocations are coming from? Are the connections HTTPS connections?
@Kiechlus somehow I missed this issue... sorry about that. I just took a look at the trace you collected. It does release memory: if you open the trace in PerfView and open the GCStats view, you'll see that at GC#10 the memory usage went from 831 MB to 9.8 MB. But there are allocations on the LOH again, which made the memory go up again. What would be your desired behavior? You are not under high memory pressure, so the GC doesn't need to aggressively shrink the heap size.
@Kiechlus It seems like you're churning the LOH; why is that? Are you using streams or are you allocating large buffers? Also, are you using IFormFile or the MultipartReader?
@davidfowl We are allocating the stream like this:

@Maoni0 You are right, we ran into OOM only on very few occasions, so memory is freed under pressure. But what we would need is for the memory to be freed immediately after the controller returns and the Stream is deallocated, because the needs in a Kubernetes cluster are different. There are, for example, three big machines, and Kubernetes schedules many pods onto them; based on different metrics it creates or destroys pods (horizontal pod autoscaling) @egorchabala. If a pod does not free memory even though it could, Kubernetes cannot use that memory for scheduling other pods, and the autoscaling does not work. Memory monitoring also becomes more difficult.

Is there any possibility to make the GC release memory immediately, as soon as it is possible, even though there is no high pressure? Do you still need a different trace or anything of the like?

@L-Dogg we are currently using version
Don't do this. This is the source of your problems. Don't buffer a gig in memory. Why aren't you streaming?
@davidfowl We are using this client-side encryption: https://docs.microsoft.com/de-de/azure/storage/common/storage-client-side-encryption?tabs=dotnet.
Without seeing any code snippets it's obviously harder to recommend something, but I would imagine you have something like this (see the last step):

```csharp
// Download and decrypt the encrypted contents from the blob.
MemoryStream targetStream = new MemoryStream(fileLength);
blob.DownloadTo(targetStream);
```

Then something is copying the targetStream to the HttpResponse? If yes, then avoid the temporary stream and just copy it to the response directly.
Hi @davidfowl, thanks for your reply! This is consumed by the service, goes through some layers, and in the end in the Controller it is:

I'm still not sure how to avoid writing it to some temporary stream, but it would be great if we could solve this.
Where is the temporary stream doing all the buffering?
@davidfowl Do you mean this?
Why isn't this code taking in the target Stream?
@davidfowl Do you mean in the Controller we should do something like this?
If this makes a difference, we will for sure try it.
The controller should look like this:

```csharp
Response.ContentLength = fileLength;
await blobLib.DownloadToAsync(Response.Body);
```
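Put together, a minimal sketch of such a controller action might look like the following. The BlobContainerClient injection, route, and names here are illustrative assumptions, not code from this thread, and a real setup using client-side encryption would configure the client accordingly:

```csharp
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("files")]
public class FilesController : ControllerBase
{
    private readonly BlobContainerClient _container;

    public FilesController(BlobContainerClient container) => _container = container;

    // Streams the blob straight into the HTTP response, so the file is never
    // buffered in a MemoryStream and no large (LOH) allocation is made.
    [HttpGet("{blobName}")]
    public async Task DownloadAsync(string blobName)
    {
        var blob = _container.GetBlobClient(blobName);

        // Fetch the size first so clients still get a Content-Length header.
        var props = await blob.GetPropertiesAsync();
        Response.ContentType = "application/octet-stream";
        Response.ContentLength = props.Value.ContentLength;

        await blob.DownloadToAsync(Response.Body);
    }
}
```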
I've sent an email to @mangod9 with the dump link. I can also create a new dump and trace and send them if you request. Best regards.
This issue has been automatically marked no-recent-activity because it has not had any recent activity.
We are still struggling with high memory usage only on Linux k8s environments. I would appreciate it if this issue is not closed until the problem is solved.
This issue will now be closed since it had been marked no-recent-activity and received no further activity.
After lots of research, dumping, and tracing, we think we've found some clues about this memory-not-being-released issue in these similar GitHub issues:
We've solved this issue with these malloc env settings:
We're using lots of dynamic code compilation with Roslyn as well as some IL-emit code, but I don't know why we have to set these malloc settings in a Linux container environment. There isn't any issue on Windows, by the way. Best regards.
By the way, I'm leaving the settings we used in the Linux container here in case someone else encounters this issue. But as mentioned in the GitHub issues I posted above, these settings will vary from application to application, so you need to experiment a bit with them. By playing with these settings, we were able to make the production environment running in a Linux container behave closer to the Windows environment:
On Windows we only set
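The exact values used above were not shown, but the glibc tunables in question are ordinary environment variables. An illustrative sketch with placeholder values, which have to be tuned per application:

```sh
# Cap the number of malloc arenas glibc creates (the 64-bit default is
# 8 * cores, which can inflate RSS considerably in containers).
export MALLOC_ARENA_MAX=2
# Return freed memory to the OS more eagerly (threshold in bytes;
# this value is a placeholder, tune it for your workload).
export MALLOC_TRIM_THRESHOLD_=131072
```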
Adding @janvorli as well. Wonder if this is possibly related to W^X. @oruchreis, what .NET version were you using?
Hi @mangod9, it is .NET 7 with the latest minor version. We also tried the old libclrgc.so, which didn't have much effect. As a reminder, I described the whole scenario in my previous messages on this thread and sent you and @Maoni0 the dump and trace as you requested.
Should I create a new topic to better track the issue with these specific env settings and the memory not being released?
@oruchreis
The general issue with glibc is that it does not return memory to the OS if the arena is not completely, contiguously empty. In other words, it does not punch holes in the allocated chunk automatically and requires manual calls to the malloc_trim() function, supposedly to prevent memory fragmentation. Example C code and a description of the issue are available here, for example: https://stackoverflow.com/questions/38644578/understanding-glibc-malloc-trimming

This is a very common issue for applications which have huge peak memory consumption but low regular memory usage. It hit me in squid, pm2, and node.js. The simplest solution is to use jemalloc, an alternative heap allocator; it's usually as easy as LD_PRELOAD. Or use a distro which doesn't use glibc, such as Alpine with musl libc.
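To illustrate the LD_PRELOAD route: on a Debian-based image, preloading jemalloc is typically a one-liner. The package name and library path below are assumptions for Debian/Ubuntu and differ per distro:

```sh
# apt-get install -y libjemalloc2
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```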
Like I've mentioned above, I've tried many heap allocators such as mimalloc, jemalloc, and tcmalloc, but they didn't have much effect. We're using those two malloc settings in production with .NET 7 right now, and fortunately we haven't noticed any memory pressure. I understand these values differ from app to app. I couldn't try Alpine yet, because we use some native libraries which don't work with musl; I'll try Alpine in the future when I recompile the native dependencies against musl. But I think this is a major issue for .NET: we should not have to set C-native configs such as MALLOC_ARENA_MAX. If it is necessary to set these settings in order to use .NET on a distro that uses glibc, shouldn't it be mentioned in the documentation?
Description

Memory is released on generation 2 GC collections; smaller generations do not release it.

Configuration

htop inside container: (output elided)

Other information