Memory leak in 2.2 and 2.1 (2.0 is not affected) using Docker #10200

Closed
PrefabPanda opened this issue May 13, 2019 · 21 comments
Labels: area-networking

@PrefabPanda

PrefabPanda commented May 13, 2019

Description:
RAM use in Docker container keeps climbing until an OOM is triggered. Does not happen in 2.0.

Steps to reproduce the behavior:

  1. From Visual Studio 2017 create a brand-new ASP.NET Core Web Application
  2. Choose Web API, Docker support (Linux), Core 2.1 or 2.2
  3. Add Orchestration support (choose docker-compose)
  4. Add the following to the .yml file:
    mem_reservation: 128m
    mem_limit: 256m
    memswap_limit: 256m
    cpus: 1
    (will need to change the yml version to 2.4 to support the memory limits)
  5. Using a batch file or similar, call it with curl 1,000 times (a loop sketch follows after this list)
    e.g.
    curl -X GET "https://localhost:44329/api/values/7" -H "accept: text/plain" --insecure
  6. Monitor using 'docker stats'
  7. Note the memory is not released; run it 1,000 more times and the container will die with an OOM error (check 'docker events'). Run multiple threads for the fastest results to make it break.
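For reference, a consolidated sketch of the compose service described in steps 3-4 (the service name and image here are placeholders, not taken from the generated project):

version: "2.4"
services:
  webapplication1:
    image: webapplication1
    mem_reservation: 128m
    mem_limit: 256m
    memswap_limit: 256m
    cpus: 1

And a simple bash loop for step 5 (the port number will differ per project):

for i in $(seq 1 1000); do
  curl -X GET "https://localhost:44329/api/values/7" -H "accept: text/plain" --insecure
done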

Expected behavior:
GC to release memory and/or respect Docker RAM limits

Additional context:

  • Already tried setting ServerGarbageCollection to false (makes no difference; a csproj sketch follows below)
  • Only thing that works so far is going back to 2.0, which is EOL
  • This is the template project, not my own code, so it's not related to HttpClient or similar
  • Happens when hosting in MobyLinux on Windows 10 or Ubuntu Linux on Docker CE
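
For reference, a minimal sketch of how that setting is usually expressed in the .csproj (exact placement may differ from my actual project file):

<PropertyGroup>
  <!-- ASP.NET Core templates default to server GC; this forces workstation GC. -->
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>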
@Eilon
Member

Eilon commented May 13, 2019

@sebastienros

@PrefabPanda
Author

A couple of observations - I can make it OOM much quicker if I set MobyLinux to use only 2GB rather than 4GB of RAM.

@sebastienros
Member

I commented on the other issue but I meant to comment here.

This is not supported in 2.2: https://github.com/dotnet/coreclr/issues/18971
In short, your limits are too low for 2.2 and it won't work. Try with 512MB. This is only solved in 3.0.

@PrefabPanda
Author

@sebastienros - setting the limits to 512MB hasn't helped; same problem.

I will have to give the 3.0 preview a try to see if it helps.

@PrefabPanda
Author

@sebastienros - can I get this re-opened please? I have tried 3.0 preview 4, no difference. Container still OOMs.

I'm using the preview 4 'stretch-slim' images.

I've got this set:

<ServerGarbageCollection>false</ServerGarbageCollection>

and this in my compose:

mem_reservation: 512m
mem_limit: 512m
memswap_limit: 512m
cpus: 1

Still, the only thing that works is going back to Core 2.0.

@dcarr42

dcarr42 commented May 15, 2019

@PrefabPanda Make sure ServerGC is not running by logging it out at startup (a sketch follows below). I would like to see the limit doubled again. Any chance you can attach a debugger to get some GC stats?

https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md
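
A minimal sketch of that startup check (assuming the standard template Program.cs; the class name is arbitrary):

using System;
using System.Runtime;

public static class GcModeLogger
{
    public static void Log()
    {
        // True when server (background) GC is in use, false for workstation GC.
        Console.WriteLine($"IsServerGC: {GCSettings.IsServerGC}");
        Console.WriteLine($"LatencyMode: {GCSettings.LatencyMode}");
    }
}

// Call GcModeLogger.Log() at the top of Main, before building the host.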

@sebastienros
Member

Reopening as it might also repro on preview 4.

@PrefabPanda why didn't you use preview 5? You would have to specifically set the preview4 tag to get it; the latest Docker image and current default for 3.0 is preview 5.

@PrefabPanda
Author

@sebastienros - I was originally trying preview 5, but got an error when it attempted to start debugging. It looks like the default images VS2019 wants to use aren't preview 5 compatible:

mcr.microsoft.com/dotnet/core/aspnet:3.0-stretch-slim
mcr.microsoft.com/dotnet/core/sdk:3.0-stretch

(they give me an error saying they can only find preview 4).

So I dropped -stretch-slim and -stretch from the tag and now it runs.

@DAllanCarr - Sadly I've repeated the test using a 512MB and a 1GB container - same OOM problem. I have left the test app doing workstation GC (I have noticed that the setting doesn't take effect unless you do a clean solution).

When running the tests I've got the debugger attached already. I'll attempt to grab some of the output.

@sebastienros
Member

sebastienros commented May 22, 2019

I reproduced the same docker image, and set it to -m 128m --cpus=1.
I ran it on a machine with 12 logical cores, hence the CPU% of 8-9%, as we set cpus=1.
The load originated from a separate machine, with 256 concurrent connections, over 10 minutes, and each measurement is a summary of 10s.

The results are the following:

RPS CPU (%) Memory (MB) Avg. Latency (ms)
8,296 9 73 34.57
9,921 8 79 27.53
9,521 8 95 28.61
9,920 8 116 25.22
9,636 9 128 25.73
10,543 8 84 25.66
9,627 8 93 27.54
9,800 8 102 24.94
9,445 8 88 27.87
9,615 9 100 28.35
10,520 8 106 25.09
9,591 8 106 28.79
9,938 8 96 24.57
10,819 8 100 24.55
9,913 9 118 25.98
9,511 8 118 28.79
10,067 8 101 25.96
9,336 8 92 27.32
9,955 8 102 25.33
10,202 8 103 27.97
10,817 8 110 22.73
9,096 8 119 28.26
9,168 8 118 30.09
9,625 8 98 25.21
10,003 8 111 26.55
9,452 8 86 28.48
9,435 8 86 26.12
9,765 8 96 27.55
9,482 9 80 26.57
9,549 8 99 28.36
9,383 8 99 28.26
10,205 8 100 23.81
10,140 9 89 24.11
9,292 8 90 28.17
11,341 8 90 21.15
10,651 8 103 23.42
9,261 8 116 30.88
9,544 8 89 28.25
10,864 9 88 24.88
9,987 8 88 28.66
10,143 8 99 26.83
9,311 8 110 29.34
11,331 8 110 22.1
9,602 8 97 27.83
9,848 8 97 26.8
10,829 8 81 23.26
9,622 8 89 27.67
10,492 9 89 24.12
10,149 8 95 26.98
9,618 8 97 27.43
9,595 9 97 28.68
10,050 9 99 26.35
10,856 9 87 23.18
10,108 8 86 25.51
9,057 8 86 28.34

As you can see, everything went normally. There was not a single bad response or socket error during the run.

Here is the Dockerfile I used, where webtemplate is the name of the project I generated using dotnet new webapi.

# Build and publish the generated webapi project, then run the published output.
FROM mcr.microsoft.com/dotnet/core/sdk:3.0.100-preview5
WORKDIR /app

COPY . .
RUN dotnet publish -c Release -o out

WORKDIR /app/out

# Listen for HTTPS on port 5000.
ENV ASPNETCORE_URLS https://+:5000
ENV ASPNETCORE_HTTPS_PORT=5000
EXPOSE 5000
ENTRYPOINT ["dotnet", "webtemplate.dll"]

@sebastienros
Member

sebastienros commented May 22, 2019

Here is the full source of the application I ran.

webtemplate.zip

For my own notes, the command line used to benchmark:

dotnet run -- @C:\temp\.environments\aspnet-lin `
--source C:\temp\dockerperf\webtemplate\ --docker-file Dockerfile --docker-context . --path "/api/values/7" --no-warmup --duration 10 --span 0.00:10:00 --no-clean --headers none `
--header "accept=text/plain" -nsl -m https --arg "--memory=128m --cpus=1" -wf

@PrefabPanda
Author

Thank you @sebastienros - I will try your sample project and report back. I notice you are using a different SDK image, so I'll be trying that too.

@PrefabPanda
Author

@sebastienros - I have given your suggestion of turning HTTPS off a try; sadly, no difference. Though I can't see your comment now, so I assume you've deleted it. For reference, here is my code again with HTTPS turned off.

WebApplication1.zip

@sebastienros
Member

I have started running our reliability tests on HTTPS to detect a leak, and they are very stable, so it's probably not a leak, or the memory would keep growing. I have a test that runs for a full week, which will give us even more information.

What it could be, though, is that the internal cache used for loading certificates, and any other pools (arrays, string builders, ...), require a fixed amount of memory that can't be released by the GC, and this might be over 128MB. I will try to figure out at what point the docker images start failing on my side when paging is disabled.

Just to be clear, your Docker settings do disable paging, which is a good way to actually test memory limits. And I could also repro the issue with this setting at 128MB.

mem_limit: 256m
memswap_limit: 256m

@sebastienros
Member

When running the tests I've got the debugger attached already.

Have you ever reproed it without the debugger attached?

@kakins

kakins commented May 31, 2019

I'm experiencing a similar issue in .NET Core 2.2, but I'm not using Docker. Should this be the thread that I follow? The previous thread here is now closed: #1976

@sebastienros
Member

The current recommendation is that you figure out how much memory is necessary based on your actual scenario. I can't reproduce any memory leak, even after running the application on HTTPS for a complete week. Proof of a memory leak would be a memory profile listing which instances are leaking over time.
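
For anyone who wants a low-tech starting point before reaching for a full profiler, here is a sketch (not the instance-level profile requested above, just aggregate numbers) of a hosted service that periodically logs managed heap size and collection counts so growth over time is visible in the container output:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public class GcStatsLogger : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // GC.GetTotalMemory(false) reads the managed heap size without forcing a collection.
            var heapMb = GC.GetTotalMemory(false) / (1024 * 1024);
            Console.WriteLine($"heap={heapMb}MB gen0={GC.CollectionCount(0)} gen1={GC.CollectionCount(1)} gen2={GC.CollectionCount(2)}");
            await Task.Delay(TimeSpan.FromSeconds(10), stoppingToken);
        }
    }
}

// Registered via services.AddHostedService<GcStatsLogger>() in Startup.ConfigureServices.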

@kakins

kakins commented Jun 4, 2019

@sebastienros actually your article on .NET Core garbage collection helped a lot. I believe for me it was a combination of properly disposing the EF DbContext and understanding how the GC handles short-lived objects.
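
For anyone following along, a sketch of the two usual DbContext patterns that helped (the type names here are hypothetical, not from my actual project):

using Microsoft.EntityFrameworkCore;

public class Widget { public int Id { get; set; } }

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Widget> Widgets { get; set; }
}

public static class DbContextUsage
{
    public static void Run(DbContextOptions<AppDbContext> options)
    {
        // Manually created contexts should be disposed deterministically so
        // tracked entities and the underlying connection are released promptly.
        using (var db = new AppDbContext(options))
        {
            var widget = db.Widgets.Find(7);
        }
    }
}

// With ASP.NET Core DI, services.AddDbContext<AppDbContext>(...) registers the
// context as scoped, so it is created and disposed once per request automatically.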

@analogrelay
Contributor

Apologies, I've lost track of the thread a little bit. It looks like there isn't a clear actionable work item for the servers right now, so I'm moving this to discussions for now. Please feel free to let me know (tag me, etc.) if that changes!

@lsouzaoliveira

I'm experiencing a similar issue in .NET Core 2.2, but I'm not using Docker. Should this be the thread that I follow? The previous thread here is now closed: #1976

Me too

@sebastienros
Member

sebastienros commented Jun 10, 2019

@crauadams Please file a new issue of your own. This one will probably be closed as we can't repro any leak.

@analogrelay
Contributor

Closing as per @sebastienros's comment. @crauadams please do file a new issue if you have data and/or a repro for us to look at!

@ghost locked as resolved and limited conversation to collaborators Dec 3, 2019
@amcasey added the area-networking label and removed the area-runtime label Aug 24, 2023