'dotnet restore' performance regression using SocketsHttpHandler #27215
We’ve discovered what appears to be an IO thread starvation problem with SocketsHttpHandler in the NuGet client libraries (NuGet.Protocol.dll).

Background: In 2.1, the default HttpClientHandler changed from WinHttpHandler to the new SocketsHttpHandler. SocketsHttpHandler is built on managed sockets and thus uses our IOCP and IO threads to process completions. WinHttpHandler does not use the .NET threadpool; it is built on top of native WinHTTP and uses the native threadpool, so it doesn’t hit this problem.

The NuGet client issues many (hundreds of) async HTTP requests in a short period of time (a couple of seconds). As these HTTP requests complete, the NuGet app’s async continuations (running on IO threads) perform FileStream operations that cause the IO thread to block waiting on an async file completion. In particular, it’s blocking here. (There may be other places where IO threads are getting blocked, but this seems to be by far the most common.) The NuGet code is calling Seek on the FileStream, and that apparently forces a buffer flush. Since Seek is a synchronous call, the flush has to be synchronous as well, which blocks the IO thread:

// Validate the content before putting it into the cache.
fileStream.Seek(0, SeekOrigin.Begin);
ensureValidContents?.Invoke(fileStream);

As the IO threads block, we seem to (sometimes) get into a thread starvation death spiral. The file completions they are waiting for are queued behind a bunch of HTTP request completions; as those request completions are processed, they block more threads. Eventually, if we’re unlucky, all IO threads block. We still make progress slowly, because the threadpool injects a new IO thread every half second, but those threads soon block for the same reason, and progress slows to a crawl until the completions finally drain (or we hit timeouts that cause a bunch of cancellations).

Setting min IO threads to something large like 128 (on an 8-proc machine) fixes the issue. Increasing min IO threads is a workaround, and may be a viable short-term fix, but it seems like there’s a deeper issue here.
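For reference, a minimal sketch of what raising the IO thread minimum looks like; the value 128 matches the experiment above, but the call site and surrounding program are illustrative assumptions, not NuGet code:

using System;
using System.Threading;

class RaiseIoThreadMinimum
{
    static void Main()
    {
        // Read the current minimums so only the completion-port (IO) value changes.
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);

        // Raise the IO thread minimum so the threadpool creates threads on demand
        // instead of injecting roughly one every half second under starvation.
        ThreadPool.SetMinThreads(minWorker, Math.Max(minIo, 128));
    }
}

This only papers over the blocking; the FileStream change described below removes the block itself.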
We also discovered another place in FileStream where async operations actually end up being blocked by sync operations:
To work around this behavior, we determined a small fix to the NuGet client: change this line to pass in 1 for the BufferSize.

Old:

using (var fileStream = new FileStream(
    result.NewFile,
    FileMode.Create,
    FileAccess.ReadWrite,
    FileShare.None,
    BufferSize,
    useAsync: true))

New:

using (var fileStream = new FileStream(
    result.NewFile,
    FileMode.Create,
    FileAccess.ReadWrite,
    FileShare.None,
    1,
    useAsync: true))

That effectively disables the FileStream's buffering and prevents sync blocking of the IO threadpool threads. Using the fix, building a new NuGet.Protocol.dll binary, and dropping it into the repro brings performance back to baseline: on my dev machine I'm getting 20-25 second runs (same as WinHttpHandler) instead of 3-5 minutes.
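To make the mechanism concrete, here is a hypothetical, self-contained sketch (the file name, buffer size, and payload are made up, not taken from NuGet): with a non-trivial buffer holding pending write data, the synchronous Seek flushes that buffer synchronously even though the stream was opened with useAsync: true; with a buffer size of 1 there is effectively nothing to flush.

using System.IO;
using System.Text;
using System.Threading.Tasks;

class SeekFlushSketch
{
    static async Task Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("example content");

        // bufferSize: 4096 means WriteAsync lands in FileStream's internal
        // buffer rather than going straight to the OS.
        using (var fs = new FileStream("example.tmp", FileMode.Create, FileAccess.ReadWrite,
                                       FileShare.None, bufferSize: 4096, useAsync: true))
        {
            await fs.WriteAsync(payload, 0, payload.Length);

            // Seek is synchronous; any buffered write data is flushed
            // synchronously here, blocking the calling thread.
            fs.Seek(0, SeekOrigin.Begin);
        }
    }
}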
Given that it has a fairly nasty impact on certain customer(s), we need to find an "easy" way to detect that we're hitting this problem, either from a dump or via some counters/events. Changing code and seeing if it helps should be the last-resort detection mechanism. @kouvel, we may need your help to look into the IO threadpool to see what we can tweak to avoid these kinds of problems.
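One possible detection angle, offered as a hedged sketch rather than anything this thread settled on: periodically sample completion-port thread availability and flag sustained exhaustion. The type name, interval, and logging are illustrative assumptions:

using System;
using System.Threading;

static class IoThreadStarvationProbe
{
    // Logs when zero completion-port (IO) threads are available, which is the
    // symptom described above. A rough diagnostic aid, not a fix.
    public static Timer Start(TimeSpan interval)
    {
        return new Timer(_ =>
        {
            ThreadPool.GetAvailableThreads(out _, out int availableIo);
            ThreadPool.GetMaxThreads(out _, out int maxIo);

            if (availableIo == 0)
            {
                Console.WriteLine($"Possible IO thread starvation: 0 of {maxIo} completion-port threads available.");
            }
        }, null, TimeSpan.Zero, interval);
    }
}

A caller could start it with IoThreadStarvationProbe.Start(TimeSpan.FromMilliseconds(250)) while reproducing the slow restore, keeping a reference to the returned Timer so it isn't collected.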
@davidsh how did that code end up running on an IO thread?
I will forward you the longer email thread from which I summarized the above. |
From: @stephentoub
@davidsh should this move out of 2.1.x milestone?
Probably since it is not being actively championed. |
The conclusion was to fix the NuGet client, so that part is external. Closing.
The right fix, which the "fix the NuGet client" conclusion only works around, is to address FileStream's inappropriate blocking.
Original problem report
From: Matt Mitchell
Sent: Thursday, August 2, 2018 10:03 PM
To: Geoffrey Kizer; David Shulman
Cc: Stephen Toub
Subject: 2-5x perf regression (and probably correctness bug) with new http sockets in .NET Core 2.1
Hey,
I’ve been running the 2.2 builds for a while now and was noticing a number of restore failures when running CLI testing. The restore failures manifest as endpoint timeouts (endpoint did not respond in 100 seconds) and SSL connection termination errors. This did not happen in 2.1, so I started poking around. After much debugging and some assistance from David, it was discovered that the failures go away when DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER is set to 0. In fact, the restore gets much, much faster. I poked around and discovered this is also the case with 2.1.3, 2.1.2, and presumably 2.1.*, since not many networking changes have gone in. Here are some high-level points of what I know:
• A restore of the default mvc app using a simple nuget.config (only nuget.org) takes about 1:30 with the new sockets on.
• A restore of the default mvc app using a simple nuget.config (only nuget.org) takes about 0:36 with the new sockets off.
• A restore of the default mvc app using the CLI’s nuget.config for the 2.2 build (which contains ~10 sources) takes about 5:00 with the new sockets on.
• A restore of the default mvc app using the CLI’s nuget.config for the 2.2 build (which contains ~10 sources) takes about 0:50 with the new sockets off.
Update
On my dev machine, I was able to repro this performance regression. Using WinHttpHandler, the scenario runs in about 23 seconds; using SocketsHttpHandler, it takes between 3 and 5 minutes.
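For anyone reproducing the comparison above: the environment variable name comes straight from the report, while the AppContext switch shown below is the documented in-process way to force the pre-2.1 handler on .NET Core 2.1-era builds. Treat the surrounding program as an illustrative sketch.

using System;
using System.Net.Http;

class HandlerToggleSketch
{
    static void Main()
    {
        // In-process equivalent of setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0
        // before launching the process; must run before the first HttpClientHandler is created.
        AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);

        // HttpClientHandler now sits on top of WinHttpHandler (on Windows)
        // instead of SocketsHttpHandler.
        using (var client = new HttpClient(new HttpClientHandler()))
        {
            // ... issue requests as usual ...
        }
    }
}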