Fix some regressions in ASP.NET benchmarks #46120
- dotnet#44265 seems to have caused large regressions on Windows and Linux-arm64. During that change we had tested adding the `Sleep(1)` to some `ConcurrentQueue` operations in contending cases, and not spin-waiting at all in forward-progressing cases. Not spin-waiting at all where possible in contending cases seemed to be better or equal for the most part (compared with spin-waiting without `Sleep(1)`), so I have removed spin-waiting in forward-progressing cases in `ConcurrentQueue`.
- There were some regressions from the portable thread pool on Windows. I have moved and tweaked a slight delay that I had added early on. After later changes it had lost its original intention; with this change it goes back to that intention, which seems to close some of the gap, though maybe not all of it in some tests. We'll check the graphs after this change and see if there is more to investigate. There are also other things to improve on Windows; many of those may be separate from the portable thread pool, but some may be relevant to the changes in perf characteristics.
RPS numbers for some ASP.NET perf tests:
I have omitted
I have a hard time reading this table ;) but I assume this PR has the best numbers.
LGTM, thank you!
Looks great, even on Linux, net6 being faster than net5 on Fortunes for instance.
The "Diff before-5" column indicates the diff from before this change (in .NET 6) to 5.0.1, showing the regressions including all changes in between. The "Diff after-5" column indicates the diff from after this change to 5.0.1. To me it is the most relevant column: though not directly tied to this change, it shows whether the cumulative regressions since 5.0.1 have been recovered. And "Diff after-before" indicates the diff from after this change to before this change, with no other changes involved.
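To make the relationship between the three columns concrete, here is a minimal sketch. It assumes each diff is a percentage relative to the second-named run, which the description above does not state explicitly, and the RPS values are made up purely for illustration (they are not measurements from this PR). The name `pct_diff` is hypothetical.

```python
def pct_diff(new_rps: float, base_rps: float) -> float:
    """Relative change of new_rps versus base_rps, in percent (assumed definition)."""
    return (new_rps - base_rps) / base_rps * 100.0


# Hypothetical RPS values, for illustration only.
rps_5_0_1 = 1_000_000   # .NET 5.0.1 baseline
rps_before = 900_000    # .NET 6 before this change
rps_after = 1_010_000   # .NET 6 after this change

diff_before_5 = pct_diff(rps_before, rps_5_0_1)      # cumulative regression before the change
diff_after_5 = pct_diff(rps_after, rps_5_0_1)        # what remains relative to 5.0.1
diff_after_before = pct_diff(rps_after, rps_before)  # effect of this change alone

print(f"Diff before-5:     {diff_before_5:+.1f}%")
print(f"Diff after-5:      {diff_after_5:+.1f}%")
print(f"Diff after-before: {diff_after_before:+.1f}%")
```

With these made-up inputs, a negative "Diff before-5" alongside a non-negative "Diff after-5" would be the pattern that indicates the cumulative regression has been recovered.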
The numbers for the JsonPlatform benchmark for the AMD machine (around 1.2kk RPS) look much better than what is reported in the Power BI dashboard and what I am getting when I run them using crank:
`crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/master/scenarios/platform.benchmarks.yml --scenario json --profile aspnet-citrine-amd --application.framework net6.0`
@kouvel were you using any custom settings like
Ah yea, I forgot to mention: I disabled hill climbing. I did a few runs here and there with hill climbing, and the error margin between runs was so high that I didn't bother anymore. In normal mode it would take a lot of runs of each test to weed out the error and make any reasonable determination of improvement or regression. The hill climbing issue is known and needs to be fixed. I don't think it's worth the time to compare with hill climbing enabled at the moment, as after it is fixed it would behave closer to hill climbing disabled during the benchmark (though maybe not fully, will have to see, but at least the intention of the fix is to reduce the extra error to unnoticeable levels). There is a work item for fixing it in .NET 6, not sure yet about prioritization.
We should be able to see from new graphs what the range is after the change compared to before; hopefully that will show if anything has not been fully fixed. I'm expecting that two tests still have regressions, but there could be more.
I don't think all of the high error margin is due to hill climbing, for instance even with it disabled I saw ~80K RPS differences in
What about For some reason it's now the default setting for PlaintextPlatform and JsonPlatform benchmarks (I believe that it's wrong: we then have 1 epoll thread per core, not 1 per 30 cores, which is the default setting that 99.9999% of our customers are using in production). It can have a big impact on the benchmark results when you are changing something related to
I am really happy to hear that! Is there a GH issue that I could follow, or for now is it just somewhere in our internal docs?
@sebastienros are we aware of this issue? cc @roji
I didn't set
Just in our internal docs at the moment. I haven't gotten around to filing GH issues for the work items yet, will do that soon.
Then it depends on which config file you have used. I can see that the docs mention

    scenarios:
      plaintext:
        application:
          job: platformbenchmarks
          environmentVariables:
            DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS: 1
        load:

but the aspnet/benchmarks repo has one more config for platform benchmarks that does not set it: https://github.com/aspnet/Benchmarks/blob/master/src/BenchmarksApps/Kestrel/PlatformBenchmarks/platform.benchmarks.yml

@kouvel you can just remove the env var or set it to
Yes, exactly.
I would like to understand when that changed and what the gains were/are from setting it.
99.9999% of our customers don't use Platform variants either ;) We made it the default for these 2 benchmarks as it makes them faster, so this is what we are setting on TE and what we are tracking. The middleware variants don't have this flag, and neither do the other Platform ones. I still want to track this one since it's the one we'll submit to TE if it's faster, but if you prefer them to be without this ENV by default I can add the special ones as a new scenario name, also in the charts.
Ah ok I see that now. I used the config file from
@kouvel great, thanks for checking!
@sebastienros my main concern with this setting is that we can potentially miss a regression similar to the one that I've introduced with my first JsonPlatform is very specific as it produces the biggest workload for ThreadPool. Plaintext has a higher RPS, but to get the number of socket reads (which are scheduled to TP; writes are not, as they always succeed on the first try) we have to divide the RPS by 16 (due to pipelining). So for the current TE network setup, it's 437k (7kk/16) vs 1.2kk socket reads. By monitoring JsonPlatform with default settings we ensure that the architectural changes that we introduced in .NET 5 (1 pool thread with parallel TP work items scheduling) are not regressing in extreme scenarios.
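The arithmetic behind that comparison, as a small sketch. The RPS figures and the pipelining depth of 16 are the approximate values quoted above; treating JsonPlatform as one socket read per request is an assumption implied by the comparison, and all variable names are illustrative.

```python
PIPELINE_DEPTH = 16  # Plaintext requests per socket read, per the comment above

plaintext_rps = 7_000_000  # ~7kk RPS quoted for Plaintext
json_rps = 1_200_000       # ~1.2kk RPS quoted for JsonPlatform

# With pipelining, one socket read carries PIPELINE_DEPTH requests.
plaintext_socket_reads = plaintext_rps / PIPELINE_DEPTH

# Assumption: JsonPlatform is not pipelined, so each request is one socket read.
json_socket_reads = json_rps

print(f"Plaintext socket reads/s: {plaintext_socket_reads:,.0f}")
print(f"Json socket reads/s:      {json_socket_reads:,.0f}")
print(f"Json / Plaintext ratio:   {json_socket_reads / plaintext_socket_reads:.2f}")
```

So despite the lower RPS, JsonPlatform schedules noticeably more read work items to the thread pool per second than Plaintext does.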
Another thing about
I am not sure when exactly, but my understanding is that it has given us an extra 20-30k RPS for JsonPlatform.
We don't cheat. Platform is the set of benchmarks where we apply all the tricks we can to get the fastest benchmarks. The middleware ones are the scenarios where we try to follow what users would do. And we have always used release or go-live versions. This setting is available, the same way we used to enable tiered compilation when it was just a flag. If there are other knobs we can change to make the Platform benchmarks faster on these machines, we should do it.
At least for regular tracking purposes it would be nice to maybe use a different config file that does not set
I will make
`Sleep(1)` is closer to `Sleep(15)` so it may be even worse, but in both cases the change has the effect of removing thread pool worker threads from the system for relatively long periods of time, and in some cases the fewer remaining threads are not enough to get the expected throughput. During that change we had tested adding the `Sleep(1)` to some `ConcurrentQueue` operations in contending cases, and not spin-waiting at all in forward-progressing cases, compared with the prior default of spin-waiting without `Sleep(1)`. Not spin-waiting at all where possible in contending cases seemed to be better or equal for the most part compared with the prior default, so I have removed spin-waiting in forward-progressing cases in `ConcurrentQueue`.

Fixes #45716