Fix some regressions in ASP.NET benchmarks #46120

Merged: kouvel merged 1 commit into dotnet:master Dec 16, 2020

Conversation

@kouvel (Member) commented Dec 16, 2020

  • "ConcurrentQueueSegment allows spinning threads to sleep" (#44265) seems to have caused large regressions on Windows and Linux-arm64 on machines with a larger processor count. On Windows, Sleep(1) behaves closer to Sleep(15), so it may be even worse there, but in both cases the change has the effect of removing thread pool worker threads from the system for relatively long periods of time, and in some cases the fewer remaining threads are not enough to get the expected throughput. During that change we had tested adding the Sleep(1) to some ConcurrentQueue operations in contending cases, and not spin-waiting at all in forward-progressing cases, compared with the prior default of spin-waiting without Sleep(1). Not spin-waiting at all where possible in contending cases seemed to be better than or equal to the prior default for the most part, so I have removed spin-waiting in forward-progressing cases in ConcurrentQueue (a simplified sketch of the trade-off follows this list).
  • There were some smaller regressions from the portable thread pool on Windows. I have moved/tweaked a slight delay that I had added early on; changes made after it had caused it to lose its original intent. With these changes it goes back to the original intent and seems to resolve some of the gap, but maybe not all of it in some tests. We'll check the graphs after this change and see if there is more to investigate. There are also other things to improve on Windows; many of those may be separate from the portable thread pool, but some may be relevant to the changes in perf characteristics.
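For context, a minimal sketch of the waiting strategies being compared. This is not the actual ConcurrentQueueSegment code; `slotPublished` is a hypothetical stand-in for the real sequence-number check, and the sketch only illustrates the trade-off described above.

```csharp
using System;
using System.Threading;

static class WaitStrategies
{
    // Roughly the behavior #44265 introduced: spin, and after enough iterations
    // fall back to Thread.Sleep(1). On Windows, Sleep(1) can behave closer to
    // Sleep(15), taking a thread pool worker out of commission for a long time.
    public static void WaitAllowingSleep1(Func<bool> slotPublished)
    {
        var spinner = new SpinWait();
        while (!slotPublished())
        {
            spinner.SpinOnce(); // may eventually fall back to Thread.Sleep(1)
        }
    }

    // The prior default: spin-wait but never fall back to Sleep(1).
    public static void WaitWithoutSleep1(Func<bool> slotPublished)
    {
        var spinner = new SpinWait();
        while (!slotPublished())
        {
            spinner.SpinOnce(sleep1Threshold: -1); // never sleeps
        }
    }

    // The direction taken here for forward-progressing cases: don't wait at all;
    // report "not available yet" and let the caller retry or move on, since the
    // other thread is guaranteed to finish publishing on its own.
    public static bool TryWithoutWaiting(Func<bool> slotPublished) => slotPublished();
}
```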

Fixes #45716

@kouvel added this to the 6.0.0 milestone Dec 16, 2020
@kouvel self-assigned this Dec 16, 2020
@kouvel (Member, Author) commented Dec 16, 2020

RPS numbers for some ASP.NET perf tests:

| Machine | Benchmark | 5.0.1 | Before | Diff before-5 | After | Diff after-5 | Diff after-before |
| --- | --- | --- | --- | --- | --- | --- | --- |
| citrine-win | PlaintextPlatform | 9176549.3 | 6702841.8 | -27.0% | 9064924.3 | -1.2% | 35.2% |
| citrine-win | JsonPlatform | 757732.6 | 594388.2 | -21.6% | 759449.8 | 0.2% | 27.8% |
| citrine-win | FortunesPlatform | 336643.7 | 280116.8 | -16.8% | 328945.8 | -2.3% | 17.4% |
| citrine-win | Plaintext | 6027880.2 | 5938829.0 | -1.5% | 5985553.0 | -0.7% | 0.8% |
| citrine-win | Json | 708022.2 | 617337.3 | -12.8% | 698703.7 | -1.3% | 13.2% |
| citrine-win | Fortunes | 290025.7 | 279021.6 | -3.8% | 290303.8 | 0.1% | 4.0% |
| citrine-arm | PlaintextPlatform | 7900866.8 | 3075682.5 | -61.1% | 8065507.0 | 2.1% | 162.2% |
| citrine-arm | JsonPlatform | 630127.4 | 392300.3 | -37.7% | 646741.0 | 2.6% | 64.9% |
| citrine-arm | FortunesPlatform | 72910.5 | 63670.3 | -12.7% | 77050.3 | 5.7% | 21.0% |
| citrine-arm | Plaintext | 2980570.8 | 2998581.8 | 0.6% | 3019476.0 | 1.3% | 0.7% |
| citrine-arm | Json | 410387.0 | 380828.4 | -7.2% | 418505.7 | 2.0% | 9.9% |
| citrine-arm | Fortunes | 58481.8 | 61590.2 | 5.3% | 59983.5 | 2.6% | -2.6% |
| citrine-amd | PlaintextPlatform | 13610332.8 | 12421484.9 | -8.7% | 13618838.5 | 0.1% | 9.6% |
| citrine-amd | JsonPlatform | 1236138.0 | 1278186.0 | 3.4% | 1364998.7 | 10.4% | 6.8% |
| citrine-amd | FortunesPlatform | 449874.2 | 446096.6 | -0.8% | 465650.0 | 3.5% | 4.4% |
| citrine-amd | Plaintext | 8241441.3 | 7864252.3 | -4.6% | 8222975.3 | -0.2% | 4.6% |
| citrine-amd | Json | 705725.7 | 739578.3 | 4.8% | 949315.8 | 34.5% | 28.4% |
| citrine-amd | Fortunes | 385240.7 | 353941.3 | -8.1% | 399461.0 | 3.7% | 12.9% |
| citrine-lin | PlaintextPlatform | 13690889.4 | 12787144.2 | -6.6% | 13734629.5 | 0.3% | 7.4% |
| citrine-lin | JsonPlatform | 1259474.3 | 1253970.7 | -0.4% | 1276371.5 | 1.3% | 1.8% |
| citrine-lin | FortunesPlatform | 439379.7 | 433790.7 | -1.3% | 451493.0 | 2.8% | 4.1% |
| citrine-lin | Plaintext | 5122837.0 | 5373168.8 | 4.9% | 5363460.8 | 4.7% | -0.2% |
| citrine-lin | Json | 983669.8 | 970836.0 | -1.3% | 988557.7 | 0.5% | 1.8% |

I have omitted citrine-lin/Fortunes because it wasn't yielding appropriate results compared with historical graphs; it doesn't seem to be working as expected at the moment.

citrine-win/PlaintextPlatform and citrine-win/FortunesPlatform may still be slightly regressed compared to .NET 5; I'm not sure why, possibly the portable thread pool. The difference in numbers is somewhat consistent, but the error margin across runs in those benchmarks is fairly high, so I can't really tell. There may be something to investigate further there; I'll leave that for a separate investigation once it's clearer that there is a regression.

@kouvel (Member, Author) commented Dec 16, 2020

CC @alexcovington

@sebastienros (Member) commented:

I have a hard time reading this table ;) but I assume this PR has the best numbers.

@janvorli (Member) left a comment:

LGTM, thank you!

@sebastienros (Member) commented:

Looks great, even on Linux, with net6 being faster than net5 on Fortunes, for instance.

@kouvel (Member, Author) commented Dec 16, 2020

I have a hard time reading this table ;) but I assume this PR has the best numbers.

The "Diff before-5" column indicates the diff from before this change (in .NET 6) to 5.0.1, showing the regressions including all changes in-between. The "Diff after-5" column indicates the diff from after this change to 5.0.1, to me the most relevant column, though not directly tied to this change, shows if cumulative regressions since have been recovered. And "Diff after-before" indicates the diff from after this change to before this change, with no other changes involved.

@kouvel merged commit 8e6415d into dotnet:master Dec 16, 2020
@kouvel deleted the TpWinFix branch December 16, 2020 23:27
@adamsitnik (Member) commented:

The numbers for the JsonPlatform benchmark on the AMD machine (around 1.2kk RPS) look much better than what is reported in the Power BI dashboard:

[image: Power BI dashboard screenshot]

and what I am getting when I run them using crank:

 crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/master/scenarios/platform.benchmarks.yml --scenario json --profile aspnet-citrine-amd --application.framework net6.0

@kouvel were you using any custom settings like DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS: 1 or disabled hill climbing when running the PlaintextPlatform and JsonPlatform benchmarks? If so, I wonder how our default settings (no env vars set) perform before and after your changes.

@kouvel (Member, Author) commented Dec 17, 2020

Ah yeah, I forgot to mention: I disabled hill climbing. I did a few runs here and there with hill climbing and the error margin was so high between runs that I didn't bother anymore; in normal mode it would take a lot of runs of each test to weed out the error and make any reasonable determination of improvement or regression. The hill climbing issue is known and needs to be fixed. I don't think it's worth the time to compare with hill climbing enabled at the moment, as after it is fixed it would behave closer to hill climbing being disabled during the benchmark (though maybe not fully, we'll have to see, but at least the intention of the fix is to reduce the extra error to unnoticeable levels). There is a work item for fixing it in .NET 6; I'm not sure yet about prioritization.

@kouvel (Member, Author) commented Dec 17, 2020

We should be able to see from the new graphs what the range is after the change compared to before; hopefully that will show whether anything has not been fully fixed. I'm expecting that two tests still have regressions, but there could be more.

@kouvel (Member, Author) commented Dec 17, 2020

I don't think all of the high error margin is due to hill climbing; for instance, even with it disabled I saw ~80K RPS differences in citrine-win/PlaintextPlatform at different times, consistently for some time. I'm not sure where that would be coming from. Anyway, hill climbing definitely adds a lot of error: when running in quick succession I had pretty stable, somewhat repeatable results without it and high error with it.

@adamsitnik (Member) commented:

I disabled hill climbing.

What about DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS? Was it set to 1?

For some reason it's now the default setting for the PlaintextPlatform and JsonPlatform benchmarks (I believe that's wrong; we then have 1 epoll thread per core, not 1 per 30 cores, which is the default setting that 99.9999% of our customers are using in production).

It can have a big impact on the benchmark results when you are changing something related to ThreadPool. I would just feel safer about any changes if they were benchmarked with the default settings (with the exception of hill climbing on machines with many cores).

There is a work item for fixing it in .NET 6

I am really happy to hear that! Is there a GH issue that I could follow, or is it just somewhere in our internal docs for now?

@adamsitnik (Member) commented:

I have omitted citrine-lin/Fortunes because it wasn't yielding appropriate results compared with historical graphs; it doesn't seem to be working as expected at the moment.

@sebastienros are we aware of this issue?

cc @roji

@kouvel (Member, Author) commented Dec 17, 2020

What about DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS? Was it set to 1?
For some reason it's now the default setting for PlaintextPlatform and JsonPlatform benchmarks

I didn't set DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS. If that's the default for PlaintextPlatform and JsonPlatform, should I explicitly set it to 0 to override it? Is it just those two tests that are affected?

I am really happy to hear that! Is there a GH issue that I could follow or for now it's just somewhere in our internal docs?

Just in our internal docs at the moment; I haven't gotten around to filing GH issues for the work items yet, but will do that soon.

@adamsitnik (Member) commented:

I didn't set DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS. If that's the default for PlaintextPlatform and JsonPlatform should I explicitly set it to 0 to override?

Then it depends on which config file you have used. I can see that the config the docs mention, https://raw.githubusercontent.com/aspnet/Benchmarks/master/scenarios/platform.benchmarks.yml, sets it to 1 explicitly:

scenarios:
  plaintext:
    application:
      job: platformbenchmarks
      environmentVariables:
        DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS: 1
    load:

but the aspnet/benchmarks repo has one more config for platform benchmarks that does not set it: https://github.com/aspnet/Benchmarks/blob/master/src/BenchmarksApps/Kestrel/PlatformBenchmarks/platform.benchmarks.yml

@kouvel you can just remove the env var or set it to 0
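(For example, with the config shown above that means either deleting the `DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS` line under `environmentVariables`, or changing its value to `0`, so the benchmark runs with the same behavior as the default settings discussed above.)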

Is it just those two tests that are affected?

Yes, exactly.

@stephentoub (Member) commented:

What about DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS? Was it set to 1?
For some reason it's now the default setting for PlaintextPlatform and JsonPlatform benchmarks

I would like to understand when that changed and what the gains were/are from setting it.

@sebastienros (Member) commented:

For some reason it's now the default setting for the PlaintextPlatform and JsonPlatform benchmarks (I believe that's wrong; we then have 1 epoll thread per core, not 1 per 30 cores, which is the default setting that 99.9999% of our customers are using in production).

99.9999% of our customers don't use the Platform variants either ;) We made it the default for these 2 benchmarks as it makes them faster, so this is what we are setting on TE and what we are tracking. The middleware variants don't have this flag, and neither do the other Platform ones.

I still want to track this one since it's the one we'll submit to TE if it's faster, but if you prefer them to be without this env var by default, I can also add the special ones under a new scenario name in the charts.

@kouvel (Member, Author) commented Dec 17, 2020

Then it depends on which config file you have used.

Ah ok I see that now. I used the config file from src/BenchmarksApps/Kestrel/PlatformBenchmarks/platform.benchmarks.yml that does not set DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS.

@adamsitnik (Member) commented:

that does not set

@kouvel great, thanks for checking!

@adamsitnik (Member) commented:

@sebastienros my main concern with this setting is that we can potentially miss a regression similar to the one that I introduced with my first epoll PR (#33669), where all benchmarks improved except JsonPlatform.

JsonPlatform is very specific as it produces the biggest workload for the ThreadPool. Plaintext has a higher RPS, but to get the number of socket reads (which are scheduled to the TP; writes are not, as they always succeed on the first try) we have to divide the RPS by 16 (due to pipelining). So for the current TE network setup, it's 437k (7kk/16) vs 1.2kk socket reads. By monitoring JsonPlatform with default settings we ensure that the architectural changes that we introduced in .NET 5 (1 epoll thread with parallel TP work item scheduling) are not regressing in extreme scenarios.
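To put rough numbers on that (using the figures above): Plaintext at ~7kk RPS with 16-deep pipelining is about 7,000,000 / 16 ≈ 437,500 socket reads per second, while JsonPlatform is not pipelined, so its ~1.2kk RPS translates to roughly 1.2 million reads per second, all of which are scheduled to the thread pool.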

@adamsitnik (Member) commented:

Another thing about DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1 is that my understanding was that we try not to cheat by setting custom settings or using non-released builds. If we have changed our view on that, I can provide a few more extra settings (like disabling hill climbing and lowering the number of threads in the Thread Pool) that would give us another 50k RPS for JSON.

I would like to understand when that changed and what the gains were/are from setting it

I am not sure when exactly, but my understanding is that it has given us an extra 20-30k RPS for JsonPlatform.

@sebastienros (Member) commented:

we try to not cheat by setting custom settings and using non-released builds

We don't cheat. Platform is the set of benchmarks where we apply all the tricks we can to get the fastest benchmarks. The middleware ones are the scenarios where we try to follow what users would do. And we have always used release or go-live versions. This setting is available, the same way we used to enable tiered compilation when it was just a flag.

If there are other knobs we can change to make the Platform benchmarks faster on these machines, we should do it.

@kouvel (Member, Author) commented Dec 17, 2020

At least for regular tracking purposes it would be nice to use a different config file that does not set DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS. As @adamsitnik points out, JsonPlatform has so far been very useful for finding regressions in the thread pool, but with that env var set the thread pool would be used much less and we may miss regressions. Some thread pool changes don't show up as large enough changes in Json as they do in JsonPlatform.

@sebastienros (Member) commented:

I will make 0 the default then, and track an additional scenario for TE purposes.

Successfully merging this pull request may close these issues.

[PortableThreadPool] ASP.NET TE benchmarks on Windows are slower