Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in JsonPlatform due to AVX-512 changes #85543

Closed
sebastienros opened this issue Apr 28, 2023 · 6 comments · Fixed by #86510
Closed

Regression in JsonPlatform due to AVX-512 changes #85543

sebastienros opened this issue Apr 28, 2023 · 6 comments · Fixed by #86510
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@sebastienros
Copy link
Member

#85389 seems to have introduced a regression on JsonPlatform.

Investigation details

In the dashboard I selected the two points of the regression:

image

Then went at the bottom to get the crank command lines to re-run in case it is a false positive:

image

Executed the command for first point:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario json --profile intel-lin-app --profile intel-load2-load --application.framework net8.0 --application.collectDependencies true --application.options.collectCounters true --application.aspNetCoreVersion 8.0.0-preview.5.23227.6 --application.runtimeVersion 8.0.0-preview.5.23227.12 --application.sdkVersion 8.0.100-preview.5.23227.8

Got this:

| Requests/sec | 1,213,012 |

Then the second point:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario json --profile intel-lin-app --profile intel-load2-load --application.framework net8.0 --application.collectDependencies true --application.options.collectCounters true --application.aspNetCoreVersion 8.0.0-preview.5.23227.10 --application.runtimeVersion 8.0.0-preview.5.23227.25 --application.sdkVersion 8.0.100-preview.5.23228.1

Got:

| Requests/sec | 1,110,844 |

Which means it's not a false positive. Next step is to get the minimal set of commits. Assuming it's in the runtime I got check all the builds available between the ones that we just ran: --application.runtimeVersion 8.0.0-preview.5.23227.12 and --application.runtimeVersion 8.0.0-preview.5.23227.25

On this page https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8/nuget/v3/flat2/Microsoft.NetCore.App.Runtime.linux-x64/index.json I tried the next one after 23227.12 which is 23227.22. (I would usually do a dichotomy if there are many options)

And with this new version of the runtime

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario json --profile intel-lin-app --profile intel-load2-load --application.framework net8.0 --application.collectDependencies true --application.options.collectCounters true --application.aspNetCoreVersion 8.0.0-preview.5.23227.6 --application.runtimeVersion 8.0.0-preview.5.23227.22 --application.sdkVersion 8.0.100-preview.5.23227.8

Which gave me

| Requests/sec | 1,103,846 |

We can assume that the regression is between 23227.12 and 23227.22. When we run these commands it outputs the commit of the runtime

| .NET Runtime Version | 8.0.0-preview.5.23227.22+953d290 |

So we get this delta:

0f06ede...953d290

It contains the AVX changes and other changes that might not seem to have an impact on perf.

Then I ran the same command with the regression, adding the --application.environmentVariables DOTNET_EnableAVX512F=0 argument and got

| Requests/sec | 1,213,012 |

This seems to prove that this is the issue.

Now If I wanted to run the same things on aspnet-perf-lin because it has a more recent CPU and the profiles might be different I would remove all the --profile arguments from the commands and add --profile aspnet-perf-lin instead.

If you make a local fix in the form of local binaries, you can check they have an impact by using this argument

--application.options.outputFiles c:\builds\release*.* and it will upload these files in the self-contained published folder to patch the ones taken from the selected runtime. So it matters that you build from a branch/commit that is close to the one you set the version to.

@sebastienros sebastienros added the tenet-performance Performance related issue label Apr 28, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 28, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 28, 2023
@teo-tsirpanis teo-tsirpanis added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 28, 2023
@ghost
Copy link

ghost commented Apr 28, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

#85389 seems to have introduced a regression on JsonPlatform.

Investigation details

In the dashboard I selected the two points of the regression:

image

Then went at the bottom to get the crank command lines to re-run in case it is a false positive:

image

Executed the command for first point:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario json --profile intel-lin-app --profile intel-load2-load --application.framework net8.0 --application.collectDependencies true --application.options.collectCounters true --application.aspNetCoreVersion 8.0.0-preview.5.23227.6 --application.runtimeVersion 8.0.0-preview.5.23227.12 --application.sdkVersion 8.0.100-preview.5.23227.8

Got this:

| Requests/sec | 1,213,012 |

Then the second point:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario json --profile intel-lin-app --profile intel-load2-load --application.framework net8.0 --application.collectDependencies true --application.options.collectCounters true --application.aspNetCoreVersion 8.0.0-preview.5.23227.10 --application.runtimeVersion 8.0.0-preview.5.23227.25 --application.sdkVersion 8.0.100-preview.5.23228.1

Got:

| Requests/sec | 1,110,844 |

Which means it's not a false positive. Next step is to get the minimal set of commits. Assuming it's in the runtime I got check all the builds available between the ones that we just ran: --application.runtimeVersion 8.0.0-preview.5.23227.12 and --application.runtimeVersion 8.0.0-preview.5.23227.25

On this page https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8/nuget/v3/flat2/Microsoft.NetCore.App.Runtime.linux-x64/index.json I tried the next one after 23227.12 which is 23227.22. (I would usually do a dichotomy if there are many options)

And with this new version of the runtime

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario json --profile intel-lin-app --profile intel-load2-load --application.framework net8.0 --application.collectDependencies true --application.options.collectCounters true --application.aspNetCoreVersion 8.0.0-preview.5.23227.6 --application.runtimeVersion 8.0.0-preview.5.23227.22 --application.sdkVersion 8.0.100-preview.5.23227.8

Which gave me

| Requests/sec | 1,103,846 |

We can assume that the regression is between 23227.12 and 23227.22. When we run these commands it outputs the commit of the runtime

| .NET Runtime Version | 8.0.0-preview.5.23227.22+953d290 |

So we get this delta:

0f06ede...953d290

It contains the AVX changes and other changes that might not seem to have an impact on perf.

Then I ran the same command with the regression, adding the --application.environmentVariables DOTNET_EnableAVX512F=0 argument and got

| Requests/sec | 1,213,012 |

This seems to prove that this is the issue.

Now If I wanted to run the same things on aspnet-perf-lin because it has a more recent CPU and the profiles might be different I would remove all the --profile arguments from the commands and add --profile aspnet-perf-lin instead.

If you make a local fix in the form of local binaries, you can check they have an impact by using this argument

--application.options.outputFiles c:\builds\release*.* and it will upload these files in the self-contained published folder to patch the ones taken from the selected runtime. So it matters that you build from a branch/commit that is close to the one you set the version to.

Author: sebastienros
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@danmoseley
Copy link
Member

danmoseley commented Apr 28, 2023

Cc @EgorBo for #85389

@EgorBo
Copy link
Member

EgorBo commented Apr 28, 2023

I measured some TE benchmarks by hands in #85389 and they were ok but probably didn't do this one. Essentially, the problem is the same as #84577 and we don't have a good solution for it as we don't want to maintain an "allow list" to use AVX-512. All the optimizations which use AVX-512 can potentially slow down old intel cpus like our Xeon Gold 5120.

@dotnet/avx512-contrib for thoughts on this

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 29, 2023
@tannergooding
Copy link
Member

@EgorBo, I've put up a prototype for this here: #85551

Basically it just adds a DOTNET_PreferredVectorBitWidth variable which defaults to 256-bits on pre Ice-Lake. While we've tried to avoid such a bit of functionality for a while, this is a feature that other compilers do have.

It correspondingly adds a DOTNET_MaxVectorTBitWidth variable which will allow users to opt-in to larger or smaller sizes of Vector<T> without requiring the entire ISA to be disabled.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 1, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone May 1, 2023
@sebastienros
Copy link
Member Author

I also detected this regression on Fortunes EF and Fortunes Dapper but not Fortunes (with only ADO.NET)

@tannergooding
Copy link
Member

This should be resolved with #86457

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 19, 2023
@tannergooding tannergooding reopened this May 19, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants