Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Windows/x64: Regressions in System.MathBenchmarks.Double #80359

Closed
performanceautofiler bot opened this issue Dec 20, 2022 · 9 comments
Closed

[Perf] Windows/x64: Regressions in System.MathBenchmarks.Double #80359

performanceautofiler bot opened this issue Dec 20, 2022 · 9 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@performanceautofiler
Copy link

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline af8694704e807d0147a3033c3e250be2966951fe
Compare 56e22a4337b0256d200b6b698aa70349c9c7f2c5
Diff Diff

Regressions in System.MathBenchmarks.Double

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
FusedMultiplyAdd - Duration of single invocation 7.07 μs 7.46 μs 1.06 0.00 False 55296.12756264237 50618.473895582334 0.9154072107172253 Trace Trace

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.MathBenchmarks.Double*'

Payloads

Baseline
Compare

Histogram

System.MathBenchmarks.Double.FusedMultiplyAdd


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 7.463699113984675 > 7.32029771608462.
IsChangePoint: Marked as a change because one of 11/1/2022 2:33:16 AM, 12/16/2022 6:44:06 PM, 12/20/2022 4:30:45 AM falls between 12/11/2022 2:11:02 PM and 12/20/2022 4:30:45 AM.
IsRegressionStdDev: Marked as regression because -34.69916961037376 (T) = (0 -7483.441097082762) / Math.Sqrt((7321.549745472779 / (46)) + (584.8922091499769 / (11))) is less than -2.0040447832881556 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (46) + (11) - 2, .025) and -0.07246223200251571 = (6977.81317959289 - 7483.441097082762) / 6977.81317959289 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

```### Baseline Jit Disasm

```assembly
; System.MathBenchmarks.Double.FusedMultiplyAdd()
       jmp       qword ptr [7FFAE065F918]; System.MathBenchmarks.Double.FusedMultiplyAddTest()
; Total bytes of code 6
; System.MathBenchmarks.Double.FusedMultiplyAddTest()
       push      rsi
       sub       rsp,80
       vzeroupper
       vmovaps   [rsp+70],xmm6
       vmovaps   [rsp+60],xmm7
       vmovaps   [rsp+50],xmm8
       xor       eax,eax
       mov       [rsp+28],rax
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+30],xmm4
       vmovdqa   xmmword ptr [rsp+40],xmm4
       vxorps    xmm6,xmm6,xmm6
       vmovsd    xmm0,qword ptr [7FFADFF34240]
       vmovsd    xmm1,qword ptr [7FFADFF34248]
       vmovsd    xmm2,qword ptr [7FFADFF34250]
       xor       ecx,ecx
       vmovsd    xmm3,qword ptr [7FFADFF34258]
       vmovsd    xmm4,qword ptr [7FFADFF34260]
M01_L00:
       vmovaps   xmm5,xmm0
       vmovaps   xmm7,xmm1
       vmovaps   xmm8,xmm2
       vfmadd213sd xmm5,xmm7,xmm8
       vaddsd    xmm6,xmm5,xmm6
       vaddsd    xmm0,xmm0,xmm3
       vaddsd    xmm1,xmm1,xmm4
       vaddsd    xmm2,xmm2,xmm3
       inc       ecx
       cmp       ecx,1388
       jl        short M01_L00
       vmovsd    xmm0,qword ptr [7FFADFF34268]
       vsubsd    xmm0,xmm0,xmm6
       vandps    xmm0,xmm0,[7FFADFF34270]
       vucomisd  xmm6,xmm6
       jp        short M01_L01
       vucomisd  xmm0,qword ptr [7FFADFF34280]
       ja        short M01_L01
       vmovaps   xmm6,[rsp+70]
       vmovaps   xmm7,[rsp+60]
       vmovaps   xmm8,[rsp+50]
       add       rsp,80
       pop       rsi
       ret
M01_L01:
       lea       rcx,[rsp+28]
       mov       edx,20
       mov       r8d,2
       call      qword ptr [7FFAE0517030]
       mov       ecx,5E71
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       rdx,rax
       lea       rcx,[rsp+28]
       call      qword ptr [7FFAE0517120]
       mov       ecx,5E93
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       r9,rax
       vmovsd    xmm1,qword ptr [7FFADFF34268]
       lea       rcx,[rsp+28]
       mov       r8d,14
       call      qword ptr [7FFAE08479F0]
       mov       ecx,5E9B
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       rdx,rax
       lea       rcx,[rsp+28]
       call      qword ptr [7FFAE0517120]
       mov       ecx,5E93
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       r9,rax
       lea       rcx,[rsp+28]
       vmovaps   xmm1,xmm6
       mov       r8d,14
       call      qword ptr [7FFAE08479F0]
       mov       rcx,offset MT_System.Exception
       call      CORINFO_HELP_NEWSFAST
       mov       rsi,rax
       lea       rcx,[rsp+28]
       call      qword ptr [7FFAE05170C0]
       mov       rdx,rax
       mov       rcx,rsi
       call      qword ptr [7FFADFEF7C18]
       mov       rcx,rsi
       call      CORINFO_HELP_THROW
       int       3
; Total bytes of code 436

Compare Jit Disasm

; System.MathBenchmarks.Double.FusedMultiplyAdd()
       jmp       qword ptr [7FFE01E2F918]
; Total bytes of code 6

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added the untriaged New issue has not been triaged by the area owner label Dec 20, 2022
@kunalspathak kunalspathak transferred this issue from dotnet/perf-autofiling-issues Jan 9, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 9, 2023
@ghost
Copy link

ghost commented Jan 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline af8694704e807d0147a3033c3e250be2966951fe
Compare 56e22a4337b0256d200b6b698aa70349c9c7f2c5
Diff Diff

Regressions in System.MathBenchmarks.Double

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
FusedMultiplyAdd - Duration of single invocation 7.07 μs 7.46 μs 1.06 0.00 False 55296.12756264237 50618.473895582334 0.9154072107172253 Trace Trace

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.MathBenchmarks.Double*'

Payloads

Baseline
Compare

Histogram

System.MathBenchmarks.Double.FusedMultiplyAdd


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 7.463699113984675 > 7.32029771608462.
IsChangePoint: Marked as a change because one of 11/1/2022 2:33:16 AM, 12/16/2022 6:44:06 PM, 12/20/2022 4:30:45 AM falls between 12/11/2022 2:11:02 PM and 12/20/2022 4:30:45 AM.
IsRegressionStdDev: Marked as regression because -34.69916961037376 (T) = (0 -7483.441097082762) / Math.Sqrt((7321.549745472779 / (46)) + (584.8922091499769 / (11))) is less than -2.0040447832881556 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (46) + (11) - 2, .025) and -0.07246223200251571 = (6977.81317959289 - 7483.441097082762) / 6977.81317959289 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

```### Baseline Jit Disasm

```assembly
; System.MathBenchmarks.Double.FusedMultiplyAdd()
       jmp       qword ptr [7FFAE065F918]; System.MathBenchmarks.Double.FusedMultiplyAddTest()
; Total bytes of code 6
; System.MathBenchmarks.Double.FusedMultiplyAddTest()
       push      rsi
       sub       rsp,80
       vzeroupper
       vmovaps   [rsp+70],xmm6
       vmovaps   [rsp+60],xmm7
       vmovaps   [rsp+50],xmm8
       xor       eax,eax
       mov       [rsp+28],rax
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+30],xmm4
       vmovdqa   xmmword ptr [rsp+40],xmm4
       vxorps    xmm6,xmm6,xmm6
       vmovsd    xmm0,qword ptr [7FFADFF34240]
       vmovsd    xmm1,qword ptr [7FFADFF34248]
       vmovsd    xmm2,qword ptr [7FFADFF34250]
       xor       ecx,ecx
       vmovsd    xmm3,qword ptr [7FFADFF34258]
       vmovsd    xmm4,qword ptr [7FFADFF34260]
M01_L00:
       vmovaps   xmm5,xmm0
       vmovaps   xmm7,xmm1
       vmovaps   xmm8,xmm2
       vfmadd213sd xmm5,xmm7,xmm8
       vaddsd    xmm6,xmm5,xmm6
       vaddsd    xmm0,xmm0,xmm3
       vaddsd    xmm1,xmm1,xmm4
       vaddsd    xmm2,xmm2,xmm3
       inc       ecx
       cmp       ecx,1388
       jl        short M01_L00
       vmovsd    xmm0,qword ptr [7FFADFF34268]
       vsubsd    xmm0,xmm0,xmm6
       vandps    xmm0,xmm0,[7FFADFF34270]
       vucomisd  xmm6,xmm6
       jp        short M01_L01
       vucomisd  xmm0,qword ptr [7FFADFF34280]
       ja        short M01_L01
       vmovaps   xmm6,[rsp+70]
       vmovaps   xmm7,[rsp+60]
       vmovaps   xmm8,[rsp+50]
       add       rsp,80
       pop       rsi
       ret
M01_L01:
       lea       rcx,[rsp+28]
       mov       edx,20
       mov       r8d,2
       call      qword ptr [7FFAE0517030]
       mov       ecx,5E71
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       rdx,rax
       lea       rcx,[rsp+28]
       call      qword ptr [7FFAE0517120]
       mov       ecx,5E93
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       r9,rax
       vmovsd    xmm1,qword ptr [7FFADFF34268]
       lea       rcx,[rsp+28]
       mov       r8d,14
       call      qword ptr [7FFAE08479F0]
       mov       ecx,5E9B
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       rdx,rax
       lea       rcx,[rsp+28]
       call      qword ptr [7FFAE0517120]
       mov       ecx,5E93
       mov       rdx,7FFAE066B498
       call      CORINFO_HELP_STRCNS
       mov       r9,rax
       lea       rcx,[rsp+28]
       vmovaps   xmm1,xmm6
       mov       r8d,14
       call      qword ptr [7FFAE08479F0]
       mov       rcx,offset MT_System.Exception
       call      CORINFO_HELP_NEWSFAST
       mov       rsi,rax
       lea       rcx,[rsp+28]
       call      qword ptr [7FFAE05170C0]
       mov       rdx,rax
       mov       rcx,rsi
       call      qword ptr [7FFADFEF7C18]
       mov       rcx,rsi
       call      CORINFO_HELP_THROW
       int       3
; Total bytes of code 436

Compare Jit Disasm

; System.MathBenchmarks.Double.FusedMultiplyAdd()
       jmp       qword ptr [7FFE01E2F918]
; Total bytes of code 6

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@kunalspathak kunalspathak changed the title [Perf] Windows/x64: 1 Regression on 12/16/2022 10:05:39 PM [Perf] Windows/x64: Regressions in System.MathBenchmarks.Double Jan 9, 2023
@kunalspathak
Copy link
Member

possibly from #79681? @DeepakRajendrakumaran - can you please confirm.

@kunalspathak
Copy link
Member

I don't see other commits in the range are related - 97a51cc...c6045ad

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jan 9, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Jan 9, 2023
@JulieLeeMSFT JulieLeeMSFT added the tenet-performance Performance related issue label Jan 9, 2023
@DeepakRajendrakumaran
Copy link
Contributor

DeepakRajendrakumaran commented Jan 10, 2023

possibly from #79681? @DeepakRajendrakumaran - can you please confirm.

This is very very unlikely since this change should not affect normal runs. This code is used for calculating compressed displacement for EVEX encoding and as such only called when TakesEvexPrefix() == TRUE. This in turn means in order to get here either one of the following flags need to be set - DOTNET_JitForceEVEXEncoding or DOTNET_JitStressEvexEncoding

I'll go ahead and try to confirm it's not this change anyway - is the best way to do this to rerun the test and compare the disassembly? Assuming if there is no diff, it's all good?

@kunalspathak
Copy link
Member

The regression seems to stay so there is definitely something going on there (not necessarily because of your PR). I will try to investigate:

image

@tannergooding
Copy link
Member

tannergooding commented Jan 12, 2023

@kunalspathak, I'd take a guess it could be #79363 instead since that also impacts HWIntrinsic logic

If there is an asm diff that'd be helpful.

Noting that we didn't have any report regressions in superpmi diffs when it was merged, but the performance repo has some unique code so it's possible there is something that was impacted still.

@kunalspathak
Copy link
Member

@kunalspathak, I'd take a guess it could be #79363 instead since that also impacts HWIntrinsic logic

I thought so, but my understanding was that it was mainly some refactoring. Haven't looked into details.

@kunalspathak
Copy link
Member

@tannergooding - will you be able to check if #79363 is causing the regression?

@tannergooding
Copy link
Member

This was caused by #79363 and while it adds an additional instruction in this case, it fixes other cases. It is within an acceptable range for the API overall.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

4 participants