Regressions from "Add a head merging transformation" #90929

Closed
performanceautofiler bot opened this issue Aug 22, 2023 · 7 comments

Labels: arch-x64, area-CodeGen-coreclr, os-windows, runtime-coreclr, tenet-performance, tenet-performance-benchmarks

@performanceautofiler

Run Information

Architecture: x64
OS: Windows 10.0.19042
Queue: OwlWindows
Baseline: 374b1116d14a2912f8b36ffbd9523001cceb8316
Compare: 76d454907d00722609fe170d7d97bfbe644abc50
Configs: CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.ContainsFalse<Int32>

Baseline   Test       Test/Base   Test Quality   Edge Detector
16.59 μs   24.83 μs   1.50        0.14           False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.ContainsFalse<Int32>*'

System.Collections.ContainsFalse<Int32>.Stack(Size: 512)

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionWindowed: Marked as regression because we could not find enough baseline builds for window checking.
IsChangePoint: Marked as a change because one of 5/26/2023 6:22:05 AM, 7/5/2023 7:07:19 PM, 7/12/2023 2:12:47 AM, 7/23/2023 4:13:29 AM, 8/17/2023 8:30:03 PM, 8/21/2023 10:38:45 PM falls between 8/13/2023 2:50:54 AM and 8/21/2023 10:38:45 PM.
IsRegressionStdDev: Marked as regression because -23.32740069882594 (T) = (0 -24437.675796408497) / Math.Sqrt((902218.3064969234 / (9)) + (46976.70673320098 / (10))) is less than -2.109815577813699 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (10) - 2, .025) and -0.4476657453529143 = (16880.744657290375 - 24437.675796408497) / 16880.744657290375 is less than -0.05.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.
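The IsRegressionStdDev check can be reproduced from the numbers it prints. One caveat: the logged numerator reads `(0 -24437.67...)`, yet the reported T of -23.327 only comes out when the baseline mean 16880.74 takes the place of that 0, so the sketch below uses the baseline mean. It is a minimal sketch, not the autofiler's actual code: the function names are hypothetical, the "compare more than 5% above baseline" reading of IsRegressionBase is an assumption, the "value was not too small" guard is not modelled, and the log does not say which variance belongs to which build. As printed, the test divides the mean difference by an unpooled standard error but takes its critical value from a Student t distribution with 9 + 10 - 2 = 17 degrees of freedom.

```python
import math

# Numbers copied from the IsRegressionStdDev line above.
BASELINE_MEAN = 16880.744657290375
COMPARE_MEAN = 24437.675796408497
# The two variance/sample-count pairs, in the order they appear inside Math.Sqrt(...).
# The log does not say which pair belongs to which build.
VAR_1, N_1 = 902218.3064969234, 9
VAR_2, N_2 = 46976.70673320098, 10
# Critical value printed in the log: StudentT.InvCDF(0, 1, 9 + 10 - 2, .025).
T_CRITICAL = -2.109815577813699


def is_regression_base(baseline: float, compare: float, threshold: float = 0.05) -> bool:
    """Assumed reading of IsRegressionBase: compare exceeds baseline by more than 5%.
    The log's "value was not too small" guard is not modelled here."""
    return compare > baseline * (1.0 + threshold)


def t_statistic(mean_base: float, mean_cmp: float) -> float:
    """Mean difference divided by the unpooled standard error, mirroring the log."""
    return (mean_base - mean_cmp) / math.sqrt(VAR_1 / N_1 + VAR_2 / N_2)


def is_regression_stddev(t: float, rel_change: float) -> bool:
    """Both conditions from the log: T below the critical value and a change beyond -5%."""
    return t < T_CRITICAL and rel_change < -0.05


t = t_statistic(BASELINE_MEAN, COMPARE_MEAN)
rel = (BASELINE_MEAN - COMPARE_MEAN) / BASELINE_MEAN
print(f"T = {t:.4f}, relative change = {rel:.4f}, regression = {is_regression_stddev(t, rel)}")
print(f"Test/Base = {24.83 / 16.59:.2f}, IsRegressionBase = {is_regression_base(16.59, 24.83)}")
```

Both checks fire for this run: T is far below the critical value, and the compare mean is roughly 45% above the baseline mean.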

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

performanceautofiler bot added the arch-x64, os-windows, runtime-coreclr, and untriaged labels Aug 22, 2023
EgorBo changed the title "[Perf] Windows/x64: 1 Regression on 8/18/2023 9:39:07 AM" to "Regressions in System.Collections.ContainsFalse" Aug 22, 2023
EgorBo changed the title "Regressions in System.Collections.ContainsFalse" to 'Regressions from "Add a head merging transformation"' Aug 22, 2023
EgorBo transferred this issue from dotnet/perf-autofiling-issues Aug 22, 2023
dotnet-issue-labeler bot added the needs-area-label label Aug 22, 2023
EgorBo added the tenet-performance and tenet-performance-benchmarks labels Aug 22, 2023
@EgorBo
Member

EgorBo commented Aug 22, 2023

#90468 cc @jakobbotsch

EgorBo added the area-CodeGen-coreclr label and removed the needs-area-label label Aug 22, 2023
@ghost

ghost commented Aug 22, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member

EgorBo commented Aug 22, 2023

@jakobbotsch
Member

There is no diff in the hot function of the benchmark. There is one in the outer benchmark function, but I do not see how it could explain the performance difference. The benchmark has been quite noisy in the past, so maybe the change perturbed it as well. Let's wait and see.

System.Collections.ContainsFalse<Int32>.Stack(Size: 512)

Hot functions:

  • (91.26%) SpanHelpers.LastIndexOfValueType (Tier-1)
    • No diffs
  • (6.70%) System.Collections.ContainsFalse`1.Stack (Tier-1)
    • Has diffs
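The argument that the outer-method diff cannot explain the regression can be put in numbers with an Amdahl-style estimate. This is a back-of-the-envelope sketch under two assumptions not stated in the thread: that the profile shares above hold for the baseline run, and that only `ContainsFalse`1.Stack` changed speed while everything else stayed the same. The function name is hypothetical; the 1.50 ratio is the Test/Base value and the PerfScore totals come from the diff below.

```python
def required_slowdown(share: float, overall_ratio: float) -> float:
    """How many times slower a function holding `share` of the runtime would have to get
    for the whole benchmark to become `overall_ratio` times slower, if nothing else changed.

    Amdahl-style: (1 - share) + share * k == overall_ratio, solved for k.
    """
    return (overall_ratio - (1.0 - share)) / share


# Share of the outer Stack() method (from the hot-function list) and the Test/Base ratio.
k = required_slowdown(0.0670, 1.50)
# The JIT's own static estimate for that method moved the other way (PerfScore totals from the diff).
perf_score_ratio = 14082.35 / 14149.72
print(f"Stack() would need to run {k:.2f}x slower; PerfScore ratio after/before = {perf_score_ratio:.4f}")
```

Under these assumptions the outer method would have to become roughly 8.5 times slower, while its PerfScore estimate actually drops by about half a percent.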
Diffs

[MicroBenchmarks]System.Collections.ContainsFalse`1[System.Int32].Stack()

 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are valid, and fgCalledCount is 11984
+; with Dynamic PGO: edge weights are valid, and fgCalledCount is 7697
 ; 3 inlinees with PGO data; 3 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T24] (  4,   4   )     ref  ->  rcx         this class-hnd single-def <System.Collections.ContainsFalse`1[int]>
-;  V01 loc0         [V01,T12] (  4,1018.65)    bool  ->  rbx        
-;  V02 loc1         [V02,T13] (  3,1017.65)     ref  ->  rsi         class-hnd single-def <System.Collections.Generic.Stack`1[int]>
-;  V03 loc2         [V03,T21] (  3, 510.33)     ref  ->  rdi         class-hnd single-def <int[]>
-;  V04 loc3         [V04,T02] (  5,2034.30)     int  ->  rbp        
+;  V00 this         [V00,T21] (  4,   4   )     ref  ->  rcx         this class-hnd single-def <System.Collections.ContainsFalse`1[int]>
+;  V01 loc0         [V01,T12] (  4,1023.61)    bool  ->  rbx        
+;  V02 loc1         [V02,T13] (  3,1022.61)     ref  ->  rsi         class-hnd single-def <System.Collections.Generic.Stack`1[int]>
+;  V03 loc2         [V03,T19] (  3, 512.80)     ref  ->  rdi         class-hnd single-def <int[]>
+;  V04 loc3         [V04,T01] (  5,2044.21)     int  ->  rbp        
 ;  V05 OutArgs      [V05    ] (  1,   1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;  V06 tmp1         [V06,T15] (  3,1016.65)    bool  ->  rdx         "Inline return value spill temp"
-;  V07 tmp2         [V07,T04] (  2,2033.30)     int  ->  rdx         "Inlining Arg"
-;  V08 tmp3         [V08,T00] (  3,3049.95)     ref  ->  rax         class-hnd "Inlining Arg" <int[]>
-;  V09 tmp4         [V09,T01] (  3,3049.95)     int  ->  r15         "Inlining Arg"
-;  V10 tmp5         [V10,T08] (  4,1524.98)     ref  ->  rax        
-;  V11 tmp6         [V11,T16] (  3,1016.65)     int  ->  rdx        
-;  V12 tmp7         [V12,T23] (  2, 508.33)     int  ->  r13        
-;  V13 tmp8         [V13,T14] (  3,1016.65)     ref  ->  rax        
-;  V14 tmp9         [V14,T17] (  3,1016.65)     int  ->  rdx        
-;  V15 tmp10        [V15,T06] (  7,2032.49)     int  ->  r15        
-;  V16 tmp11        [V16,T07] (  6,2032.49)     int  ->   r8        
-;  V17 tmp12        [V17,T18] (  3,1016.65)     int  ->  rcx         "Inline return value spill temp"
-;  V18 tmp13        [V18,T05] (  2,2033.30)     int  ->  rdx         ld-addr-op "Inlining Arg"
-;  V19 tmp14        [V19,T19] (  3,1016.65)     int  ->  r15         "Inline stloc first use temp"
-;  V20 tmp15        [V20,T09] (  3,1524.98)     int  ->  rax         "Inline stloc first use temp"
-;  V21 tmp16        [V21,T20] (  3,1016.65)     int  ->  rcx        
+;  V06 tmp1         [V06,T14] (  3,1021.61)    bool  ->  rdx         "Inline return value spill temp"
+;  V07 tmp2         [V07,T04] (  2,2043.21)     int  ->  rdx         "Inlining Arg"
+;  V08 tmp3         [V08,T00] (  3,3064.82)     ref  ->  rax         class-hnd "Inlining Arg" <int[]>
+;  V09 tmp4         [V09,T05] (  2,2043.21)     int  ->  r15         "Inlining Arg"
+;* V10 tmp5         [V10    ] (  0,   0   )     ref  ->  zero-ref   
+;* V11 tmp6         [V11    ] (  0,   0   )     int  ->  zero-ref   
+;* V12 tmp7         [V12    ] (  0,   0   )     int  ->  zero-ref   
+;  V13 tmp8         [V13,T08] (  3,1532.41)     ref  ->  rax        
+;  V14 tmp9         [V14,T15] (  3,1021.61)     int  ->  rdx        
+;  V15 tmp10        [V15,T06] (  6,2038.26)     int  ->  r15        
+;  V16 tmp11        [V16,T07] (  6,2038.26)     int  ->   r8        
+;  V17 tmp12        [V17,T16] (  3,1021.61)     int  ->  rcx         "Inline return value spill temp"
+;  V18 tmp13        [V18,T03] (  3,2043.21)     int  ->  rdx         ld-addr-op "Inlining Arg"
+;  V19 tmp14        [V19,T17] (  3,1021.61)     int  ->  r15         "Inline stloc first use temp"
+;  V20 tmp15        [V20,T09] (  3,1532.41)     int  ->  rax         "Inline stloc first use temp"
+;  V21 tmp16        [V21,T18] (  3,1021.61)     int  ->  rcx        
 ;* V22 tmp17        [V22    ] (  0,   0   )   byref  ->  zero-ref    "Inlining Arg"
 ;* V23 tmp18        [V23    ] (  0,   0   )     int  ->  zero-ref    "Inlining Arg"
-;  V24 cse0         [V24,T11] (  3,1523.35)     int  ->  r15         "CSE - aggressive"
-;  V25 cse1         [V25,T10] (  3,1524.98)     int  ->  r15         "CSE - aggressive"
-;  V26 cse2         [V26,T03] (  4,2033.30)     int  ->  r12         "CSE - aggressive"
-;  V27 cse3         [V27,T22] (  3, 510.33)     int  ->  r14         "CSE - moderate"
+;  V24 cse0         [V24,T11] (  3,1522.51)     int  ->  r15         "CSE - aggressive"
+;  V25 cse1         [V25,T10] (  3,1532.41)     int  ->  r15         "CSE - aggressive"
+;  V26 cse2         [V26,T02] (  5,2043.21)     int  ->  r13         "CSE - aggressive"
+;  V27 cse3         [V27,T20] (  3, 512.80)     int  ->  r14         "CSE - moderate"
 ;
-; Lcl frame size = 40
+; Lcl frame size = 32
 
 G_M8989_IG01:
        push     r15
        push     r14
        push     r13
-       push     r12
        push     rdi
        push     rsi
        push     rbp
        push     rbx
-       sub      rsp, 40
-						;; size=16 bbWeight=1 PerfScore 8.25
+       sub      rsp, 32
+						;; size=14 bbWeight=1 PerfScore 7.25
 G_M8989_IG02:
        xor      ebx, ebx
        mov      rsi, gword ptr [rcx+0x38]
@@ -310,25 +309,24 @@ G_M8989_IG03:
        dec      r15d
        test     rax, rax
        je       SHORT G_M8989_IG12
-       mov      r13d, r15d
-       mov      r12d, dword ptr [rax+0x08]
-       test     r12d, r12d
+       mov      r13d, dword ptr [rax+0x08]
+       test     r13d, r13d
        je       SHORT G_M8989_IG15
        lea      r8d, [r15+0x01]
-						;; size=47 bbWeight=508.33 PerfScore 6608.23
-G_M8989_IG04:
-       test     r12d, r12d
+       test     r13d, r13d
        je       SHORT G_M8989_IG16
-       cmp      r12d, r15d
+						;; size=49 bbWeight=510.80 PerfScore 7151.24
+G_M8989_IG04:
+       cmp      r13d, r15d
        jbe      SHORT G_M8989_IG13
        test     r8d, r8d
        jl       SHORT G_M8989_IG14
-						;; size=15 bbWeight=508.33 PerfScore 1906.22
+						;; size=10 bbWeight=510.80 PerfScore 1277.01
 G_M8989_IG05:
        sub      r15d, r8d
        inc      r15d
        js       SHORT G_M8989_IG14
-						;; size=8 bbWeight=507.51 PerfScore 761.27
+						;; size=8 bbWeight=505.85 PerfScore 758.78
 G_M8989_IG06:
        movsxd   rcx, r15d
        lea      rcx, bword ptr [rax+4*rcx+0x10]
@@ -336,37 +334,36 @@ G_M8989_IG06:
        test     eax, eax
        jge      SHORT G_M8989_IG18
        xor      ecx, ecx
-						;; size=20 bbWeight=508.33 PerfScore 2922.87
+						;; size=20 bbWeight=510.80 PerfScore 2937.12
 G_M8989_IG07:
        add      ecx, eax
-						;; size=2 bbWeight=508.33 PerfScore 127.08
+						;; size=2 bbWeight=510.80 PerfScore 127.70
 G_M8989_IG08:
        xor      edx, edx
        cmp      ecx, -1
        setne    dl
-						;; size=8 bbWeight=508.33 PerfScore 762.49
+						;; size=8 bbWeight=510.80 PerfScore 766.20
 G_M8989_IG09:
        xor      edx, ebx
        movzx    rbx, dl
        inc      ebp
        cmp      r14d, ebp
        jg       SHORT G_M8989_IG03
-						;; size=12 bbWeight=508.33 PerfScore 1016.65
+						;; size=12 bbWeight=510.80 PerfScore 1021.61
 G_M8989_IG10:
        mov      eax, ebx
 						;; size=2 bbWeight=1 PerfScore 0.25
 G_M8989_IG11:
-       add      rsp, 40
+       add      rsp, 32
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
-       pop      r12
        pop      r13
        pop      r14
        pop      r15
        ret      
-						;; size=17 bbWeight=1 PerfScore 5.25
+						;; size=15 bbWeight=1 PerfScore 4.75
 G_M8989_IG12:
        mov      ecx, 2
        call     [System.ThrowHelper:ThrowArgumentNullException(int)]
@@ -381,9 +378,9 @@ G_M8989_IG14:
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
 G_M8989_IG15:
-       mov      r15d, r13d
        xor      r8d, r8d
-       jmp      SHORT G_M8989_IG04
+       test     r13d, r13d
+       jne      SHORT G_M8989_IG04
 						;; size=8 bbWeight=0 PerfScore 0.00
 G_M8989_IG16:
        cmp      r15d, -1
@@ -406,6 +403,6 @@ G_M8989_IG19:
        jmp      SHORT G_M8989_IG09
 						;; size=4 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 234, prolog size 16, PerfScore 14149.72, instruction count 87, allocated bytes for code 234 (MethodHash=eb11dce2) for method System.Collections.ContainsFalse`1[int]:Stack():bool:this (Tier1)
+; Total bytes of code 227, prolog size 14, PerfScore 14082.35, instruction count 84, allocated bytes for code 227 (MethodHash=eb11dce2) for method System.Collections.ContainsFalse`1[int]:Stack():bool:this (Tier1)
 ; ============================================================
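Most of the visible savings in this diff come from the prolog and epilog: with `r12` no longer used, one callee-saved register is not pushed, and the local frame shrinks from 40 to 32 bytes. That second change follows from the Windows x64 calling convention. RSP must be 16-byte aligned at each call, the caller must reserve 32 bytes of home space (the 32-byte `OutgoingArgSpace`, V05), and the `call` that entered the method left an 8-byte return address on the stack. The sketch below redoes that arithmetic; it is a simplification (the function name is hypothetical) that ignores any other stack-allocated locals, which the local variable table above shows this method does not have.

```python
def frame_size(pushed_regs: int, outgoing_arg_space: int = 32) -> int:
    """Smallest `sub rsp, N` (N >= outgoing_arg_space) that leaves RSP 16-byte aligned
    at call sites under the Windows x64 calling convention. Simplified sketch: ignores
    stack-allocated locals other than the outgoing argument space."""
    # The caller's CALL pushed an 8-byte return address, so on entry RSP % 16 == 8.
    used = 8 + 8 * pushed_regs
    size = outgoing_arg_space
    while (used + size) % 16 != 0:
        size += 8  # pad by one slot to restore alignment
    return size


before = ["r15", "r14", "r13", "r12", "rdi", "rsi", "rbp", "rbx"]
after = [reg for reg in before if reg != "r12"]
print(frame_size(len(before)), frame_size(len(after)))
```

With eight pushes an extra 8-byte pad is needed (40), with seven it is not (32). Dropping `push r12`/`pop r12` also accounts for the 2-byte shrink of both the prolog (16 to 14) and the epilog (17 to 15), since encoding `r12` requires a REX prefix, while `sub rsp, 40` and `sub rsp, 32` have the same 4-byte encoding.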
 

JulieLeeMSFT added this to the 8.0.0 milestone Aug 22, 2023
ghost removed the untriaged label Aug 22, 2023
@JulieLeeMSFT
Member

@jakobbotsch, please close this issue once you determine it is just noise.

@jakobbotsch
Member

The commit in question is not part of .NET 8, so I will move this for now, at least.

jakobbotsch modified the milestones: 8.0.0, 9.0.0 Aug 22, 2023
@jakobbotsch
Member

This returned to normal.

ghost locked as resolved and limited conversation to collaborators Dec 6, 2023