Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Linux/arm64: Regressions in System.Collections.Concurrent.IsEmpty<Int32> #88483

Closed
performanceautofiler bot opened this issue Jul 6, 2023 · 8 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler
Copy link

Run Information

Name Value
Architecture arm64
OS ubuntu 20.04
Queue AmpereUbuntu
Baseline cf50b7d86a1456e9aad09ddeafd5ff8701f9afdc
Compare bba7a9c67e5b6728833bedbb368f732fb499bcc1
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
Queue - Duration of single invocation 9.82 ns 14.76 ns 1.50 0.18 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Concurrent.IsEmpty&lt;Int32&gt;*'

Payloads

Baseline
Compare

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram


Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.755232225143695 > 10.287517983976874.
IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 6/28/2023 11:03:41 PM, 7/5/2023 5:05:14 PM falls between 6/27/2023 3:42:18 AM and 7/5/2023 5:05:14 PM.
IsRegressionStdDev: Marked as regression because -25.250357327274703 (T) = (0 -14.96177257450248) / Math.Sqrt((0.19790267175372064 / (10)) + (0.2506987029667144 / (13))) is less than -2.079613844716807 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (10) + (13) - 2, .025) and -0.5006118702171165 = (9.970447969559064 - 14.96177257450248) / 9.970447969559064 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Jul 6, 2023
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2023
@EgorBo EgorBo changed the title [Perf] Linux/arm64: 1 Regression on 6/29/2023 8:27:03 AM [Perf] Linux/arm64: Regressions in System.Collections.Concurrent.IsEmpty<Int32> Jul 6, 2023
@EgorBo EgorBo transferred this issue from dotnet/perf-autofiling-issues Jul 6, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 6, 2023
@ghost
Copy link

ghost commented Jul 6, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Name Value
Architecture arm64
OS ubuntu 20.04
Queue AmpereUbuntu
Baseline cf50b7d86a1456e9aad09ddeafd5ff8701f9afdc
Compare bba7a9c67e5b6728833bedbb368f732fb499bcc1
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
Queue - Duration of single invocation 9.82 ns 14.76 ns 1.50 0.18 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Concurrent.IsEmpty&lt;Int32&gt;*'

Payloads

Baseline
Compare

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram


Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.755232225143695 > 10.287517983976874.
IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 6/28/2023 11:03:41 PM, 7/5/2023 5:05:14 PM falls between 6/27/2023 3:42:18 AM and 7/5/2023 5:05:14 PM.
IsRegressionStdDev: Marked as regression because -25.250357327274703 (T) = (0 -14.96177257450248) / Math.Sqrt((0.19790267175372064 / (10)) + (0.2506987029667144 / (13))) is less than -2.079613844716807 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (10) + (13) - 2, .025) and -0.5006118702171165 = (9.970447969559064 - 14.96177257450248) / 9.970447969559064 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

arch-arm64, area-System.Collections, os-linux, runtime-coreclr

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Jul 6, 2023

Either Physical promotion (cc @jakobbotsch) or #88073 cc @MichalPetryka

@EgorBo EgorBo added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Jul 6, 2023
@EgorBo
Copy link
Member

EgorBo commented Jul 6, 2023

Same on win-arm64: dotnet/perf-autofiling-issues#19571

@MichalPetryka
Copy link
Contributor

Either Physical promotion (cc @jakobbotsch) or #88073 cc @MichalPetryka

#88073 has diffs here but they seem to be improvements, I don't think it's that.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jul 13, 2023
@eiriktsarpalis eiriktsarpalis added this to the 8.0.0 milestone Jul 13, 2023
@eiriktsarpalis
Copy link
Member

Looking at the diff it's not evident to me what might have caused this. Given that I don't have the hardware to run a git-bisect and current performance seems to be on par with 2022 runs, I'd be inclined to close this. Transferring to the codegen team for a second opinion.

@eiriktsarpalis eiriktsarpalis added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Collections labels Jul 25, 2023
@ghost
Copy link

ghost commented Jul 25, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Name Value
Architecture arm64
OS ubuntu 20.04
Queue AmpereUbuntu
Baseline cf50b7d86a1456e9aad09ddeafd5ff8701f9afdc
Compare bba7a9c67e5b6728833bedbb368f732fb499bcc1
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
Queue - Duration of single invocation 9.82 ns 14.76 ns 1.50 0.18 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Concurrent.IsEmpty&lt;Int32&gt;*'

Payloads

Baseline
Compare

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram


Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.755232225143695 > 10.287517983976874.
IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 6/28/2023 11:03:41 PM, 7/5/2023 5:05:14 PM falls between 6/27/2023 3:42:18 AM and 7/5/2023 5:05:14 PM.
IsRegressionStdDev: Marked as regression because -25.250357327274703 (T) = (0 -14.96177257450248) / Math.Sqrt((0.19790267175372064 / (10)) + (0.2506987029667144 / (13))) is less than -2.079613844716807 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (10) + (13) - 2, .025) and -0.5006118702171165 = (9.970447969559064 - 14.96177257450248) / 9.970447969559064 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

arch-arm64, os-linux, tenet-performance, tenet-performance-benchmarks, area-CodeGen-coreclr, runtime-coreclr

Milestone: 8.0.0

@JulieLeeMSFT
Copy link
Member

@jakobbotsch PTAL.

Either Physical promotion (cc @jakobbotsch) or #88073 cc @MichalPetryka

@jakobbotsch
Copy link
Member

Seems like #88073 causes the JIT to make different inlining decisions:

-2 ldfld or stfld over arguments which are structs.  Multiplier increased to 1.
-Inline candidate has 1 foldable branches.  Multiplier increased to 5.
-Inline candidate callsite is boring.  Multiplier increased to 6.3.
-Inline has 2 backward jumps (loops?).  Multiplier decreased to 4.41.
-calleeNativeSizeEstimate=687
-callsiteNativeSizeEstimate=145
-benefit multiplier=4.41
-threshold=639
-Native estimate for function size exceeds threshold for inlining 68.7 > 63.9 (multiplier = 4.41)
+2 ldfld or stfld over arguments which are structs.  Multiplier increased to 1.
+Inline candidate has 1 foldable branches.  Multiplier increased to 5.
+Inline has 2 intrinsics.  Multiplier increased to 6.6.
+Inline candidate callsite is boring.  Multiplier increased to 7.9.
+Inline has 2 backward jumps (loops?).  Multiplier decreased to 5.53.
+calleeNativeSizeEstimate=687
+callsiteNativeSizeEstimate=145
+benefit multiplier=5.53
+threshold=801
+Native estimate for function size is within threshold for inlining 68.7 <= 80.1 (multiplier = 5.53)

We end up with:

**************** Inline Tree

 Inlines into 06001A9D [via ExtendedDefaultPolicy] System.Collections.Concurrent.IsEmpty`1[int]:Queue():bool:this:
   [INL01 IL=0006 TR=000003 06007FE2] [INLINED: callee: below ALWAYS_INLINE size] System.Collections.Concurrent.ConcurrentQueue`1[int]:get_IsEmpty():bool:this
-    [INL00 IL=0004 TR=000009 06007FF1] [FAILED: call site: unprofitable inline] System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this
+    [INL02 IL=0004 TR=000009 06007FF1] [INLINED: call site: profitable inline] System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this
+      [INL00 IL=0024 TR=000028 06007FFE] [FAILED: callee: too many il bytes] System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:TryPeek(byref,bool):bool:this

The interesting thing is that when we compile System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this in the baseline, we do end up inlining System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:TryPeek(byref,bool):bool:this:

Inlines into 06007FF1 [via ExtendedDefaultPolicy] System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this:
  [INL01 IL=0015 TR=000006 06003822] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read[System.__Canon](byref):System.__Canon
  [INL02 IL=0024 TR=000013 06007FFE] [INLINED: call site: profitable inline] System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:TryPeek(byref,bool):bool:this
    [INL03 IL=0041 TR=000055 06003810] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read(byref):int
    [INL04 IL=0068 TR=000068 06003810] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read(byref):int
    [INL05 IL=0146 TR=000092 06003810] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read(byref):int
    [INL06 IL=0167 TR=000116 06007FFB] [INLINED: callee: below ALWAYS_INLINE size] System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:get_FreezeOffset():int:this
    [INL00 IL=0190 TR=000113 06003705] [FAILED: call site: unprofitable inline] System.Threading.SpinWait:SpinOnce(int):this
  [INL07 IL=0046 TR=000024 06003822] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read[System.__Canon](byref):System.__Canon

The result is the following codegen from my RPi (without RCPC support): https://www.diffchecker.com/xlM2tLdm/

Note that I see not regression when running the benchmark on my RPi.

Changing the inliner is not possible at this point, but it also looks like the benchmarks are improving and almost back to their own level with #70794, so I'm going to close this.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests

5 participants