[Perf] Linux/x64: 49 Regressions on 3/7/2024 2:55:42 PM #30899

Open
performanceautofiler bot opened this issue Mar 12, 2024 · 5 comments

@performanceautofiler

performanceautofiler bot commented Mar 12, 2024

Run Information

Name Value
Architecture x64
OS ubuntu 22.04
Queue TigerUbuntu
Baseline 8330db998659c4e6410aba370b37e4304a517a2b
Compare c806bf697035ee47589e246ea6f6453811d6cd40
Diff Diff
Configs CompilationMode:wasm, RunKind:micro
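The Baseline and Compare values are commit SHAs. The following is a minimal sketch that builds a GitHub compare link for that range using GitHub's `compare/BASE...HEAD` URL form. It assumes both commits live in dotnet/runtime, because the report itself does not name the repository. The `compare_url` helper is hypothetical.

```python
# Sketch: build a GitHub compare URL for the Baseline..Compare range.
# Assumption: both commits belong to dotnet/runtime (not stated in the report).
BASELINE = "8330db998659c4e6410aba370b37e4304a517a2b"
COMPARE = "c806bf697035ee47589e246ea6f6453811d6cd40"


def compare_url(owner: str, repo: str, base: str, head: str) -> str:
    """Return the GitHub web URL that shows the changes from base to head."""
    return f"https://github.com/{owner}/{repo}/compare/{base}...{head}"


if __name__ == "__main__":
    print(compare_url("dotnet", "runtime", BASELINE, COMPARE))
```

In a local dotnet/runtime clone, `git log 8330db99..c806bf69` lists the commits in the same range.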

Regressions in System.Tests.Perf_String

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
102.40 ns 120.22 ns 1.17 0.25 False
105.91 ns 125.72 ns 1.19 0.25 False
136.31 ns 154.74 ns 1.14 0.22 False
154.36 ns 193.55 ns 1.25 0.19 False
167.54 ns 188.93 ns 1.13 0.23 False
169.54 ns 194.51 ns 1.15 0.24 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_String*'
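Two details matter when running the repro on a POSIX shell such as the Ubuntu runners above. First, the script path needs forward slashes. Second, the filter must stay quoted, because filters like `Perf_FloatingPointTensorPrimitives<Double>*` contain `<`, `>` and `*`, which the shell would otherwise treat as redirection and glob characters. The following is a minimal sketch that lets Python's `shlex.join` handle the quoting. The `repro_command` helper is hypothetical, and it passes only the `-f` and `--filter` arguments that appear in this report.

```python
import shlex
from pathlib import PurePosixPath

# Path of the script inside the dotnet/performance clone, relative to the clone's parent.
SCRIPT = PurePosixPath("performance") / "scripts" / "benchmarks_ci.py"


def repro_command(benchmark_filter: str, framework: str = "net8.0") -> str:
    """Hypothetical helper: return a shell-safe benchmarks_ci.py command line."""
    argv = ["python3", str(SCRIPT), "-f", framework, "--filter", benchmark_filter]
    # shlex.join single-quotes any argument containing shell metacharacters (<, >, *).
    return shlex.join(argv)


if __name__ == "__main__":
    for flt in ("System.Tests.Perf_String*", "System.Memory.Span<Char>*"):
        print(repro_command(flt))
```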

Benchmarks

System.Tests.Perf_String.Substring_IntInt(s: "dzsdzsDDZSDZSDZSddsz", i1: 10, i2: 1)
System.Tests.Perf_String.Substring_IntInt(s: "dzsdzsDDZSDZSDZSddsz", i1: 0, i2: 8)
System.Tests.Perf_String.Remove_Int(s: "dzsdzsDDZSDZSDZSddsz", i: 10)
System.Tests.Perf_String.TrimEnd(s: "Test ")
System.Tests.Perf_String.Trim_CharArr(s: " Test", c: [' ', ' '])
System.Tests.Perf_String.TrimEnd_CharArr(s: "Test ", c: [' ', ' '])

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
2.06 ms 2.38 ms 1.15 0.27 True
73.55 μs 85.90 μs 1.17 0.32 True
219.68 μs 268.27 μs 1.22 0.44 True
339.31 μs 362.29 μs 1.07 0.06 True
136.79 μs 151.91 μs 1.11 0.31 False
207.34 μs 263.50 μs 1.27 0.50 False
83.87 μs 103.78 μs 1.24 0.24 True
1.28 ms 1.56 ms 1.22 0.24 False
6.12 μs 6.96 μs 1.14 0.16 True
4.94 ms 6.78 ms 1.37 0.47 True
54.34 μs 65.55 μs 1.21 0.29 False
88.07 μs 104.81 μs 1.19 0.42 False
310.93 μs 345.99 μs 1.11 0.19 True
2.09 ms 2.58 ms 1.23 0.35 True
2.30 ms 2.74 ms 1.19 0.49 False
5.53 ms 6.44 ms 1.17 0.41 True
4.92 ms 7.09 ms 1.44 0.39 True
217.74 μs 276.85 μs 1.27 0.52 True
12.71 μs 14.45 μs 1.14 0.21 False
98.42 μs 115.08 μs 1.17 0.36 False
1.87 ms 2.19 ms 1.17 0.29 True
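With 21 rows, a quick filter helps pick which regressions to investigate first. The following is a minimal triage sketch over the Test/Base and Edge Detector columns above, with values copied by hand. It assumes that `True` in the Edge Detector column marks rows where the autofiler saw a clear step change; the report does not define the column. The 1.20 threshold is an arbitrary illustrative cut-off.

```python
# (test_over_base, edge_detector) pairs copied from the table above, in row order.
ROWS = [
    (1.15, True), (1.17, True), (1.22, True), (1.07, True), (1.11, False),
    (1.27, False), (1.24, True), (1.22, False), (1.14, True), (1.37, True),
    (1.21, False), (1.19, False), (1.11, True), (1.23, True), (1.19, False),
    (1.17, True), (1.44, True), (1.27, True), (1.14, False), (1.17, False),
    (1.17, True),
]

THRESHOLD = 1.20  # illustrative cut-off, not part of the report


def triage(rows, threshold=THRESHOLD):
    """Return (row_index, ratio) for edge-detected rows at or above threshold, worst first."""
    hits = [(i, r) for i, (r, edge) in enumerate(rows) if edge and r >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for index, ratio in triage(ROWS):
        print(f"row {index + 1}: {ratio:.2f}x")
```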


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>*'

Benchmarks

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sigmoid(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Exp(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Ieee754Remainder_ScalarDivisor(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Distance(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sigmoid(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sin(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Distance(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sin(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sinh(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Ieee754Remainder_ScalarDividend(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Sinh(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Log(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Ieee754Remainder_ScalarDividend(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Log(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Exp(BufferLength: 3079)

Author

performanceautofiler bot commented Mar 12, 2024

Regressions in System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
1.83 μs 2.25 μs 1.23 0.24 False
2.80 μs 3.49 μs 1.25 0.13 True
32.00 μs 36.39 μs 1.14 0.26 False
3.14 μs 3.54 μs 1.13 0.24 False
1.66 μs 1.77 μs 1.07 0.21 False
3.14 μs 3.62 μs 1.15 0.06 True
1.64 μs 1.72 μs 1.05 0.24 False
3.57 μs 3.99 μs 1.12 0.18 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>*'

Benchmarks

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Negate(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.SumOfMagnitudes(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Add_Scalar(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.SumOfSquares(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Divide_Scalar(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.AddMultiply_ScalarAddend(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.Add_Scalar(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.AddMultiply_Vectors(BufferLength: 128)

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
37.33 μs 41.93 μs 1.12 0.44 False
13.53 μs 14.58 μs 1.08 0.08 False
944.38 μs 1.16 ms 1.23 0.32 False
37.94 μs 48.29 μs 1.27 0.42 False
62.23 μs 70.05 μs 1.13 0.47 False
37.58 μs 50.85 μs 1.35 0.42 False
942.43 μs 1.11 ms 1.18 0.48 False
1.52 ms 1.65 ms 1.09 0.37 False
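Some rows, such as `944.38 μs 1.16 ms`, mix units, so you cannot compare the printed numbers directly. The sketch below converts the units used in this report (ns, μs, ms, secs) to seconds before dividing. `parse_duration` and `ratio` are hypothetical helpers. The printed times are themselves rounded, so a recomputed ratio can occasionally differ from the reported Test/Base value in the second decimal place.

```python
# Scale factors from the unit labels used in this report to seconds.
UNIT_SECONDS = {
    "ns": 1e-9,
    "\u03bcs": 1e-6,  # "μs" with GREEK SMALL LETTER MU
    "\u00b5s": 1e-6,  # "µs" with MICRO SIGN, in case the text was normalized differently
    "us": 1e-6,
    "ms": 1e-3,
    "secs": 1.0,
}


def parse_duration(text: str) -> float:
    """Parse a value like '944.38 μs' or '1.16 ms' into seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]


def ratio(baseline: str, test: str) -> float:
    """Test/Base ratio computed on unit-normalized durations."""
    return parse_duration(test) / parse_duration(baseline)


if __name__ == "__main__":
    pairs = [("944.38 \u03bcs", "1.16 ms"), ("942.43 \u03bcs", "1.11 ms"), ("1.52 ms", "1.65 ms")]
    for base, test in pairs:
        print(f"{base} -> {test}: {ratio(base, test):.2f}")
```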


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>*'

Benchmarks

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Exp(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Ieee754Remainder_ScalarDivisor(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sinh(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sinh(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Pow_ScalarBase(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sigmoid(BufferLength: 128)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Sigmoid(BufferLength: 3079)
System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Pow_ScalarExponent(BufferLength: 3079)

Regressions in Struct.GSeq

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
684.52 μs 787.38 μs 1.15 0.18 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Struct.GSeq*'

Benchmark

Struct.GSeq.FilterSkipMapSum

Regressions in System.Memory.Span<Char>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
297.37 ns 321.47 ns 1.08 0.10 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Char>*'

Benchmark

System.Memory.Span<Char>.SequenceEqual(Size: 512)

Regressions in System.Collections.CtorGivenSize<String>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
215.89 ns 241.29 ns 1.12 0.12 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.CtorGivenSize<String>*'

Benchmark

System.Collections.CtorGivenSize<String>.List(Size: 512)

Regressions in System.Collections.CtorFromCollectionNonGeneric<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
154.21 μs 178.11 μs 1.16 0.07 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.CtorFromCollectionNonGeneric<Int32>*'

Benchmark

System.Collections.CtorFromCollectionNonGeneric<Int32>.ArrayList(Size: 512)

Regressions in SciMark2.kernel

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
2.02 secs 2.17 secs 1.08 0.01 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'SciMark2.kernel*'

Benchmark

SciMark2.kernel.benchSOR

Regressions in System.Text.Json.Tests.Perf_Guids

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
25.62 ms 28.98 ms 1.13 0.15 False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Json.Tests.Perf_Guids*'

Benchmark

System.Text.Json.Tests.Perf_Guids.WriteGuids(Formatted: False, SkipValidation: True)


@kg
Member

kg commented Mar 12, 2024

This is likely dotnet/runtime#99273. Given the size of the regressions, I suspect the old (broken) heuristic was not inserting traces at the "right" places, and the traces now being inserted for these scenarios aren't profitable. I may investigate the worst ones.

EDIT: Most of these regressions seem to be in the tensor code, which looks like extremely generic code that wraps vector operators. We already know that:

  1. we don't implement most of the vector ops in the wasm interpreter yet, and may never implement them all
  2. perf for the scalar fallback of vector ops regressed recently (this may improve soon)

@kg
Member

kg commented Mar 12, 2024

The Trim and Substring regressions were probably caused by the changes that introduced more safepoints into jiterpreter traces, so the recent fix that removes safepoints may make them go away. A quick profile shows they spend about half their execution time in traces, though a good chunk of that time is spent allocating strings.
Looking at the opcodes for the traces in question, there are quite a few imm safepoint branch opcodes, which a recent interpreter optimization introduced. The jiterp should now handle those opcodes more efficiently.

@radekdoulik
Member

radekdoulik commented Mar 13, 2024

The vector performance did indeed regress recently; the impact on Firefox is a bit larger than on Chrome. https://radekdoulik.github.io/WasmPerformanceMeasurements/?startDate=2024-02-27T23%3A01%3A28.000Z&endDate=2024-03-12T22%3A53%3A01.000Z&tasks=%2CVector&flavors=2%2C3%2C14%2C15
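The dashboard link encodes its view in the query string, so shifting the date range means re-encoding that string. The sketch below rebuilds the same query with `urllib.parse.urlencode`, which percent-encodes `:` and `,` the same way the link above does. This thread does not explain what the `tasks` and `flavors` values mean, so the sketch copies them verbatim. The `dashboard_url` helper is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://radekdoulik.github.io/WasmPerformanceMeasurements/"


def dashboard_url(start: str, end: str, tasks: str = ",Vector", flavors: str = "2,3,14,15") -> str:
    """Hypothetical helper: build a dashboard link for the given ISO-8601 date range."""
    # urlencode uses quote_plus, so ':' becomes %3A and ',' becomes %2C.
    query = urlencode({"startDate": start, "endDate": end, "tasks": tasks, "flavors": flavors})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(dashboard_url("2024-02-27T23:01:28.000Z", "2024-03-12T22:53:01.000Z"))
```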

@kg
Member

kg commented Mar 15, 2024

I'm hoping dotnet/runtime#99706 will claw back a lot of the vector perf if it lands, though it's impossible for me to measure that locally (too much noise).
