[Perf][Mono] Linux/x64 Regressions due to rewrite of Matrix3x2 and Matrix4x4 #80569
Run Information
Regressions in System.Numerics.Tests.Perf_Vector3
Repro:
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_Vector3*'
Benchmark: System.Numerics.Tests.Perf_Vector3.TransformByMatrix4x4Benchmark
Regressions in System.Numerics.Tests.Perf_Matrix4x4
Repro:
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_Matrix4x4*'
Benchmark: System.Numerics.Tests.Perf_Matrix4x4.LerpBenchmark
Regressions in System.Numerics.Tests.Perf_Vector2
Repro:
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_Vector2*'
Benchmark: System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix4x4Benchmark
Regressions in System.Numerics.Tests.Perf_Plane
Repro:
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_Plane*'
Benchmark: System.Numerics.Tests.Perf_Plane.TransformByMatrix4x4Benchmark
Regressions in System.Numerics.Tests.Perf_Vector4
Repro:
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_Vector4*'
Benchmark: System.Numerics.Tests.Perf_Vector4.TransformVector2ByMatrix4x4Benchmark
Regressions in System.Numerics.Tests.Perf_Quaternion
Repro:
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_Quaternion*'
Benchmark: System.Numerics.Tests.Perf_Quaternion.CreateFromRotationMatrixBenchmark
@tannergooding It looks like we have some pretty significant regressions in mono-aot from this change: f8218f9. I can't say I fully understand the change; I see there are also some changes in the runtime. Do we need to make a similar change in Mono?
The issue here is that Mono SIMD support is still a WIP. The fix is to finish implementing SIMD support for Vector2/3/4 and Vector128 so that Mono has the correct/expected codegen for these types and isn't using the slow software fallback. For reference, Mono currently has acceleration for:
RyuJIT, however, has much more complete acceleration:
@fanyang-mono and others have been working on the Mono SIMD support; they may be able to give a more accurate estimate of where things stand and where they are expected to be for .NET 8. In general, these types expose information on whether or not they are accelerated, and devs will not typically be using them when …
Assigning this to @fanyang-mono and @jandupej so they can track it as part of the SIMD work.
We have already added SIMD support for Vector4 and Vector128 on x64 with LLVM. I was surprised to see some of the Matrix4x4 APIs regress with Tanner's change. For example, the new library code for the addition and subtraction operators only uses Vector4 addition and subtraction, which should already have SIMD support. I would need to inspect the generated code to find the reason.
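A minimal model of that layering, written in Python rather than C# and not taken from the .NET sources: it assumes Matrix4x4 is stored as four Vector4 rows and that the addition and subtraction operators perform exactly one Vector4 operation per row. The class names `Vec4` and `Mat4x4` and the `ops` counter are hypothetical. Under that assumption, each matrix operator is only as fast as the Vector4 operation underneath it, so if Vector4 addition is not intrinsified, all four per-row calls take the scalar path.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class Vec4:
    x: float
    y: float
    z: float
    w: float
    # Hypothetical counter of Vector4-level operations performed in this model.
    ops: ClassVar[int] = 0

    def __add__(self, other: "Vec4") -> "Vec4":
        # One Vector4 add: a single SIMD instruction when accelerated,
        # four scalar adds on a software fallback path.
        Vec4.ops += 1
        return Vec4(self.x + other.x, self.y + other.y,
                    self.z + other.z, self.w + other.w)

    def __sub__(self, other: "Vec4") -> "Vec4":
        Vec4.ops += 1
        return Vec4(self.x - other.x, self.y - other.y,
                    self.z - other.z, self.w - other.w)


@dataclass(frozen=True)
class Mat4x4:
    # Assumption: the matrix is stored as four Vector4 rows.
    x: Vec4
    y: Vec4
    z: Vec4
    w: Vec4

    def __add__(self, other: "Mat4x4") -> "Mat4x4":
        # Matrix addition is one Vector4 addition per row.
        return Mat4x4(self.x + other.x, self.y + other.y,
                      self.z + other.z, self.w + other.w)

    def __sub__(self, other: "Mat4x4") -> "Mat4x4":
        # Matrix subtraction is one Vector4 subtraction per row.
        return Mat4x4(self.x - other.x, self.y - other.y,
                      self.z - other.z, self.w - other.w)


def identity() -> Mat4x4:
    return Mat4x4(Vec4(1.0, 0.0, 0.0, 0.0), Vec4(0.0, 1.0, 0.0, 0.0),
                  Vec4(0.0, 0.0, 1.0, 0.0), Vec4(0.0, 0.0, 0.0, 1.0))


if __name__ == "__main__":
    before = Vec4.ops
    m = identity() + identity()
    print(m.x, Vec4.ops - before)  # Vec4(x=2.0, y=0.0, z=0.0, w=0.0) 4
```

In this model the matrix operator adds no work of its own, which is consistent with the expectation above that accelerated Vector4 operations should carry over directly to Matrix4x4.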
@naricc Is this issue incorrectly labeled as …
@fanyang-mono Yes; we aren't actually doing any AOT runs without LLVM.
Just noting that it'd be great if we had a simple way to both run and get disassembly for Mono, for both MonoJIT and MonoLLVM. With RyuJIT, it's:
Then we can …
Getting the generated code is not hard for the Mono JIT either, with or without LLVM. You set the environment variable …
The current implementation of Matrix3x2 relies heavily on Vector2, and Matrix4x4 relies heavily on Vector4. We need to intrinsify Vector2 and Vector4 in the mini JIT to bring the performance back.
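The same layering applies to Matrix3x2. The Python sketch below is a hypothetical model, not the .NET implementation: it assumes the matrix is stored as three Vector2 rows (the third being the translation row, M31/M32) and expresses multiplication purely as Vector2 scale and add operations. `Vec2`, `Mat3x2`, and the `ops` counter are illustrative names. In this model one product costs ten Vector2 operations, so a missing Vector2 intrinsic would be paid ten times per multiply.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class Vec2:
    x: float
    y: float
    # Hypothetical counter of Vector2-level operations performed in this model.
    ops: ClassVar[int] = 0

    def __add__(self, other: "Vec2") -> "Vec2":
        Vec2.ops += 1
        return Vec2(self.x + other.x, self.y + other.y)

    def scale(self, s: float) -> "Vec2":
        Vec2.ops += 1
        return Vec2(self.x * s, self.y * s)


@dataclass(frozen=True)
class Mat3x2:
    # Assumption: rows stored as Vector2; z is the translation row (M31, M32).
    x: Vec2
    y: Vec2
    z: Vec2

    def __matmul__(self, b: "Mat3x2") -> "Mat3x2":
        # Each result row is a linear combination of b's rows built from
        # Vector2 scale/add; the translation row also adds b's translation.
        return Mat3x2(
            b.x.scale(self.x.x) + b.y.scale(self.x.y),
            b.x.scale(self.y.x) + b.y.scale(self.y.y),
            b.x.scale(self.z.x) + b.y.scale(self.z.y) + b.z,
        )


if __name__ == "__main__":
    translate = Mat3x2(Vec2(1.0, 0.0), Vec2(0.0, 1.0), Vec2(1.0, 2.0))
    scale = Mat3x2(Vec2(2.0, 0.0), Vec2(0.0, 2.0), Vec2(0.0, 0.0))
    before = Vec2.ops
    result = translate @ scale
    print(result.z, Vec2.ops - before)  # Vec2(x=2.0, y=4.0) 10
```

Composing "translate by (1, 2), then scale by 2" moves the origin to (2, 4), which is what the translation row of the product shows.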
cc @matouskozak for further work.
Won't be able to work on intrinsifying Vector2 for .NET 8. Moving to 9.0.0.
No longer regressed in the AOT-LLVM scenario.
Regressions in System.Numerics.Tests.Perf_Matrix3x2
Benchmark: System.Numerics.Tests.Perf_Matrix3x2.IdentityBenchmark