Return Matrix.Identity from static field instead of creating it every time #87723
This significantly improves performance when using this field; even on CoreCLR the improvement is about 33%. Quite a few matrix benchmarks from our suite rely on this property. This is how it was originally implemented, but it has since been changed to creating a matrix on the fly in f8218f9, leading to some reported regressions on mono.
Tagging subscribers to this area: @dotnet/area-system-numerics
@@ -48,19 +48,11 @@ internal struct Impl : IEquatable<Impl>
        Z = new Vector2(m31, m32);
    }

    private static readonly Impl _identity = new Impl () { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };

    public static Impl Identity
Could be written as a one-liner too:
- public static Impl Identity
+ public static Impl Identity { get; } = new Impl() { X = Vector2.UnitX, Y = Vector2.UnitY, Z = Vector2.Zero };
Can you provide the benchmark numbers and diffs? This was an intentional change to better align with how native libraries do this and to more readily allow optimizations that are applicable to real world usages of the value. In real world cases, returning this the way it was (that is, constructing a new object in the property) allows promotion, constant folding, and other optimizations to light up.
There are a lot of benchmarks in https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Numerics.Vectors/Perf_Matrix4x4.cs that call
Before this change it takes around 5.1s and after 3.3s, on coreclr amd64. On mono the gain is much higher. I don't really understand why building the value in the property would provide any additional support for JIT optimizations. With this PR we access the value from a static readonly field, which the JIT would also see as a constant once the class is initialized.
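For reference, the two shapes being compared can be sketched roughly as below. This is only an illustration built on the public `Matrix3x2` type: the class `IdentitySketch` and the member names `CachedIdentity` and `FreshIdentity` are hypothetical and are not the actual `Impl` code touched by this PR.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for illustration; not the real Matrix3x2.Impl.
public static class IdentitySketch
{
    // Shape proposed by this PR: build the value once and cache it in a
    // static readonly field. The type must be initialized before first use.
    private static readonly Matrix3x2 s_identity =
        new Matrix3x2(1f, 0f, 0f, 1f, 0f, 0f);

    public static Matrix3x2 CachedIdentity => s_identity;

    // Existing shape: construct the value from constants on every access,
    // so no static state or type initialization is involved.
    public static Matrix3x2 FreshIdentity =>
        new Matrix3x2(1f, 0f, 0f, 1f, 0f, 0f);
}
```

Both properties return a value equal to `Matrix3x2.Identity`; the difference under discussion is only in how the JIT (or an AOT compiler) can treat each access, which the sketch itself does not measure.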
The idea is that Now, this is a bit theoretical with structs, because some things can go not as planned (as the benchmark results clearly indicate), and somewhat recently we've gained the ability to get constants out of static readonly structs (but at a late stage, in VN; historically RyuJIT wasn't able to see through field access of static readonly structs at all). It would require a deeper investigation to determine what's up.
At the same time, RyuJIT should be good with static readonly structs in this case, shouldn't it?
static readonly Matrix4x4 Identity = Matrix4x4.Identity;
[MethodImpl(MethodImplOptions.NoInlining)]
static float Test()
{
    return Identity.M22;
}
is correctly folded in
Yes. It is correctly optimized in Tier 1 after rejit. However, it is worse for scenarios where rejit cannot happen. This can negatively impact AOT and can leave various artifacts around due to the existence of the static constructor. The existing code should likewise be optimizable. But it looks like the JIT gets tripped up a bit due to the
Right, and they aren't necessarily representative of "real world" code. They're set up in a way to allow tracking general changes to the code over time instead. An actual usage typically only uses
For Mono part of this is because the general For RyuJIT and Mono both, this is a bit of what was touched above that
Likewise, we'd ideally be using the new
I analyzed why we don't get expected (constant folded) codegen in these benchmarks here: #76928 (comment)
The idea would be to switch the
Closing this since relying on static constructors to run might be problematic in some scenarios.