-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consuming static readonly Vector128<T> fields results in non-optimal codegen #13514
Comments
It feels a bit like this from @mikedn dotnet/coreclr#21711 (looking at the code gen sample diffs he provides) though I may be wrong? |
This is not related to dotnet/coreclr#21711 (though I happen to have noticed this pattern while working on that). My understanding is that static struct fields actually contain references to boxed structs so there's no way to burn the exact address in code because GC can move the boxed object. |
Would |
Doesn't seem to have any effect on the JIT side. It could be that the JIT doesn't know about it or simply ignores it. |
I don't think that using the |
@janvorli dotnet/coreclr#20886 blocked changing the value of initialized readonly statics via reflection. |
The runtime does not take advantage of this attribute for optimizations and this attribute is actually bad for performance. This attribute introduces permanent pinning in the GC heap that prevents full GC heap compaction and that the GC has to allocate around. Back when safe and pure managed C++ was popular, we had multiple situations where we traced down GC-related performance problems to this attribute. I do not think we should count on |
Oh, I wish I could just use |
I'm not sure why you would need const/static to begin with, you can use normal local variables. The bigger problem with that simdjson code might be that it relies on lambdas... |
One issue here is that You fixed the latter in dotnet/coreclr#14393 but we still have https://github.com/dotnet/coreclr/issues/22388 tracking the former. |
I was thinking that if you use these values in a loop then it's sufficient to just store the "constant" vector in a local variable outside of the loop. That will be similar to what we can achieve by special JIT constant SIMD support or by using static readonly fields. But yes, if we're not dealing with loops (or if the loop trip count is so small that the constant SIMD creation is significant) then we'll probably need some JIT support. |
@jkotas, given that we are exposing public static readonly Vector128[] FixedData = CreateFixedData();
private static Vector128[] CreateFixedData()
{
var fixedData = GC.AllocateUninitializedArray<Vector128<float>>(count, alignment: 16, pinned: true);
fixedData[0] = Vector128.Create(...);
// ...
return fixedData;
} Would it possibly be worthwhile exposing an attribute that makes this "easier" to do and which would allow the JIT to automatically optimize access to said data (my initial guess would be this pattern isn't common enough to warrant a JIT optimization; and people could make do by explicitly doing the above and using |
Btw, isn't GC going to have a separate heap for pinned objects so it will no longer be a problem? |
That's kind of what I was getting at. With the introduction of the new Additionally, it might be possible for the JIT to recognize that such However, it might not be worth the investment when you could reasonably achieve the same thing via manual mechanisms (considering it is likely a rare usecase). |
That would again mean redoing how statics are dealt with by the runtime. If we spent time going that refactoring, we can make this "just work" without any special hints. |
Closing as completed, codegen for G_M5242_IG01: ;; offset=0000H
C5F877 vzeroupper
;; size=3 bbWeight=1 PerfScore 1.00
G_M5242_IG02: ;; offset=0003H
C5F81001 vmovups xmm0, xmmword ptr [rcx]
C5F9FC0511000000 vpaddb xmm0, xmm0, xmmword ptr [reloc @RWD00]
C5F9D7C0 vpmovmskb eax, xmm0
;; size=16 bbWeight=1 PerfScore 8.00
G_M5242_IG03: ;; offset=0013H
C3 ret
;; size=1 bbWeight=1 PerfScore 1.00
RWD00 dq 0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh |
Sample code:
Codegen:
Since the static ctor has already been run, the JIT could be optimized to burn into the codegen the exact memory location of where the
Vector0x0F
field lives. This would allow eliding the dereference highlighted in the above codegen sample./cc @tannergooding
category:cq
theme:basic-cq
skill-level:expert
cost:medium
The text was updated successfully, but these errors were encountered: