Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consuming static readonly Vector128<T> fields results in non-optimal codegen #13514

Closed
GrabYourPitchforks opened this issue Oct 1, 2019 · 17 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

Sample code:

private static readonly Vector128<sbyte> Vector0x0F = Vector128.Create((sbyte)0x0F);

public static int GetMask(Vector128<sbyte> data)
{
    return Sse2.MoveMask(Sse2.Add(data, Vector0x0F));
}

Codegen:

00007ffc`19901370 c5f877          vzeroupper
00007ffc`19901373 c5f91001        vmovupd xmm0,xmmword ptr [rcx]
00007ffc`19901377 48b8482ca75ea8010000 mov rax,1A85EA72C48h
00007ffc`19901381 488b00          mov     rax,qword ptr [rax]  ; this dereference could be avoided
00007ffc`19901384 c5f9fc4008      vpaddb  xmm0,xmm0,xmmword ptr [rax+8]
00007ffc`19901389 c5f9d7c0        vpmovmskb eax,xmm0
00007ffc`1990138d c3              ret

Since the static ctor has already been run, the JIT could be optimized to burn into the codegen the exact memory location of where the Vector0x0F field lives. This would allow eliding the dereference highlighted in the above codegen sample.

/cc @tannergooding

category:cq
theme:basic-cq
skill-level:expert
cost:medium

@benaadams
Copy link
Member

It feels a bit like this from @mikedn dotnet/coreclr#21711 (looking at the code gen sample diffs he provides) though I may be wrong?

@mikedn
Copy link
Contributor

mikedn commented Oct 2, 2019

This is not related to dotnet/coreclr#21711 (though I happen to have noticed this pattern while working on that). My understanding is that static struct fields actually contain references to boxed structs so there's no way to burn the exact address in code because GC can move the boxed object.

@tannergooding
Copy link
Member

tannergooding commented Oct 2, 2019

so there's no way to burn the exact address in code because GC can move the boxed object.

Would System.Runtime.CompilerServices.FixedAddressValueTypeAttribute solve this?

@mikedn
Copy link
Contributor

mikedn commented Oct 2, 2019

Would System.Runtime.CompilerServices.FixedAddressValueTypeAttribute solve this?

Doesn't seem to have any effect on the JIT side. It could be that the JIT doesn't know about it or simply ignores it.

@mikedn
Copy link
Contributor

mikedn commented Oct 2, 2019

The bigger question is if it's a good idea to use FixedAddressValueTypeAttribute. It looks like it makes the assembly uncollectible, it's not obvious that it's a good trade off.

@jkotas @janvorli ?

@janvorli
Copy link
Member

janvorli commented Oct 2, 2019

I don't think that using the FixedAddressValueTypeAttribute would help at all. Even if the JIT somehow knew the value was already initialized and could read the value out, it would not be a correct transformation. readonly variables can still be modified using reflection.

@AndyAyersMS
Copy link
Member

@janvorli dotnet/coreclr#20886 blocked changing the value of initialized readonly statics via reflection.

@jkotas
Copy link
Member

jkotas commented Oct 2, 2019

FixedAddressValueTypeAttribute was introduced for safe and pure managed C++. Managed C++ compiler emitted this attribute on your behalf in pure/safe modes.

The runtime does not take advantage of this attribute for optimizations and this attribute is actually bad for performance. This attribute introduces permanent pinning in the GC heap that prevents full GC heap compaction and that the GC has to allocate around. Back when safe and pure managed C++ was popular, we had multiple situations where we traced down GC-related performance problems to this attribute.

I do not think we should count on FixedAddressValueTypeAttribute to be a part of the solution for this. If we wanted the statics to have fixed addresses, we would want to redo how and where the statics are allocated.

@EgorBo
Copy link
Member

EgorBo commented Oct 2, 2019

Oh, I wish I could just use const keyword for them directly in methods (or at least static readonly fields inside methods, just like inner methods), e.g. if you port this C/C++ to C#'s S.R.I.X86 you will have a lot of single-use fields 🙁 https://github.com/lemire/simdjson/blob/master/src/haswell/stage1_find_marks.h#L24-L86

@mikedn
Copy link
Contributor

mikedn commented Oct 2, 2019

I'm not sure why you would need const/static to begin with, you can use normal local variables. The bigger problem with that simdjson code might be that it relies on lambdas...

@tannergooding
Copy link
Member

you can use normal local variables.

One issue here is that Vector128.Create(constant) is not yet handled (unlike Vector2/3/4 which are).

You fixed the latter in dotnet/coreclr#14393 but we still have https://github.com/dotnet/coreclr/issues/22388 tracking the former.

@mikedn
Copy link
Contributor

mikedn commented Oct 2, 2019

One issue here is that Vector128.Create(constant) is not yet handled (unlike Vector2/3/4 which are).

I was thinking that if you use these values in a loop then it's sufficient to just store the "constant" vector in a local variable outside of the loop. That will be similar to what we can achieve by special JIT constant SIMD support or by using static readonly fields. But yes, if we're not dealing with loops (or if the loop trip count is so small that the constant SIMD creation is significant) then we'll probably need some JIT support.

@tannergooding
Copy link
Member

@jkotas, given that we are exposing GC.AllocateUninitializedArray, users could explicitly do something like the below to create some an array of static readonly "constants" that is both pinned and aligned for the lifetime of the program, correct?

public static readonly Vector128[] FixedData = CreateFixedData();

private static Vector128[] CreateFixedData()
{
    var fixedData = GC.AllocateUninitializedArray<Vector128<float>>(count, alignment: 16, pinned: true);

    fixedData[0] = Vector128.Create(...);
    // ...

    return fixedData;
}

Would it possibly be worthwhile exposing an attribute that makes this "easier" to do and which would allow the JIT to automatically optimize access to said data (my initial guess would be this pattern isn't common enough to warrant a JIT optimization; and people could make do by explicitly doing the above and using Unsafe.AsPointer)?

@EgorBo
Copy link
Member

EgorBo commented Oct 18, 2019

This attribute introduces permanent pinning in the GC heap that prevents full GC heap compaction and that the GC has to allocate around.

Btw, isn't GC going to have a separate heap for pinned objects so it will no longer be a problem?

@tannergooding
Copy link
Member

Btw, isn't GC going to have a separate heap for pinned objects so it will no longer be a problem?

That's kind of what I was getting at.

With the introduction of the new GC.AllocateUninitializedArray and the features that come with it; it should be possible for us to actually allocate said data directly to the pinned heap, for example.

Additionally, it might be possible for the JIT to recognize that such static readonly fields exist in the pinned heap and emit more efficient code (which might also be useful with the C++/CLI support that is being brought to core).

However, it might not be worth the investment when you could reasonably achieve the same thing via manual mechanisms (considering it is likely a rare usecase).

@jkotas
Copy link
Member

jkotas commented Oct 18, 2019

users could explicitly do something like the below to create some an array of static readonly "constants" that is both pinned and aligned for the lifetime of the program, correct?

That would again mean redoing how statics are dealt with by the runtime. If we spent time going that refactoring, we can make this "just work" without any special hints.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Oct 30, 2022
@EgorBo
Copy link
Member

EgorBo commented Mar 5, 2023

Closing as completed, codegen for GetMask in Tier1:

G_M5242_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M5242_IG02:              ;; offset=0003H
       C5F81001             vmovups  xmm0, xmmword ptr [rcx]
       C5F9FC0511000000     vpaddb   xmm0, xmm0, xmmword ptr [reloc @RWD00]
       C5F9D7C0             vpmovmskb eax, xmm0
						;; size=16 bbWeight=1 PerfScore 8.00
G_M5242_IG03:              ;; offset=0013H
       C3                   ret      
						;; size=1 bbWeight=1 PerfScore 1.00
RWD00  	dq	0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh

@EgorBo EgorBo closed this as completed Mar 5, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

10 participants