Add VPCLMULQDQ intrinsics #109137
The existing modeling for ISA support presents a challenge in that JIT wants to see a 1:1 mapping between ISA and implementing class, but the actual ISAs are represented by a combination of flags. For my first attempt, I have virtualized this in JIT similarly to the way that e.g. Looking at some of the existing implementations, I see that the fake _VL ISAs are leaking into the I'd appreciate some guidance if there's a better way to handle this scenario since there will be more like this.
@MichalStrehovsky could I trouble you to look at this? JIT side is working, but I've left it as draft for now because the NAOT leg is failing due to an assert on the (intentionally) unexposed fake ISA.
I'm not sure how much work 2) is in the end, or whether that's something the runtime team cares about. The currently broken code I mentioned is runtime/src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs Lines 1896 to 1900 in e70aaa8
because that generated method is matching on managed type name, and there's no I guess there's also option 3) Do it the ugly way for now and hope somebody cleans it up later...
I don't have much guidance to offer, sorry. Things got a lot more complicated since I last touched any of this (when all we had was AVX2) and I haven't exactly been keeping track of it. E.g. I don't know why _VL instructions are fake and whether we intentionally want/or do not want to support them as Ideally RyuJIT implementation details shouldn't leak out into the managed parts of the compiler or R2R file format, so if RyuJIT needs something fake to operate, it ideally shouldn't burden other components (because then the owners of said component who don't know about RyuJIT implementation details and don't know much about hardware intrinsics in general either have no clue about what's going on). But maybe it's necessary, I don't know. We pulled these RyuJIT implementation details from cpufeatures.h in the past, maybe they can be pulled from more places. @tannergooding and @davidwrighton might have more of an opinion.
Fair enough. Thanks for the reply anyway. For background, the reason the These newer intersection-style ISAs are more problematic because 1) they don't fall under a well-known x86-64 version set and 2) they actually do exist in hardware independent of each other. For example, Skylake-X implements PCLMULQDQ+AVX512F but not VPCLMULQDQ, Alder Lake implements PCLMULQDQ+VPCLMULQDQ but not AVX512F, and Zen 4 and 5 implement the full set. I'm hoping we can arrive at a better solution for them that also happens to clean up the handling of the ISAs that are already implemented.
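To make the "combination of flags" problem concrete, here is a minimal C++ sketch of how a virtual ISA could be modeled as a conjunction of hardware flags. All names here (`CpuFlag`, `VirtualIsa`, `Clmul128`, `Clmul256`, `Clmul512`, `isSupported`) are hypothetical and chosen for illustration; they are not the JIT's actual identifiers, and the sketch ignores the AVX baseline and OS state checks that real detection needs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for this sketch; these are not the JIT's real
// instruction-set identifiers and not CPUID bit positions.
enum CpuFlag : std::uint32_t
{
    Flag_PCLMULQDQ  = 1u << 0,
    Flag_VPCLMULQDQ = 1u << 1,
    Flag_AVX512F    = 1u << 2,
};

// A "virtual" ISA is usable only when every flag in its mask is present.
struct VirtualIsa
{
    const char*   name;          // illustrative label only
    std::uint32_t requiredFlags; // conjunction of hardware flags
};

constexpr VirtualIsa Clmul128 = {"Clmul128", Flag_PCLMULQDQ};
constexpr VirtualIsa Clmul256 = {"Clmul256", Flag_PCLMULQDQ | Flag_VPCLMULQDQ};
constexpr VirtualIsa Clmul512 = {"Clmul512", Flag_PCLMULQDQ | Flag_VPCLMULQDQ | Flag_AVX512F};

constexpr bool isSupported(const VirtualIsa& isa, std::uint32_t cpuFlags)
{
    return (cpuFlags & isa.requiredFlags) == isa.requiredFlags;
}

// Flag sets for the CPU families named in the comment above.
constexpr std::uint32_t SkylakeX  = Flag_PCLMULQDQ | Flag_AVX512F;
constexpr std::uint32_t AlderLake = Flag_PCLMULQDQ | Flag_VPCLMULQDQ;
constexpr std::uint32_t Zen4      = Flag_PCLMULQDQ | Flag_VPCLMULQDQ | Flag_AVX512F;

// Usage: the three families land on three different virtual ISA sets.
inline void checkNamedCpus()
{
    assert(isSupported(Clmul128, SkylakeX) && !isSupported(Clmul256, SkylakeX));
    assert(isSupported(Clmul256, AlderLake) && !isSupported(Clmul512, AlderLake));
    assert(isSupported(Clmul512, Zen4));
}
```

The point of the sketch is that no single hardware flag corresponds to the 512-bit form, so a 1:1 mapping between flag and implementing class cannot express it without some derived entry.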
Thanks for the explanation! I agree that given all this, we should ideally not expose _VL as something people can specify on the command line.
It's spread out in a few places, I think the most recent was in: #103241 (comment) There's a bit of a balance overall between modeling what the CPU exposes (irrespective of the implementation) and modeling something reasonable for users to consume and handle. For With With With So, I think what we need to do is always have virtual instruction sets for any managed exposed ISA class (such as I think what you currently have in the PR roughly models that. We have
Thanks, Tanner. I think if the decision is to change up the handling of the virtual ISAs in general, that's probably better done in a separate PR, which leads me to believe maybe the best path here is to go ahead and follow the existing pattern for now, and clean them all up later. I've made the required changes to ThunkGenerator to fix the
src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt
OK, this is ready for another review pass. All feedback addressed and updated tests passing.
CC. @dotnet/jit-contrib for secondary review
ping @dotnet/jit-contrib for secondary review
AFAIK methods on the nested X64/Arm64 classes shouldn't be considered intrinsics on 32bit platforms since they are as relevant as e.g. WASM intrinsics. This should fix widespread runtime-nativeaot-outerloop failure on x86. I think this regressed in dotnet#109137.
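The rule described in that follow-up can be sketched roughly as below. This is an illustration only, not the code from the referenced fix: the function name `treatAsIntrinsic` and its parameters are hypothetical, and the real check lives in the compiler's own type-system code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decides whether a hardware-intrinsic method should be
// treated as an intrinsic, given the name of the nested class that declares it
// (empty string for the outer ISA class) and the target pointer size in bytes.
inline bool treatAsIntrinsic(const std::string& nestedClassName, unsigned pointerSizeInBytes)
{
    // Methods on the nested X64 / Arm64 classes only make sense on 64-bit targets.
    const bool is64BitOnlyClass = (nestedClassName == "X64") || (nestedClassName == "Arm64");
    return !is64BitOnlyClass || (pointerSizeInBytes == 8);
}

// Usage: on a 32-bit target the nested X64 methods fall back to ordinary calls.
inline void checkNestedClassRule()
{
    assert(!treatAsIntrinsic("X64", 4));
    assert(treatAsIntrinsic("X64", 8));
    assert(treatAsIntrinsic("", 4));
}
```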
* add vpclmulqdq intrinsics
* add missing break
* add alternate instruction def for evex encoding
* rename instruction
* whitespace
* re-run thunk generator
* fix AOT instruction sets
* address feedback
* apply formatting patch
* address feedback round 2
* add missing brace
* fix smoketest expected results
* fix suffix order
* handle implied V512 support in AOT
* remove more unnecessary X64 ISA variants

---------

Co-authored-by: Tanner Gooding <tagoo@outlook.com>
Fixes #95772
This is one of several similar new ISAs, where an existing ISA (PCLMULQDQ) was extended to 256-bit with one cpuid flag (VPCLMULQDQ) and then to 512-bit when combined with AVX-512 (VPCLMULQDQ+AVX512F) support.
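As a rough illustration of that layering, the sketch below derives the widest available carry-less multiply width from raw CPUID register values. The bit positions are the documented ones (PCLMULQDQ is CPUID.01H:ECX bit 1, VPCLMULQDQ is CPUID.(EAX=07H,ECX=0):ECX bit 10, AVX512F is CPUID.(EAX=07H,ECX=0):EBX bit 16). The `CpuidRegs` struct and the `testBit` and `maxClmulWidth` helpers are hypothetical names for this example, and the sketch deliberately omits the AVX and OS-enabled-state (XGETBV) checks that a real detector would also perform.

```cpp
#include <cassert>
#include <cstdint>

// Raw CPUID outputs; this container type is hypothetical, for the sketch only.
struct CpuidRegs
{
    std::uint32_t leaf1_ecx; // CPUID.01H:ECX
    std::uint32_t leaf7_ebx; // CPUID.(EAX=07H,ECX=0):EBX
    std::uint32_t leaf7_ecx; // CPUID.(EAX=07H,ECX=0):ECX
};

constexpr bool testBit(std::uint32_t value, unsigned index)
{
    return ((value >> index) & 1u) != 0;
}

// Returns the widest vector width (in bits) usable for carry-less multiply,
// or 0 when PCLMULQDQ itself is absent.
constexpr unsigned maxClmulWidth(const CpuidRegs& r)
{
    return !testBit(r.leaf1_ecx, 1)  ? 0u    // no PCLMULQDQ at all
         : !testBit(r.leaf7_ecx, 10) ? 128u  // PCLMULQDQ only
         : !testBit(r.leaf7_ebx, 16) ? 256u  // + VPCLMULQDQ
                                     : 512u; // + VPCLMULQDQ + AVX512F
}

// Usage: PCLMULQDQ plus AVX512F without VPCLMULQDQ still tops out at 128 bits.
inline void checkWidthExample()
{
    const CpuidRegs pclmulAndAvx512 = {1u << 1, 1u << 16, 0u};
    assert(maxClmulWidth(pclmulAndAvx512) == 128u);
}
```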