-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding basic support for recognizing and handling SIMD intrinsics as HW intrinsics #35421
Adding basic support for recognizing and handling SIMD intrinsics as HW intrinsics #35421
Conversation
CC. @CarolEidt, @echesakovMSFT Will post a jit diff shortly. |
// Returns the codegen type for a given SIMD size. | ||
var_types getSIMDTypeForSize(unsigned size) | ||
static var_types getSIMDTypeForSize(unsigned size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also change GenTreeHWIntrinsic
to track the SimdType rather than the SimdSize, but it is a more involved change.
|
||
if ((ni > NI_SIMD_AS_HWINTRINSIC_START) && (ni < NI_SIMD_AS_HWINTRINSIC_END)) | ||
{ | ||
return impSimdAsHWIntrinsic(ni, clsHnd, method, sig, mustExpand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intrinsic isn't currently handled or ends up returning nullptr
, we will currently still hit the existing impSIMDIntrinsic
path and produce a GT_SIMD node later.
src/coreclr/src/jit/lowerxarch.cpp
Outdated
|
||
GenTree* op1 = node->gtGetOp1(); | ||
GenTree* op2 = node->gtGetOp2(); | ||
GenTree* op3 = nullptr; | ||
|
||
if (!HWIntrinsicInfo::SupportsContainment(intrinsicId)) | ||
if (!HWIntrinsicInfo::SupportsContainment(intrinsicId) || (simdSize == 8) || (simdSize == 12)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to support containment on simdSize == 12
if it is a local or one of the other cases we allocate 16-bytes of storage for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that morph will retype TYP_SIMD12
locals as TYP_SIMD16
where possible, so I think it's probably reasonable to assume it's not safe here (or fix the cases where we don't widen it if we should).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting, it is retyped but the simdSize
isn't changed, and even if node
is retyped, it doesn't mean op1
or op2
are retyped.
So it will require a few changes to get correct, but shouldn't be too difficult overall.
940e753
to
759df57
Compare
x64 Windows AVX2 Diff:
ARM64 AdvSIMD diff:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of initial comments. I've only skimmed through the simdashwintrinsic*
files. I'd like to see the impact on compile time and table sizes at some point.
src/coreclr/src/jit/lowerxarch.cpp
Outdated
|
||
GenTree* op1 = node->gtGetOp1(); | ||
GenTree* op2 = node->gtGetOp2(); | ||
GenTree* op3 = nullptr; | ||
|
||
if (!HWIntrinsicInfo::SupportsContainment(intrinsicId)) | ||
if (!HWIntrinsicInfo::SupportsContainment(intrinsicId) || (simdSize == 8) || (simdSize == 12)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that morph will retype TYP_SIMD12
locals as TYP_SIMD16
where possible, so I think it's probably reasonable to assume it's not safe here (or fix the cases where we don't widen it if we should).
759df57
to
c6c4005
Compare
What is the best way to collect this information? |
@tannergooding - For the tables, you can just look at file sizes. For JIT time, the best way is to use SuperPmi to compile a bunch of methods. SuperPmi is described here: https://github.com/dotnet/runtime/blob/master/src/coreclr/scripts/superpmi.md and I usually measure using pin. If it's a hassle I can probably measure for you. |
4072878
to
6d47adf
Compare
I should be able to collect the SuperPMI diffs, although I don't see anything related to As for file sizes:
We will naturally be able to gain some or all of this back as more get implemented and we can start removing the GT_SIMD path. I've listed the |
Looks like its because it expects something to be 226 bytes but it is actually 228 bytes. I'm guessing possibly the JIT/EE version changed which I believe means I'll need to do a new collection
There were a couple of |
Need to fix |
2b7c331
to
2169790
Compare
…h aren't intrinsic
I believe I've resolved the couple minor things I found and am collecting new diffs for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you've addressed all my questions and concerns. And I've looked over the recent changes.
3ef1601
to
7291546
Compare
Thanks @CarolEidt, I'm just working on ensuring the diffs are correct again now as there were a couple surprises with the |
7291546
to
4bdc51c
Compare
7127164
to
73f7315
Compare
…Expr in SimdAsHWIntrinsic
73f7315
to
b6494ee
Compare
Diff is back to expected and shows a few additional improvements.
|
Will wait for @echesakovMSFT to review before merging |
I am taking a look now, sorry for the wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good - I left couple question/suggestions.
Do we need to update simdashwintrinsiclistarm64.h as further progress on Arm64 intrinsic are made?
@@ -207,7 +207,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node) | |||
} | |||
else | |||
{ | |||
emitSize = EA_SIZE(node->gtSIMDSize); | |||
emitSize = emitActualTypeSize(Compiler::getSIMDTypeForSize(node->gtSIMDSize)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? To support Vector3 on Arm64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its required to support Vector3
as it is size = 12
but actualSize = 16
} | ||
|
||
assert(id != NI_AVX_CompareGreaterThan); | ||
return static_cast<int>(FloatComparisonMode::OrderedLessThanSignaling); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be confusing to someone who doesn't know that we expect later in the JIT to swap the intrinsic arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it would be clearer to leave these as special import intrinsics and move all the plumbing related to opportunisticallyDependsOnAVX to one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a comment, but the entire point of moving it to lowering is so the rest of the JIT doesn't need to care that AVX
supports proper GreaterThan
while Pre-AVX
emulates it.
As we continue adding more JIT optimizations around HWIntrinsics, the distinction doesn't matter to anything except for codegen and so handling the fixup in lowering makes this trivial for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will submit a follow up PR that adds the comment.
@@ -4143,7 +4143,7 @@ GenTree* Compiler::impIntrinsic(GenTree* newobjThis, | |||
case NI_System_MathF_FusedMultiplyAdd: | |||
{ | |||
#ifdef TARGET_XARCH | |||
if (compExactlyDependsOn(InstructionSet_FMA)) | |||
if (compExactlyDependsOn(InstructionSet_FMA) && supportSIMDTypes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this change, but I wonder if this optimization can be implemented without requiring to support SIMD types
This makes progress towards #956 and is based on the prototype discussed and implemented here: #9766 (comment)
Rather than reimplementing the SIMD intrinsics in managed code or duplicating a lot of the HWIntrinsic support for containment and the VEX encoding on the SIMD intrinsics, this merely recognizes the SIMD Intrinsics in importation via an alternative path and replaces them with equivalent HWIntrinsic nodes.
This allows the SIMD intrinsics to freely get support for features that have already been added to the HWIntrinsics feature such as being VEX aware, supporting containment, and other minor optimizations that have been made.
This does not cover all of the SIMD intrinsics yet, but does lay a foundational framework for the remaining intrinsics to be ported as well. It does cover x86, x64, and ARM64.