-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arm64] Implement ASIMD widening, narrowing, saturating intrinsics #35612
[Arm64] Implement ASIMD widening, narrowing, saturating intrinsics #35612
Conversation
Tagging subscribers to this area: @tannergooding |
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
@CarolEidt @tannergooding PTAL |
@TamarChristinaArm PTAL |
f5004a2
to
97e7d77
Compare
Rebased on top of 1f436e0 |
if ((intrinsic == NI_AdvSimd_AddWideningUpper) || (intrinsic == NI_AdvSimd_SubtractWideningUpper)) | ||
{ | ||
assert(varTypeIsSIMD(op1->TypeGet())); | ||
retNode->AsHWIntrinsic()->SetOtherBaseType(getBaseTypeOfSIMDType(argClass)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do widening and narrowing need an "other" base type? The return type is always twice the size and of the same signedness of the inputs for these operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do widening and narrowing need an "other" base type?
Narrowing intrinsics do not need - only AddWideningUpper and SubractWideningUpper do.
They don't need the operand 1 type itself (which is always going to be TYP_SIMD16) but the operand 1 element type so I can distinguish between each pair of these three
Vector128<int> AddWideningUpper(Vector128<short> left, Vector128<short> right);
Vector128<short> AddWideningUpper(Vector128<short> left, Vector128<sbyte> right);
Vector128<int> AddWideningUpper(Vector128<int> left, Vector128<short> right);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just use the second argument for the base type? That is what HW_Flag_BaseTypeFromSecondArg
is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nevermind, I missed that int = short + short
and int = int + short
need to be differentiated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - I renamed the gtIndexBaseType
to gtOtherBaseType
when I restructured the IR, because I figured there would be cases aside from gather where we'd need an additional base type - I didn't think it would be used that quickly :-)
assert(varTypeIsIntegral(intrin.baseType)); | ||
if (intrin.op1->TypeGet() == TYP_SIMD8) | ||
{ | ||
ins = varTypeIsUnsigned(intrin.baseType) ? INS_uaddl : INS_saddl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to get rid of this logic once we change the tables to take simdSize
into account during lookup, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can if we make simdSize and baseType of an intrinsic "independent" on each other - i.e. if we in addition to HW_Flag_BaseTypeFromFirstArg
, HW_Flag_BaseTypeFromSecondArg
we have HW_Flag_SimdSizeFromFirstArg
, HW_Flag_SimdSizeFromSecondArg
which would allow, for example, to set simdSize based on op1 and baseType from op2.
Then, the corresponding rows in hwintrinsiclistarm64.h could be updated as
HARDWARE_INTRINSIC(AdvSimd, AddWideningLower, 8, 2, {INS_saddl, INS_uaddl, … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)
HARDWARE_INTRINSIC(AdvSimd, AddWideningLower, 16, 2, {INS_saddw, INS_uaddw, … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)
HARDWARE_INTRINSIC(AdvSimd, AddWideningUpper, 8, 2, {INS_saddl2, INS_uaddl2, … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)
HARDWARE_INTRINSIC(AdvSimd, AddWideningUpper, 16, 2, {INS_saddw2, INS_uaddw2, … HW_Flag_SimdSizeFromFirstArg|HW_Flag_BaseTypeFromSecondArg)
In other words, if we allow x and y coordinates (where x is baseType and y is a tuple of (intrinsicId, simdSize)) of hwintrinsiclistarm64.h table to be orthogonal then the answer to your question is yes.
…tformNotSupported.cs
…ormNotSupported.cs
…ormNotSupported.cs
…hwintrinsiclistarm64.h
…intrinsiclistarm64.h
…wintrinsiclistarm64.h
…er in hwintrinsiclistarm64.h
…ingUpperAndAdd in hwintrinsiclistarm64.h
…in hwintrinsiclistarm64.h
…in hwintrinsiclistarm64.h
…intrinsiclistarm64.h
…rowUpper in hwintrinsiclistarm64.h
…pper in hwintrinsiclistarm64.h
…dedHighNarrowUpper in hwintrinsiclistarm64.h
… in hwintrinsiccodegenarm64.cpp
… value in hwintrinsiccodegenarm64.cpp
…nd SubtractWideningUpper in hwintrinsiccodegenarm64.cpp
…per and SubtractWideningUpper in hwintrinsic.cpp
97e7d77
to
417c253
Compare
Fixed merge conflicts and rebased on top of aa81328 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some non-blocking comments & questions.
I only lightly reviewed the test helper functions and related changes.
if ((intrinsic == NI_AdvSimd_AddWideningUpper) || (intrinsic == NI_AdvSimd_SubtractWideningUpper)) | ||
{ | ||
assert(varTypeIsSIMD(op1->TypeGet())); | ||
retNode->AsHWIntrinsic()->SetOtherBaseType(getBaseTypeOfSIMDType(argClass)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - I renamed the gtIndexBaseType
to gtOtherBaseType
when I restructured the IR, because I figured there would be cases aside from gather where we'd need an additional base type - I didn't think it would be used that quickly :-)
@@ -205,7 +205,11 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node) | |||
emitAttr emitSize; | |||
insOpts opt = INS_OPTS_NONE; | |||
|
|||
if ((intrin.category == HW_Category_SIMDScalar) || (intrin.category == HW_Category_Scalar)) | |||
if (intrin.category == HW_Category_SIMDScalar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here highlight the fact that there are many subtleties with regard to emit sizes - the "actual type" of the baseType, the "raw" (emitTypeSize
) of the baseType, and the emitSize
of the node itself. I think it would be worth some additional comments both here and in the cases where we use emitTypeSize(node)
for moves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I will follow up and add the comments
case NI_AdvSimd_AddWideningUpper: | ||
case NI_AdvSimd_SubtractWideningLower: | ||
case NI_AdvSimd_SubtractWideningUpper: | ||
GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is identical to case NI_Crc32_ComputeCrc32:
et al. Is there a reason you didn't combine it? Is the idea to keep them in order, and rely on the C++ compiler to de-duplicate the code? (I see there's some duplication already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I just missed this. Thanks for spotting this.
I will follow up and de-duplicate this in a separate PR and I also need to address one of your another suggestions here
Both runtime (Mono Product Build Android x86 debug) and runtime (Mono Product Build OSX x64 debug) has succeeded on the second attempt - https://dev.azure.com/dnceng/public/_build/results?buildId=625278 - merging |
Implement API as approved in #32512 (with corrections regarding their ISA class - see my comment here)
Note that I am using the following temporary names as per comment:
I will update the methods names after the next round of API review.
The change is quite straightforward - the special handling is required only for AddWideningLower, AddWideningUpper, SubractWideningLower and SubtractWideningUpper since each of those are mapped to four different instructions (
saddl{2}
,saddw{2}
,uaddl{2}
anduaddw{2}
;ssubl{2}
,ssubw{2}
,usubl{2}
andusubw{2}
) and do not fit into our table-driven model.The change depends on removing HW_Flag_UnfixedSIMDSize in #35594 - I didn't want to keep adding the flag for the added intrinsics for no particular reason.Fixes #32512