-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Enabled embedded broadcast for binary ops #87946
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis PR provides the embedded broadcast support for binary ops, to be more specific, ops using the genHWIntrinsic_R_R_RM path. Including the following ops:
|
Hi @tannergooding , the PR should be ready for review, would you please take a look at it?
|
windows x64Diffs are based on 1,627,004 contexts (467,427 MinOpts, 1,159,577 FullOpts). MISSED contexts: 2,227 (0.14%) Overall (+2,406 bytes)
MinOpts (+383 bytes)
FullOpts (+2,023 bytes)
|
Fails should be irrelevant. We had regression of about 2,400 bytes in the code size introduced by the changes. This should be expected as the regression is the conversion from VEX encoding to EVEX encoding when enabling embedded broadcast. Do we accept the results? |
The results look good/within reason to me. There are two important bits to call out... First this change, in the typical case, trades a slight increase in code size (typically +2-bytes) for a decrease in data size (going from Second, SPMI diffs are not currently tracking/including the data size allocations as part of "bytes of code" metric. |
src/coreclr/jit/lowerxarch.cpp
Outdated
@@ -8138,11 +8138,18 @@ bool Lowering::IsContainableHWIntrinsicOp(GenTreeHWIntrinsic* parentNode, GenTre | |||
case NI_AVX2_BroadcastScalarToVector256: | |||
case NI_AVX512F_BroadcastScalarToVector512: | |||
{ | |||
var_types baseType = hwintrinsic->GetSimdBaseType(); | |||
if (baseType == TYP_BYTE || baseType == TYP_UBYTE || baseType == TYP_SHORT || baseType == TYP_USHORT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better handled as
if (baseType == TYP_BYTE || baseType == TYP_UBYTE || baseType == TYP_SHORT || baseType == TYP_USHORT) | |
if (varTypeIsSmall(baseType)) |
CC. @dotnet/jit-contrib for secondary review. |
superpmi-replay and superpmi-diffs pipelines are failing with messages like:
I wonder if the JIT is crashing (not asserting) with this PR? |
First time seeing this fail in this PR, the recent changes might not cause the crashing, I will rebase the PR and re-run the CI again to confirm. |
src/coreclr/jit/hwintrinsic.h
Outdated
@@ -598,6 +598,23 @@ struct HWIntrinsicInfo | |||
#endif | |||
} | |||
|
|||
#if defined(TARGET_XARCH) | |||
static bool IsBitwiseOperation(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an odd location for this function. All the other functions here take a NamedIntrinsic
. Should this instead be in emitxarch.h/cpp like IsMovInstruction
or HasKMaskRegisterDest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the function into emitxarch.h/cpp as a static function.
src/coreclr/jit/instrsxarch.h
Outdated
@@ -399,7 +399,6 @@ INST3(aesenclast, "aesenclast", IUM_WR, BAD_CODE, BAD_CODE, | |||
INST3(aesimc, "aesimc", IUM_WR, BAD_CODE, BAD_CODE, SSE38(0xDB), INS_TT_NONE, REX_WIG | Encoding_VEX) // Perform the AES InvMixColumn Transformation | |||
INST3(aeskeygenassist, "aeskeygenassist", IUM_WR, BAD_CODE, BAD_CODE, SSE3A(0xDF), INS_TT_NONE, REX_WIG | Encoding_VEX) // AES Round Key Generation Assist | |||
INST3(pclmulqdq, "pclmulqdq" , IUM_WR, BAD_CODE, BAD_CODE, SSE3A(0x44), INS_TT_NONE, Input_64Bit | REX_WIG | Encoding_VEX | INS_Flags_IsDstDstSrcAVXInstruction) // Perform a carry-less multiplication of two quadwords | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need to delete this empty line?
There are similar failures in multiple PRs from what I've seen recently. Noting that SPMI diffs is faulting in Python when processing the diffs summary:
Replay is similarly failing with things like:
Did we recently get a JIT/EE version change and potentially not have up to date baselines? |
superpmi.py should gracefully handle that. But I think that's a symptom that superpmi.exe crashed.
If the GUID changed, we wouldn't have this problem because we would only pick up matched MCH files. If someone changed the interface without changing the GUID, then anything is possible. |
I kicked off runs on https://dev.azure.com/dnceng-public/public/_build/results?buildId=331428&view=results |
@Ruihan-Yin The superpmi pipelines are now running clean. I would suggest to merge up to HEAD and re-push your PR to trigger re-run of these pipelines. |
Thanks for the notification, will resolve the reviews and rebase the branch soon. |
and, andn, or, xor, min, max, div, mul, mull, sub, variable shiftleftlogical/rightarithmetic/rightlogical
when embedded broadcast is actually enabled
There are cases when broadcast node are falsely contained by a embedded broadcast compatible node, while the data type is actually not supported Adding extra logics to avoid this situation.
instructions with either long or ulong as basetype should be reset to qword instructions.
make the typecheck based on broadcast node it self.
use `varTypeIsSmall` type check to cover all the unsupported data type in embedded broadcast.
1. put the IsBitwiseInstruction to a proper place. 2. nit: restored unnecessary line delete.
Hi @BruceForstall, some tests are cancelled for some reasons, I suppose it is not relevant to the changes in this PR, shall I re-run the test again? And can you please review the new changes, if any. Thanks! |
@@ -1230,6 +1230,30 @@ void CodeGen::inst_RV_RV_TT( | |||
if (IsEmbBroadcast) | |||
{ | |||
instOptions = INS_OPTS_EVEX_b; | |||
if (emitter::IsBitwiseInstruction(ins) && varTypeIsLong(op2->AsHWIntrinsic()->GetSimdBaseType())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we need IsBitwiseInstruction
at all since we have a switch over ins anyway? (we can just remove the unreached
in debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly not. I'll let Ruihan-Yin fix this in a follow up PR if they want to, that way we can land the important bit before the snap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, we will fix this point in the following PR on embedded broadcast.
This PR provides the embedded broadcast support for binary ops, to be more specific, ops using the genHWIntrinsic_R_R_RM path.
Including the following ops:
and, andn, or, xor,
min, max,
div, mul, mull, sub,
variable shiftleftlogical/rightarithmetic/rightlogical