-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64-SVE: refactor lsra buildHWIntrinsic #107459
base: main
Are you sure you want to change the base?
Conversation
@kunalspathak - Still WIP. For now ignore anything outside of |
This PR is ready now. Requires #107084, #107180 and a workaround for #107537 in order for all the hwintrinsic tests to pass. Apologies, this is a large change to review, and the github diff is confused about functions I haven't touched. Probably best starting a review from the new version of I recommend this is not merged until after we've gone past the Net9 RC2 deadline. @dotnet/arm64-contrib I'll do a spmidiff next. |
This looks like it's all to tldr: there is a bug in HEAD where Long version: There are multiple versions of
In HEAD, for the multiple register versions,
Note that For the single register variant,
This is wrong - it should be using In my PR, |
In addition, That happens in:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks much cleaner. We should also run jitstress
and other outerloop legs before merging. I can do it once we are done with addressing the feedback.
assert(candidates == RBM_NONE); | ||
|
||
// Some operands have consective op which is also a delay free op | ||
srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also seems to call buildInternalRegisterUses()
for consecutive registers. Are we missing it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for NI_AdvSimd_VectorTableLookupExtension()
, we call like this. We should double check that logic in new code.
- BuildConsecutiveRegistersForUse
- buildInternalRegisterUses
- BuildDef
- buildInternalRegisterUses
- BuildDef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also seems to call
buildInternalRegisterUses()
for consecutive registers. Are we missing it here?
All intrinsics will call buildInternalRegisterUses()
after the for
loop, before building the destination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for
NI_AdvSimd_VectorTableLookupExtension()
, we call like this. We should double check that logic in new code.
- BuildConsecutiveRegistersForUse
- buildInternalRegisterUses
- BuildDef
- buildInternalRegisterUses
- BuildDef
In old code:
- op1 - BuildUse (because tgtPrefUse is set, which is because isRMW)
- op2 - BuildConsecutiveRegistersForUse
- op3 - BuildDelayFreeUses
- buildInternalRegisterUses
- BuildDef
In new code:
- delay free = op1 (because isRMW)
- addr = nullptr
- consecutive = op2
- dest consecutive = false
- embedded = nullptr
- BuildHWIntrinsicImmediate (which is a nop)
- op1 - BuildUse (because delayFreeOp == op1)
- op2 - BuildConsecutiveRegistersForUse (because consecutive == op2)
- op3 - BuildDelayFreeUses (because delay free != nullptr)
- buildInternalRegisterUses
- BuildDef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the thing I always miss....
Also for
NI_AdvSimd_VectorTableLookupExtension()
, we call like this. We should double check that logic in new code.
- BuildConsecutiveRegistersForUse
- buildInternalRegisterUses
- BuildDef
there is a return srcCount;
here (line 1934)
Which means it never does:
- buildInternalRegisterUses
- BuildDef
Got some asmdiffs for the SVE tests. Spotted two differences, and one of them is due to issues in HEAD. I'll raise PRs to fix these (plus one for LoadAndInsertScalar), and then rebase this once merged. I'd like there to be no asmdiff differences in this PR
|
Change-Id: Id60f884b7281a9fae85a948a361511656c91357e
Rebased on top of the other fixes. As mentioned in #107786, fixed it so that |
No asm diffs now:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments.
|
||
if (resultOpNum != 0) | ||
{ | ||
embeddedDelayFreeOp = embeddedOpNode->Op(resultOpNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hhm, can you confirm if we just overwrite the embeddedDelayFreeOp
that was passed as the parameter to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hhm, can you confirm if we just overwrite the
embeddedDelayFreeOp
that was passed as the parameter to this function?
Yes. the value being passed into BuildEmbeddedOperandUses()
is the address of embeddedDelayFreeOp
. That value can be changed within BuildEmbeddedOperandUses()
and it won't effect any variables outside BuildEmbeddedOperandUses()
that were also same to the same value.
I think that's what you were asking for.
All fixed up as suggested. |
Looks like the reason for this is that in HEAD, assert(isRMW);
assert(intrin.op1->OperIs(GT_FIELD_LIST));
GenTreeFieldList* op1 = intrin.op1->AsFieldList();
assert(compiler->info.compNeedsConsecutiveRegisters);
for (GenTreeFieldList::Use& use : op1->Uses())
{
BuildDelayFreeUses(use.GetNode(), intrinsicTree);
srcCount++;
} My patch generates srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp);
AIUI, |
With this patch there are no diffs again. It would be shame to put something so specific in this patch.
|
Fixes #104842
The logic for hwintrisics has become convoluted. Refactor it, for both SVE and AdvSimd.
Add functions to get the operand (if any) for each requirement - delay slot, consecutive registers, address, etc.
Then use a simple for loop to iterate through each operand and build depending on which requirements match for that operand.
Tested by using stress_test.py on the entire HardwareIntrinsics_Arm set.