Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LSRA handling of arm32 double registers and spilling #94947

Merged

Conversation

BruceForstall
Copy link
Member

If LSRA uses regsBusyUntilKill to eliminate registers from consideration for spilling, for spilling arm32 double type registers, it needs to remove the even registers from the candidates. For example, if regsBusyUntilKill contains f3, candidates also needs to remove f2, since double register candidates only contains the even registers to represent to pair of registers for an arm32 double.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 18, 2023
@ghost ghost assigned BruceForstall Nov 18, 2023
@ghost
Copy link

ghost commented Nov 18, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

If LSRA uses regsBusyUntilKill to eliminate registers from consideration for spilling, for spilling arm32 double type registers, it needs to remove the even registers from the candidates. For example, if regsBusyUntilKill contains f3, candidates also needs to remove f2, since double register candidates only contains the even registers to represent to pair of registers for an arm32 double.

Author: BruceForstall
Assignees: BruceForstall
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr superpmi-diffs

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member Author

All the test failures are known or infra

@BruceForstall
Copy link
Member Author

fyi, I found this testing JitOptRepeat.

@BruceForstall
Copy link
Member Author

@kunalspathak PTAL
cc @dotnet/jit-contrib

@@ -12457,6 +12457,14 @@ regMaskTP LinearScan::RegisterSelection::select(Interval* currentInterval,
regMaskTP busyRegs = linearScan->regsBusyUntilKill | linearScan->regsInUseThisLocation;
candidates &= ~busyRegs;

#ifdef TARGET_ARM
// For TYP_DOUBLE on ARM, we can only use register for which the odd half is also available.
if (currentInterval->registerType == TYP_DOUBLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If busyRegs does not contain both registers of the pairs, a more appropriate fix would be to find place where we miss setting 1 of the register and fix it. When we set regMask in regsBusyUntilKill or regsInUseThisLocation, we mostly use getRegMask which takes into account the "double register" as well.

runtime/src/coreclr/jit/lsra.h

Lines 1740 to 1746 in 740ecd0

#ifdef TARGET_ARM
if (regType == TYP_DOUBLE)
{
assert(genIsValidDoubleReg(reg));
regMask |= (regMask << 1);
}
#endif // TARGET_ARM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model used for arm32 doubles is possibly inconsistent: it seems like only the low register of the pair is set in some (most) cases, whereas in getRegMask both are set (IMO, that function has poorly named given genRegMask exists). Also, it's not clear when LSRA should use genRegMask versus getRegMask.

However, here, busyRegs is correct: only one of two of a double pair are busy. But because we're asking for a double pair, we need to avoid it if the pair is not fully available.

The example is that when we're here allocating a double, the candidates are all the even registers (the odd registers are not candidates, although when we allocate an even register, we implicitly also allocate the odd register). So, say f2 is a candidate, and f3 is a busy register. we need to remove f2 from the candidate set due to f3 being busy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genRegMask versus getRegMask

I think we should use getRegMask(reg, var_type) version and delete genRegMask(reg, var_type). They are doing same thing.

So, say f2 is a candidate, and f3 is a busy register. we need to remove f2 from the candidate set due to f3 being busy.

Correct, thats what I expect, but it should happen at place that added f3 as busy register. Do you know which of regsBusyUntilKill or regsInUseThisLocation does it gets marked as busy? The change looks correct, but I want to make sure we do not accidently mark a register (probably in jitstressregs) not part of current even/odd pair as busy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case was regsBusyUntilKill.

but it should happen at place that added f3 as busy register

If f3 is marked as busy because it is a float (single-precision) type, then you wouldn't want to mark f2 busy at that point. It's only when we're looking for candidates for a double that we need to mask off both f2 and f3 if f3 is already marked as busy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind sharing the repro or the JitDump file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a link to JitDump offline.

You can also repro locally using:

py src\coreclr\scripts\superpmi.py replay -f coreclr_tests -arch x86 -target_arch arm -target_os linux -jitoption JitOptRepeat#* -jitoption JitOptRepeatCount#2 -jitoption JitStressRegs#3

And looking for:

Assertion failed '!isRegBusy(physRegRecord->regNum, current->registerType)'

(for me, MC#459399)

(I’ve fixed the other asserts in #94250)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to repro and see how f3 ends up in regsBusyUntilKill. I am ok with this fix, but can we add assert((busyRegs & 0x555555550000) == RBM_NONE) which will confirm at least that we never have f2 in this mask and hence we don't accidently remove f1 from the mask?

@BruceForstall
Copy link
Member Author

@kunalspathak I updated the comment

@kunalspathak
Copy link
Member

@kunalspathak I updated the comment

can you also explain the implication of >> 1 if we have f2 in the mask as we spoke offline?

freeRegs = freeRegs & ((freeRegs << 1) & RBM_ALLDOUBLE);
// Exclude any doubles for which the odd half isn't in freeRegs,
// and restrict down to just the even part of the even/odd pair.
freeRegs &= (freeRegs & RBM_ALLDOUBLE_HIGH) >> 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better understanding this concept, should we create a function or macro something like:

#define RBM_ALLDOUBLE_EVEN(mask) (mask & RBM_ALLDOUBLE_HIGH) >> 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... that seems very confusing to me. RBM_ALLDOUBLE already is the even or "low" registers of the pairs.

This change fixes a bug I noticed just by inspection, and leads to some diffs. I'm still investigating, but I also found some pre-existing corruption in double register output in double register disasm, so I'm continuing down the rabbit hole...

If LSRA uses `regsBusyUntilKill` to eliminate registers from consideration
for spilling, for spilling arm32 double type registers, it needs to
remove the even registers from the candidates. For example, if
`regsBusyUntilKill` contains `f3`, candidates also needs to remove `f2`,
since double register candidates only contains the even registers to
represent to pair of registers for an arm32 double.
register part of ARM double registers even/odd pairs.
@BruceForstall BruceForstall force-pushed the FixLsraArm32DoubleSpillCandidates branch from 76f0dff to ef25ee6 Compare November 22, 2023 22:09
@BruceForstall
Copy link
Member Author

I fixed a bug with disasm output of arm double registers d0-d15.

There are register allocation diffs due to the getTempRegForResolution fix because previously we would never consider d0 free (there may be other cases as well).

@kunalspathak PTAL

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BruceForstall BruceForstall merged commit 63e07f5 into dotnet:main Nov 26, 2023
129 checks passed
@BruceForstall BruceForstall deleted the FixLsraArm32DoubleSpillCandidates branch November 26, 2023 16:43
@github-actions github-actions bot locked and limited conversation to collaborators Dec 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants