Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LSRA handling of arm32 double registers and spilling #94947

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/coreclr/jit/codegencommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4062,7 +4062,7 @@ void CodeGen::genFnPrologCalleeRegArgs(regNumber xtraReg, bool* pXtraRegClobbere
// Don't process the double until both halves of the destination are clear.
if (genActualType(destMemType) == TYP_DOUBLE)
{
assert((destMask & RBM_DBL_REGS) != 0);
assert((destMask & RBM_ALLDOUBLE) != 0);
destMask |= genRegMask(REG_NEXT(destRegNum));
}
#endif
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/compiler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -3250,7 +3250,7 @@ __forceinline regMaskTP genMapArgNumToRegMask(unsigned argNum, var_types type)
#ifdef TARGET_ARM
if (type == TYP_DOUBLE)
{
assert((result & RBM_DBL_REGS) != 0);
assert((result & RBM_ALLDOUBLE) != 0);
result |= (result << 1);
}
#endif
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/emitarm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6934,7 +6934,7 @@ void emitter::emitDispReg(regNumber reg, emitAttr attr, bool addComma)
else
{
assert(regIndex < 100);
printf("d%c%c", (regIndex / 10), (regIndex % 10));
printf("d%c%c", (regIndex / 10) + '0', (regIndex % 10) + '0');
}
}
else
Expand Down
17 changes: 15 additions & 2 deletions src/coreclr/jit/lsra.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7946,8 +7946,9 @@ regNumber LinearScan::getTempRegForResolution(BasicBlock* fromBlock,
#ifdef TARGET_ARM
if (type == TYP_DOUBLE)
{
// Exclude any doubles for which the odd half isn't in freeRegs.
freeRegs = freeRegs & ((freeRegs << 1) & RBM_ALLDOUBLE);
// Exclude any doubles for which the odd half isn't in freeRegs,
// and restrict down to just the even part of the even/odd pair.
freeRegs &= (freeRegs & RBM_ALLDOUBLE_HIGH) >> 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better understanding this concept, should we create a function or macro something like:

#define RBM_ALLDOUBLE_EVEN(mask) (mask & RBM_ALLDOUBLE_HIGH) >> 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... that seems very confusing to me. RBM_ALLDOUBLE already is the even or "low" registers of the pairs.

This change fixes a bug I noticed just by inspection, and leads to some diffs. I'm still investigating, but I also found some pre-existing corruption in double register output in double register disasm, so I'm continuing down the rabbit hole...

}
#endif

Expand Down Expand Up @@ -12457,6 +12458,18 @@ regMaskTP LinearScan::RegisterSelection::select(Interval* currentInterval,
regMaskTP busyRegs = linearScan->regsBusyUntilKill | linearScan->regsInUseThisLocation;
candidates &= ~busyRegs;

#ifdef TARGET_ARM
// For TYP_DOUBLE on ARM, we can only use an even floating-point register for which the odd half
// is also available. Thus, for every busyReg that is an odd floating-point register, we need to
// remove from candidates the corresponding even floating-point register. For example, if busyRegs
// contains `f3`, we need to remove `f2` from the candidates for a double register interval. The
// clause below creates a mask to do this.
if (currentInterval->registerType == TYP_DOUBLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If busyRegs does not contain both registers of the pairs, a more appropriate fix would be to find place where we miss setting 1 of the register and fix it. When we set regMask in regsBusyUntilKill or regsInUseThisLocation, we mostly use getRegMask which takes into account the "double register" as well.

runtime/src/coreclr/jit/lsra.h

Lines 1740 to 1746 in 740ecd0

#ifdef TARGET_ARM
if (regType == TYP_DOUBLE)
{
assert(genIsValidDoubleReg(reg));
regMask |= (regMask << 1);
}
#endif // TARGET_ARM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model used for arm32 doubles is possibly inconsistent: it seems like only the low register of the pair is set in some (most) cases, whereas in getRegMask both are set (IMO, that function has poorly named given genRegMask exists). Also, it's not clear when LSRA should use genRegMask versus getRegMask.

However, here, busyRegs is correct: only one of two of a double pair are busy. But because we're asking for a double pair, we need to avoid it if the pair is not fully available.

The example is that when we're here allocating a double, the candidates are all the even registers (the odd registers are not candidates, although when we allocate an even register, we implicitly also allocate the odd register). So, say f2 is a candidate, and f3 is a busy register. we need to remove f2 from the candidate set due to f3 being busy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genRegMask versus getRegMask

I think we should use getRegMask(reg, var_type) version and delete genRegMask(reg, var_type). They are doing same thing.

So, say f2 is a candidate, and f3 is a busy register. we need to remove f2 from the candidate set due to f3 being busy.

Correct, thats what I expect, but it should happen at place that added f3 as busy register. Do you know which of regsBusyUntilKill or regsInUseThisLocation does it gets marked as busy? The change looks correct, but I want to make sure we do not accidently mark a register (probably in jitstressregs) not part of current even/odd pair as busy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case was regsBusyUntilKill.

but it should happen at place that added f3 as busy register

If f3 is marked as busy because it is a float (single-precision) type, then you wouldn't want to mark f2 busy at that point. It's only when we're looking for candidates for a double that we need to mask off both f2 and f3 if f3 is already marked as busy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind sharing the repro or the JitDump file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a link to JitDump offline.

You can also repro locally using:

py src\coreclr\scripts\superpmi.py replay -f coreclr_tests -arch x86 -target_arch arm -target_os linux -jitoption JitOptRepeat#* -jitoption JitOptRepeatCount#2 -jitoption JitStressRegs#3

And looking for:

Assertion failed '!isRegBusy(physRegRecord->regNum, current->registerType)'

(for me, MC#459399)

(I’ve fixed the other asserts in #94250)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to repro and see how f3 ends up in regsBusyUntilKill. I am ok with this fix, but can we add assert((busyRegs & 0x555555550000) == RBM_NONE) which will confirm at least that we never have f2 in this mask and hence we don't accidently remove f1 from the mask?

{
candidates &= ~((busyRegs & RBM_ALLDOUBLE_HIGH) >> 1);
}
#endif // TARGET_ARM

#ifdef DEBUG
inUseOrBusyRegsMask |= busyRegs;
#endif
Expand Down
2 changes: 2 additions & 0 deletions src/coreclr/jit/targetarm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,4 +24,6 @@ const regNumber fltArgRegs [] = {REG_F0, REG_F1, REG_F2, REG_F3, REG_F4, REG_F5,
const regMaskTP fltArgMasks[] = {RBM_F0, RBM_F1, RBM_F2, RBM_F3, RBM_F4, RBM_F5, RBM_F6, RBM_F7, RBM_F8, RBM_F9, RBM_F10, RBM_F11, RBM_F12, RBM_F13, RBM_F14, RBM_F15 };
// clang-format on

static_assert_no_msg(RBM_ALLDOUBLE == (RBM_ALLDOUBLE_HIGH >> 1));

#endif // TARGET_ARM
6 changes: 5 additions & 1 deletion src/coreclr/jit/targetarm.h
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,11 @@
#define RBM_ALLFLOAT (RBM_FLT_CALLEE_SAVED | RBM_FLT_CALLEE_TRASH)
#define RBM_ALLDOUBLE (RBM_F0|RBM_F2|RBM_F4|RBM_F6|RBM_F8|RBM_F10|RBM_F12|RBM_F14|RBM_F16|RBM_F18|RBM_F20|RBM_F22|RBM_F24|RBM_F26|RBM_F28|RBM_F30)

// Double registers on ARM take two registers in odd/even pairs, e.g. f0 and f1, f2 and f3, etc. Mostly, we refer
// to double registers by their low, even, part. Sometimes we need to know that the high, odd, part is unused in
// a bitmask, say, hence we have this definition.
#define RBM_ALLDOUBLE_HIGH (RBM_F1|RBM_F3|RBM_F5|RBM_F7|RBM_F9|RBM_F11|RBM_F13|RBM_F15|RBM_F17|RBM_F19|RBM_F21|RBM_F23|RBM_F25|RBM_F27|RBM_F29|RBM_F31)

#define REG_VAR_ORDER REG_R3,REG_R2,REG_R1,REG_R0,REG_R4,REG_LR,REG_R12,\
REG_R5,REG_R6,REG_R7,REG_R8,REG_R9,REG_R10

Expand Down Expand Up @@ -281,7 +286,6 @@

#define RBM_ARG_REGS (RBM_ARG_0|RBM_ARG_1|RBM_ARG_2|RBM_ARG_3)
#define RBM_FLTARG_REGS (RBM_F0|RBM_F1|RBM_F2|RBM_F3|RBM_F4|RBM_F5|RBM_F6|RBM_F7|RBM_F8|RBM_F9|RBM_F10|RBM_F11|RBM_F12|RBM_F13|RBM_F14|RBM_F15)
#define RBM_DBL_REGS RBM_ALLDOUBLE

extern const regNumber fltArgRegs [MAX_FLOAT_REG_ARG];
extern const regMaskTP fltArgMasks[MAX_FLOAT_REG_ARG];
Expand Down