Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fast tailcalls in R2R #56669

Merged
merged 28 commits into from
Oct 7, 2021
Merged
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
7137494
Support fast tailcalls in R2R
jakobbotsch Jul 31, 2021
0efbbc4
Support ARM64
jakobbotsch Jul 31, 2021
5b0127e
Run jit-format
jakobbotsch Jul 31, 2021
2b360c8
Fix non-R2R ARM build
jakobbotsch Jul 31, 2021
ae8fd46
Fix recursive-call-to-loop optimization with non-standard args
jakobbotsch Jul 31, 2021
e45f0c5
Implement new delay load helper for fast tailcalls
jakobbotsch Aug 1, 2021
7f21f31
Minor changes and fix build break
jakobbotsch Aug 1, 2021
5ce1e65
Switch to a define for tailcall register
jakobbotsch Aug 1, 2021
81268fa
Fix x86
jakobbotsch Aug 2, 2021
8e15c1c
Implement DefType.IsUnsafeValueType
jakobbotsch Aug 2, 2021
b971fce
Emit rex.jmp for tailcall jumps on x64
jakobbotsch Aug 2, 2021
483b34a
Refactor non standard args + refix recursive tailcall opt
jakobbotsch Aug 2, 2021
b533f4b
Set nonStandardArgKind for stack args also
jakobbotsch Aug 4, 2021
1c0ede7
Merge remote-tracking branch 'upstream/main' into cg2-fast-tailcalls
jakobbotsch Aug 29, 2021
424d538
Regenerate JIT interface
jakobbotsch Aug 30, 2021
24cfa6f
Merge remote-tracking branch 'upstream/main' into cg2-fast-tailcalls
jakobbotsch Sep 4, 2021
0968851
Improve recursion-to-loop decision about which args need to be reassi…
jakobbotsch Sep 4, 2021
91a665a
Merge remote-tracking branch 'upstream/main' into cg2-fast-tailcalls
jakobbotsch Sep 24, 2021
b345db7
Use INS_tail_i_jmp for func token indir
jakobbotsch Sep 24, 2021
0816d52
More efficient arm64 VSD fast tailcalls, fix some bad merging
jakobbotsch Sep 24, 2021
863ad4d
Take R2R indirection into account for tail call profitability
jakobbotsch Sep 24, 2021
0778263
Disallow tailcalls via JIT helper in R2R builds
jakobbotsch Sep 24, 2021
92cd381
Merge remote-tracking branch 'upstream/main' into cg2-fast-tailcalls
jakobbotsch Sep 24, 2021
16e32b0
Revert "Take R2R indirection into account for tail call profitability"
jakobbotsch Sep 24, 2021
d5729a7
Add SPMI support, clean up mcPackets enum
jakobbotsch Sep 25, 2021
b8ad2ba
Merge remote-tracking branch 'upstream/main' into cg2-fast-tailcalls
jakobbotsch Sep 26, 2021
9f70cc4
Fix conflicts
jakobbotsch Oct 1, 2021
ad78caf
Take necessary conditions into account in canTailCall
jakobbotsch Oct 5, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1776,6 +1776,11 @@ bool interceptor_ICJI::getTailCallHelpers(
return result;
}

void interceptor_ICJI::updateEntryPointForTailCall(CORINFO_CONST_LOOKUP* entryPoint)
{
mc->cr->AddCall("updateEntryPointForTailCall");
}

// Stuff directly on ICorJitInfo

// Returns extended flags for a particular compilation instance.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1250,6 +1250,13 @@ bool interceptor_ICJI::notifyInstructionSetUsage(
return original_ICorJitInfo->notifyInstructionSetUsage(instructionSet, supportEnabled);
}

void interceptor_ICJI::updateEntryPointForTailCall(
CORINFO_CONST_LOOKUP* entryPoint)
{
mcs->AddCall("updateEntryPointForTailCall");
original_ICorJitInfo->updateEntryPointForTailCall(entryPoint);
}

void interceptor_ICJI::allocMem(
AllocMemArgs* pArgs)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1094,6 +1094,12 @@ bool interceptor_ICJI::notifyInstructionSetUsage(
return original_ICorJitInfo->notifyInstructionSetUsage(instructionSet, supportEnabled);
}

void interceptor_ICJI::updateEntryPointForTailCall(
CORINFO_CONST_LOOKUP* entryPoint)
{
original_ICorJitInfo->updateEntryPointForTailCall(entryPoint);
}

void interceptor_ICJI::allocMem(
AllocMemArgs* pArgs)
{
Expand Down
5 changes: 5 additions & 0 deletions src/coreclr/ToolBox/superpmi/superpmi/icorjitinfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1556,6 +1556,11 @@ bool MyICJI::notifyInstructionSetUsage(CORINFO_InstructionSet instructionSet, bo
return supported;
}

void MyICJI::updateEntryPointForTailCall(CORINFO_CONST_LOOKUP* entryPoint)
{
jitInstance->mc->cr->AddCall("updateEntryPointForTailCall");
}

// Stuff directly on ICorJitInfo

// Returns extended flags for a particular compilation instance.
Expand Down
7 changes: 7 additions & 0 deletions src/coreclr/inc/corinfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -3173,6 +3173,13 @@ class ICorDynamicInfo : public ICorStaticInfo
CORINFO_InstructionSet instructionSet,
bool supportEnabled
) = 0;

// Notify EE that JIT needs an entry-point that is tail-callable.
// This is used for AOT on x64 to support delay loaded fast tailcalls.
// Normally the indirection cell is retrieved from the return address,
// but for tailcalls, the contract is that JIT leaves the indirection cell in
// a register during tailcall.
virtual void updateEntryPointForTailCall(CORINFO_CONST_LOOKUP* entryPoint) = 0;
};

/**********************************************************************************/
Expand Down
3 changes: 3 additions & 0 deletions src/coreclr/inc/icorjitinfoimpl_generated.h
Original file line number Diff line number Diff line change
Expand Up @@ -635,6 +635,9 @@ bool notifyInstructionSetUsage(
CORINFO_InstructionSet instructionSet,
bool supportEnabled) override;

void updateEntryPointForTailCall(
CORINFO_CONST_LOOKUP* entryPoint) override;

void allocMem(
AllocMemArgs* pArgs) override;

Expand Down
12 changes: 6 additions & 6 deletions src/coreclr/inc/jiteeversionguid.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,12 @@ typedef const GUID *LPCGUID;
#define GUID_DEFINED
#endif // !GUID_DEFINED

constexpr GUID JITEEVersionIdentifier = { /* 4ef06a0e-e58d-4796-abb7-0cda6460610c */
0x4ef06a0e,
0xe58d,
0x4796,
{0xab, 0xb7, 0xc, 0xda, 0x64, 0x60, 0x61, 0xc}
};
constexpr GUID JITEEVersionIdentifier = { /* 7374274c-5cb5-4c41-872e-01f438ac1548 */
0x7374274c,
0x5cb5,
0x4c41,
{ 0x87, 0x2e, 0x1, 0xf4, 0x38, 0xac, 0x15, 0x48 }
};

//////////////////////////////////////////////////////////////////////////////////////////////////////////
//
Expand Down
1 change: 1 addition & 0 deletions src/coreclr/jit/ICorJitInfo_API_names.h
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,7 @@ DEF_CLR_API(MethodCompileComplete)
DEF_CLR_API(getTailCallHelpers)
DEF_CLR_API(convertPInvokeCalliToCall)
DEF_CLR_API(notifyInstructionSetUsage)
DEF_CLR_API(updateEntryPointForTailCall)
DEF_CLR_API(allocMem)
DEF_CLR_API(reserveUnwindInfo)
DEF_CLR_API(allocUnwindInfo)
Expand Down
8 changes: 8 additions & 0 deletions src/coreclr/jit/ICorJitInfo_API_wrapper.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1527,6 +1527,14 @@ bool WrapICorJitInfo::notifyInstructionSetUsage(
return temp;
}

void WrapICorJitInfo::updateEntryPointForTailCall(
CORINFO_CONST_LOOKUP* entryPoint)
{
API_ENTER(updateEntryPointForTailCall);
wrapHnd->updateEntryPointForTailCall(entryPoint);
API_LEAVE(updateEntryPointForTailCall);
}

void WrapICorJitInfo::allocMem(
AllocMemArgs* pArgs)
{
Expand Down
34 changes: 30 additions & 4 deletions src/coreclr/jit/codegenarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2351,6 +2351,24 @@ void CodeGen::genCall(GenTreeCall* call)
// Indirect fast tail calls materialize call target either in gtControlExpr or in gtCallAddr.
genConsumeReg(target);
}
#ifdef FEATURE_READYTORUN
else if (call->IsR2ROrVirtualStubRelativeIndir())
{
assert(((call->IsR2RRelativeIndir()) && (call->gtEntryPoint.accessType == IAT_PVALUE)) ||
((call->IsVirtualStubRelativeIndir()) && (call->gtEntryPoint.accessType == IAT_VALUE)));
assert(call->gtControlExpr == nullptr);

regNumber tmpReg = call->GetSingleTempReg();
// Register where we save call address in should not be overridden by epilog.
assert((tmpReg & (RBM_INT_CALLEE_TRASH & ~RBM_LR)) == tmpReg);

regNumber callAddrReg =
call->IsVirtualStubRelativeIndir() ? compiler->virtualStubParamInfo->GetReg() : REG_R2R_INDIRECT_PARAM;
GetEmitter()->emitIns_R_R(ins_Load(TYP_I_IMPL), emitActualTypeSize(TYP_I_IMPL), tmpReg, callAddrReg);
// We will use this again when emitting the jump in genCallInstruction in the epilog
call->gtRsvdRegs |= genRegMask(tmpReg);
}
#endif

return;
}
Expand Down Expand Up @@ -2557,12 +2575,20 @@ void CodeGen::genCallInstruction(GenTreeCall* call)
((call->IsVirtualStubRelativeIndir()) && (call->gtEntryPoint.accessType == IAT_VALUE)));
#endif // FEATURE_READYTORUN
assert(call->gtControlExpr == nullptr);
assert(!call->IsTailCall());

regNumber tmpReg = call->GetSingleTempReg();
regNumber callAddrReg =
call->IsVirtualStubRelativeIndir() ? compiler->virtualStubParamInfo->GetReg() : REG_R2R_INDIRECT_PARAM;
GetEmitter()->emitIns_R_R(ins_Load(TYP_I_IMPL), emitActualTypeSize(TYP_I_IMPL), tmpReg, callAddrReg);
// For fast tailcalls we have already loaded the call target when processing the call node.
if (!call->IsFastTailCall())
{
regNumber callAddrReg =
call->IsVirtualStubRelativeIndir() ? compiler->virtualStubParamInfo->GetReg() : REG_R2R_INDIRECT_PARAM;
GetEmitter()->emitIns_R_R(ins_Load(TYP_I_IMPL), emitActualTypeSize(TYP_I_IMPL), tmpReg, callAddrReg);
}
else
{
// Register where we save call address in should not be overridden by epilog.
assert((tmpReg & (RBM_INT_CALLEE_TRASH & ~RBM_LR)) == tmpReg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the role of tmpReg in the fast tail call case. How does it end up having the right contents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens in genCall here: https://github.com/jakobbotsch/runtime/blob/9f70cc42d73467c15a458ee9774589271d721536/src/coreclr/jit/codegenarmarch.cpp#L2340-L2375

For normal calls we call genCall which then calls genCallInstruction that takes care to generate the code to load the call target and do the call.
For tailcalls we instead generate the code to load the call target in genCall, but this is the last thing that happens when we see the GenTreeCall node. The remaining work happens during epilog generation which also calls genCallInstruction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you could pass REG_NA for the tail call case?

I guess it's GetSingleTempReg that is confusing me. It seems odd to "allocate" a temp reg without altering state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to "allocate" a temp reg without altering state.

Not sure I understand. The temp reg is allocated during RA for this specific optimization we do on ARM/ARM64, where we have no target node and need an extra register to store the call target loaded from the indirection cell. We still need it for the tail call case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, not that it's not ultimately needed, but that the value of tmpReg passed here to GenCall is not used, instead we go call GetSingleTempReg again.

It's not that important, I just found it confusing to follow how the value gets from one place to another.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code here in genCallInstruction is called last. genCall also calls GetSingleTempReg, but it adds the register back to gtRsvdRegs so that this call will succeed:
https://github.com/jakobbotsch/runtime/blob/9f70cc42d73467c15a458ee9774589271d721536/src/coreclr/jit/codegenarmarch.cpp#L2369-L2370

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes more sense now.

}

// We have now generated code for gtControlExpr evaluating it into `tmpReg`.
// We just need to emit "call tmpReg" in this case.
Expand Down
46 changes: 34 additions & 12 deletions src/coreclr/jit/codegenxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5701,18 +5701,40 @@ void CodeGen::genCallInstruction(GenTreeCall* call X86_ARG(target_ssize_t stackA
{
emitter::EmitCallType type =
(call->gtEntryPoint.accessType == IAT_VALUE) ? emitter::EC_FUNC_TOKEN : emitter::EC_FUNC_TOKEN_INDIR;
// clang-format off
genEmitCall(type,
methHnd,
INDEBUG_LDISASM_COMMA(sigInfo)
(void*)call->gtEntryPoint.addr
X86_ARG(argSizeForEmitter),
retSize
MULTIREG_HAS_SECOND_GC_RET_ONLY_ARG(secondRetSize),
ilOffset,
REG_NA,
call->IsFastTailCall());
// clang-format on
if (call->IsFastTailCall() && (type == emitter::EC_FUNC_TOKEN_INDIR))
{
// For fast tailcall with func token indir we already have the indirection cell in REG_R2R_INDIRECT_PARAM,
// so get it from there.
// clang-format off
GetEmitter()->emitIns_Call(
emitter::EC_INDIR_ARD,
methHnd,
INDEBUG_LDISASM_COMMA(sigInfo)
nullptr,
0,
retSize
MULTIREG_HAS_SECOND_GC_RET_ONLY_ARG(secondRetSize),
gcInfo.gcVarPtrSetCur,
gcInfo.gcRegGCrefSetCur,
gcInfo.gcRegByrefSetCur,
ilOffset, REG_R2R_INDIRECT_PARAM, REG_NA, 0, 0, true);
// clang-format on
}
else
{
// clang-format off
genEmitCall(type,
methHnd,
INDEBUG_LDISASM_COMMA(sigInfo)
(void*)call->gtEntryPoint.addr
X86_ARG(argSizeForEmitter),
retSize
MULTIREG_HAS_SECOND_GC_RET_ONLY_ARG(secondRetSize),
ilOffset,
REG_NA,
call->IsFastTailCall());
// clang-format on
}
}
#endif
else
Expand Down
3 changes: 3 additions & 0 deletions src/coreclr/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -6231,6 +6231,7 @@ class Compiler
private:
GenTree* fgMorphField(GenTree* tree, MorphAddrContext* mac);
bool fgCanFastTailCall(GenTreeCall* call, const char** failReason);
bool fgShouldFastTailCall(GenTreeCall* call, const char** failReason);
#if FEATURE_FASTTAILCALL
bool fgCallHasMustCopyByrefParameter(GenTreeCall* callee);
#endif
Expand All @@ -6255,9 +6256,11 @@ class Compiler

GenTree* fgMorphPotentialTailCall(GenTreeCall* call);
GenTree* fgGetStubAddrArg(GenTreeCall* call);
unsigned fgGetArgTabEntryParameterLclNum(GenTreeCall* call, fgArgTabEntry* argTabEntry);
void fgMorphRecursiveFastTailCallIntoLoop(BasicBlock* block, GenTreeCall* recursiveTailCall);
Statement* fgAssignRecursiveCallArgToCallerParam(GenTree* arg,
fgArgTabEntry* argTabEntry,
unsigned lclParamNum,
BasicBlock* block,
IL_OFFSETX callILOffset,
Statement* tmpAssignmentInsertionPoint,
Expand Down
6 changes: 5 additions & 1 deletion src/coreclr/jit/gentree.h
Original file line number Diff line number Diff line change
Expand Up @@ -4631,11 +4631,15 @@ struct GenTreeCall final : public GenTree
return (gtCallMoreFlags & GTF_CALL_M_VIRTSTUB_REL_INDIRECT) != 0;
}

#ifdef FEATURE_READYTORUN
bool IsR2RRelativeIndir() const
{
#ifdef FEATURE_READYTORUN
return (gtCallMoreFlags & GTF_CALL_M_R2R_REL_INDIRECT) != 0;
#else
return false;
#endif
}
#ifdef FEATURE_READYTORUN
void setEntryPoint(const CORINFO_CONST_LOOKUP& entryPoint)
{
gtEntryPoint = entryPoint;
Expand Down
1 change: 0 additions & 1 deletion src/coreclr/jit/jitconfigvalues.h
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,6 @@ CONFIG_INTEGER(JitNoStructPromotion, W("JitNoStructPromotion"), 0) // Disables s
CONFIG_INTEGER(JitNoUnroll, W("JitNoUnroll"), 0)
CONFIG_INTEGER(JitOrder, W("JitOrder"), 0)
CONFIG_INTEGER(JitQueryCurrentStaticFieldClass, W("JitQueryCurrentStaticFieldClass"), 1)
CONFIG_INTEGER(JitReportFastTailCallDecisions, W("JitReportFastTailCallDecisions"), 0)
CONFIG_INTEGER(JitPInvokeCheckEnabled, W("JITPInvokeCheckEnabled"), 0)
CONFIG_INTEGER(JitPInvokeEnabled, W("JITPInvokeEnabled"), 1)
CONFIG_METHODSET(JitPrintInlinedMethods, W("JitPrintInlinedMethods"))
Expand Down
19 changes: 10 additions & 9 deletions src/coreclr/jit/lower.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3746,15 +3746,19 @@ GenTree* Lowering::LowerDirectCall(GenTreeCall* call)

case IAT_PVALUE:
{
bool isR2RRelativeIndir = false;
#if defined(FEATURE_READYTORUN) && defined(TARGET_ARMARCH)
bool hasIndirectionCell = false;
#if defined(TARGET_ARMARCH)
// Skip inserting the indirection node to load the address that is already
// computed in REG_R2R_INDIRECT_PARAM as a hidden parameter. Instead during the
// codegen, just load the call target from REG_R2R_INDIRECT_PARAM.
isR2RRelativeIndir = call->IsR2RRelativeIndir();
#endif // FEATURE_READYTORUN && TARGET_ARMARCH
hasIndirectionCell = call->IsR2RRelativeIndir();
#elif defined(TARGET_XARCH)
// For xarch we usually get the indirection cell from the return address,
// except for fast tailcalls where we do the same as ARM.
hasIndirectionCell = call->IsR2RRelativeIndir() && call->IsFastTailCall();
#endif

if (!isR2RRelativeIndir)
if (!hasIndirectionCell)
{
// Non-virtual direct calls to addresses accessed by
// a single indirection.
Expand Down Expand Up @@ -4813,15 +4817,12 @@ GenTree* Lowering::LowerVirtualStubCall(GenTreeCall* call)
}
else
{

bool shouldOptimizeVirtualStubCall = false;
#if defined(FEATURE_READYTORUN) && defined(TARGET_ARMARCH)
// Skip inserting the indirection node to load the address that is already
// computed in REG_R2R_INDIRECT_PARAM as a hidden parameter. Instead during the
// codegen, just load the call target from REG_R2R_INDIRECT_PARAM.
// However, for tail calls, the call target is always computed in RBM_FASTTAILCALL_TARGET
// and so do not optimize virtual stub calls for such cases.
shouldOptimizeVirtualStubCall = !call->IsTailCall();
shouldOptimizeVirtualStubCall = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo this is the change that fixes #60025, along with fixing the JIT to load the call target for tailcalls while processing the GenTreeCall node instead of during epilog generation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

#endif // FEATURE_READYTORUN && TARGET_ARMARCH

if (!shouldOptimizeVirtualStubCall)
Expand Down
14 changes: 12 additions & 2 deletions src/coreclr/jit/lsraarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -178,12 +178,22 @@ int LinearScan::BuildCall(GenTreeCall* call)
{
// Fast tail call - make sure that call target is always computed in volatile registers
// that will not be overridden by epilog sequence.
ctrlExprCandidates = RBM_INT_CALLEE_TRASH;
ctrlExprCandidates = allRegs(TYP_INT) & RBM_INT_CALLEE_TRASH;
assert(ctrlExprCandidates != RBM_NONE);
}
}
else if (call->IsR2ROrVirtualStubRelativeIndir())
{
buildInternalIntRegisterDefForNode(call);
// For R2R and VSD we have stub address in REG_R2R_INDIRECT_PARAM
// and will load call address into the temp register from this register.
regMaskTP candidates = RBM_NONE;
if (call->IsFastTailCall())
{
candidates = allRegs(TYP_INT) & RBM_INT_CALLEE_TRASH;
assert(candidates != RBM_NONE);
}

buildInternalIntRegisterDefForNode(call, candidates);
}
#ifdef TARGET_ARM
else
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/lsraxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1238,7 +1238,7 @@ int LinearScan::BuildCall(GenTreeCall* call)
#if FEATURE_VARARG
// If it is a fast tail call, it is already preferenced to use RAX.
// Therefore, no need set src candidates on call tgt again.
if (call->IsVarargs() && callHasFloatRegArgs && !call->IsFastTailCall())
if (call->IsVarargs() && callHasFloatRegArgs && (ctrlExprCandidates == RBM_NONE))
{
// Don't assign the call target to any of the argument registers because
// we will use them to also pass floating point arguments as required
Expand Down
Loading