Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace FEATURE_EH_FUNCLETS in JIT with runtime switch #99191

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Mar 2, 2024

Everything but win-x86 already had FEATURE_EH_FUNCLETS defined unconditionally. For win-x86 there's a new FEATURE_EH_X86_FRAMES switch defined in targetx86.h (shortcut for TARGET_X86 && !UNIX_X86_ABI). Funclet code is always compiled in and controlled at runtime with Compiler::UsesFunclets(). On win-x86 it's controlled by NativeAOT ABI, everywhere else it's unconditionally true.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 2, 2024
@ghost
Copy link

ghost commented Mar 2, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Everything but win-x86 already had FEATURE_EH_FUNCLETS defined unconditionally. For win-x86 there's a new FEATURE_EH_X86_FRAMES switch defined in targetx86.h. Funclet code is always compiled in and controlled at runtime with Compiler::UsesFunclets(). On win-x86 it's controlled by NativeAOT ABI, everywhere else it's unconditionally true.

Author: filipnavara
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@@ -1503,10 +1503,11 @@ void CodeGen::genExitCode(BasicBlock* block)
void CodeGen::genJumpToThrowHlpBlk(emitJumpKind jumpKind, SpecialCodeKind codeKind, BasicBlock* failBlk)
{
bool useThrowHlpBlk = compiler->fgUseThrowHelperBlocks();
#if defined(UNIX_X86_ABI) && defined(FEATURE_EH_FUNCLETS)
#if defined(UNIX_X86_ABI)
// TODO: Is this really UNIX_X86_ABI specific? Should we guard with compiler->UsesFunclets() instead?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be double-checked along with

runtime/src/coreclr/jit/morph.cpp

Lines 14299 to 14304 in ce6cf4b

if (info.compXcptnsCount > 0)
{
assert(!codeGen->isGCTypeFixed());
// Enforce fully interruptible codegen for funclet unwinding
SetInterruptible(true);
}

Is any of that relevant for win-x86 with funclets?

@filipnavara filipnavara marked this pull request as ready for review March 2, 2024 19:51
@filipnavara
Copy link
Member Author

cc @jakobbotsch @SingleAccretion since you seemed to favor this over having a separate JIT flavor.

@jkotas
Copy link
Member

jkotas commented Mar 3, 2024

Ideally, we would switch win-x86 to the funclet plan and delete !FEATURE_EH_FUNCLETS code everywhere. @janvorli Have you thought about what it would take?

@jakobbotsch
Copy link
Member

It would be good to check where the throughput cost is coming from (perhaps some of the new runtime funclet checks can be hoisted out).

@jakobbotsch
Copy link
Member

cc @dotnet/jit-contrib

@janvorli
Copy link
Member

janvorli commented Mar 4, 2024

Ideally, we would switch win-x86 to the funclet plan and delete !FEATURE_EH_FUNCLETS code everywhere. @janvorli Have you thought about what it would take?

The original plan was to leave the win x86 on the old plan and not to spend time porting that to the new EH. Is there a reason for modernizing this almost obsolete platform?

@jkotas
Copy link
Member

jkotas commented Mar 4, 2024

Windows x86 support is here to stay for many years. The reason for switching over to the unified model would be code base simplification. We keep spending time on dealing with the difference, as this PR demonstrates.

@filipnavara has been working on enabling native AOT for Windows x86 that uses the funclet model. Our testing strategy for native AOT assumes that the codegen differences are kept to absolute minimum. Switching regular Windows x86 to the funclet model would help with that.

@jakobbotsch
Copy link
Member

jakobbotsch commented Mar 6, 2024

I checked the methods that are contributing to the TP regression in Tier0 for benchmarks.run_pgo. The breakdown looks like:

Base: 17457576750, Diff: 17528444068, +0.4059%

20569264  : +6.76%   : 17.35% : +0.1178% : public: unsigned int __thiscall emitter::emitOutputInstr(struct insGroup *, struct emitter::instrDesc *, unsigned char **)                                                                     
14592023  : NA       : 12.31% : +0.0836% : private: struct insGroup * __thiscall emitter::emitAllocIG(void)                                                                                                                               
14141188  : NA       : 11.93% : +0.0810% : public: void __thiscall CodeGen::siOpenScopesForNonTrackedVars(struct BasicBlock const *, unsigned int)                                                                                        
7645212   : NA       : 6.45%  : +0.0438% : public: void __thiscall emitter::emitBegFN(bool)                                                                                                                                               
3468762   : +5.70%   : 2.93%  : +0.0199% : public: unsigned int __thiscall GCInfo::gcMakeRegPtrTable(unsigned char *, int, struct InfoHdr const &, unsigned int, unsigned int *)                                                          
3332472   : NA       : 2.81%  : +0.0191% : public: bool __thiscall Compiler::UsesFunclets(void)                                                                                                                                           
3085243   : +40.50%  : 2.60%  : +0.0177% : public: void __thiscall emitter::emitCreatePlaceholderIG(enum insGroupPlaceholderType, struct BasicBlock *, unsigned int *const &, unsigned int, unsigned int, bool)                           
2923051   : +24.12%  : 2.47%  : +0.0167% : public: void __thiscall GCInfo::gcCountForHeader(unsigned int *, unsigned int *)                                                                                                               
2415660   : +3.65%   : 2.04%  : +0.0138% : protected: void __thiscall CodeGen::genCall(struct GenTreeCall *)                                                                                                                              
2128950   : +147.06% : 1.80%  : +0.0122% : private: void __thiscall emitter::emitNxtIG(bool)                                                                                                                                              
1950494   : +4.12%   : 1.65%  : +0.0112% : private: unsigned int __thiscall GCInfo::gcInfoBlockHdrSave(unsigned char *, int, unsigned int, unsigned int, unsigned int, struct InfoHdr *, int *)                                           
1760517   : NA       : 1.48%  : +0.0101% : public: void __thiscall Compiler::funSetCurrentFunc(unsigned int)                                                                                                                              
1701900   : +14.19%  : 1.44%  : +0.0097% : private: void * __thiscall emitter::emitAddLabel(unsigned int *const &, unsigned int, unsigned int)                                                                                            
1597287   : +2.00%   : 1.35%  : +0.0091% : private: void __thiscall LinearScan::setBlockSequence(void)                                                                                                                                    
1414600   : NA       : 1.19%  : +0.0081% : public: struct FuncInfoDsc * __thiscall Compiler::funCurrentFunc(void)                                                                                                                         
1332289   : +3.57%   : 1.12%  : +0.0076% : public: void __thiscall Compiler::lvaAssignVirtualFrameOffsetsToLocals(void)                                                                                                                   
1253994   : +39.81%  : 1.06%  : +0.0072% : public: enum PhaseStatus __thiscall Compiler::lvaMarkLocalVars(void)                                                                                                                           
1060950   : +22.73%  : 0.89%  : +0.0061% : public: void __thiscall CodeGen::genEmitMachineCode(void)                                                                                                                                      
1057289   : +5.74%   : 0.89%  : +0.0061% : public: void __thiscall CodeGen::genEmitUnwindDebugGCandEH(void)                                                                                                                               
979839    : NA       : 0.83%  : +0.0056% : protected: void __thiscall Compiler::impImportLeaveEHRegions(struct BasicBlock *)                                                                                                              
926607    : +1.81%   : 0.78%  : +0.0053% : protected: void __thiscall CodeGen::genFnProlog(void)                                                                                                                                          
919488    : +12.16%  : 0.78%  : +0.0053% : public: void __thiscall emitter::emitGeneratePrologEpilog(void)                                                                                                                                
535926    : +8.21%   : 0.45%  : +0.0031% : protected: void __thiscall CodeGen::genExitCode(struct BasicBlock *)                                                                                                                           
512478    : NA       : 0.43%  : +0.0029% : public: unsigned int __thiscall Compiler::bbThrowIndex(struct BasicBlock *)                                                                                                                    
391775    : +0.28%   : 0.33%  : +0.0022% : public: unsigned int __thiscall emitter::emitEndCodeGen(class Compiler *, bool, bool, bool, unsigned int, unsigned int *, unsigned int *, void **, void **, void **, void **, void **, void **)
390696    : +6.24%   : 0.33%  : +0.0022% : public: void __thiscall Compiler::fgFindBasicBlocks(void)                                                                                                                                      
289133    : NA       : 0.24%  : +0.0017% : public: enum PhaseStatus __thiscall Compiler::fgMergeFinallyChains(void)                                                                                                                       
283556    : +0.91%   : 0.24%  : +0.0016% : protected: void __thiscall Compiler::compCompile(void **, unsigned int *, class JitFlags *)                                                                                                    
267951    : +4.55%   : 0.23%  : +0.0015% : public: void __thiscall emitter::emitBegPrologEpilog(struct insGroup *)                                                                                                                        
240221    : +0.98%   : 0.20%  : +0.0014% : protected: void __thiscall CodeGen::genCheckUseBlockInit(void)                                                                                                                                 
218487    : +0.84%   : 0.18%  : +0.0013% : public: void __thiscall Compiler::compInit(class ArenaAllocator *, struct CORINFO_METHOD_STRUCT_*, class ICorJitInfo *, struct CORINFO_METHOD_INFO *, struct InlineInfo *)                     
208706    : +1.19%   : 0.18%  : +0.0012% : protected: void __thiscall CodeGen::genInitialize(void)                                                                                                                                        
182117    : +2.21%   : 0.15%  : +0.0010% : public: enum PhaseStatus __thiscall Compiler::fgAddInternal(void)                                                                                                                              
-141778   : -33.31%  : 0.12%  : -0.0008% : __scrt_stub_for_is_c_termination_complete                                                                                                                                                      
-141848   : -100.00% : 0.12%  : -0.0008% : public: void __thiscall Compiler::fgNormalizeEH(void)                                                                                                                                          
-923488   : -89.05%  : 0.78%  : -0.0053% : protected: void __thiscall Compiler::impImportLeave(struct BasicBlock *)                                                                                                                       
-2406826  : -20.66%  : 2.03%  : -0.0138% : private: void __thiscall emitter::emitGenIG(struct insGroup *)                                                                                                                                 
-2627780  : -1.42%   : 2.22%  : -0.0151% : protected: void __thiscall CodeGen::genCodeForBBlist(void)                                                                                                                                     
-4615901  : -29.00%  : 3.89%  : -0.0264% : public: void __thiscall CodeGen::genGenerateMachineCode(void)                                                                                                                                  
-12943600 : -100.00% : 10.92% : -0.0741% : private: void __thiscall emitter::emitNewIG(void)                                                                                                                                              

A number of cases in the top look like different inlining decisions that we'd expect to change regardless with native PGO.

@BruceForstall
Copy link
Member

We should update the documentation that states that Windows/x86 does not use funclets: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md#funclets

@MichalPetryka
Copy link
Contributor

We should update the documentation that states that Windows/x86 does not use funclets: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md#funclets

Aren't they still used in CoreCLR? I though this only disabled them for NativeAOT.

@BruceForstall
Copy link
Member

Aren't they still used in CoreCLR? I though this only disabled them for NativeAOT.

That document covers NativeAOT as well as CoreCLR.

@jkotas
Copy link
Member

jkotas commented Mar 11, 2024

@filipnavara has been working on enabling native AOT for Windows x86 that uses the funclet model. Our testing strategy for native AOT assumes that the codegen differences are kept to absolute minimum. Switching regular Windows x86 to the funclet model would help with that.

I have done a bit of work towards this: https://github.com/jkotas/runtime/tree/x86funclets (compiles, does not work and needs cleanup of NYIs)

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this PR to fix the merge conflicts, and so the testing + asmdiffs/TP diffs can run again?

src/coreclr/jit/targetx86.h Outdated Show resolved Hide resolved
src/coreclr/jit/compiler.h Outdated Show resolved Hide resolved
return UsesFunclets();
}
#else
bool UsesFunclets()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
bool UsesFunclets()
static bool UsesFunclets()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is semi-intentional. The compiler will inline it for non-x86 configurations anyway, so the static doesn't have an effect on the resulting code. The absence of static does, however, prevent it from calling Compiler:: UsesFunclets() by accident and only discovering it when the x86 JIT is built on CI.

return true;
}

bool UsesCallfinallyThunks()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
bool UsesCallfinallyThunks()
static bool UsesCallFinallyThunks()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as #99191 (comment)

@filipnavara
Copy link
Member Author

Can you update this PR to fix the merge conflicts, and so the testing + asmdiffs/TP diffs can run again?

I'll do it first thing tomorrow morning along with applying your suggestions.

@BruceForstall
Copy link
Member

Diffs

No asm diffs (expected).

Small TP diffs on all non-x86 platforms, mostly improvements. Surprising there is any change here?

Big TP regressions on x86, especially MinOpts: up to almost 0.5%.

I wonder if this could be significantly improved by changing:

    bool UsesFunclets()
    {
        return IsTargetAbi(CORINFO_NATIVEAOT_ABI);
    }

such that we compute the result early (maybe in compInit?), cache it on Compiler, and then have:

    bool UsesFunclets()
    {
        return eeIsNativeAotAbi;
    }

@filipnavara
Copy link
Member Author

filipnavara commented Mar 29, 2024

The TP diff on x86 are mostly inlining decisions that would like be reduced with a new PGO profile. At least that was the conclusion when @jakobbotsch looked into it. I can try some variations to see if it makes a difference but other unrelated PRs saw similar patterns on small changes.

@BruceForstall
Copy link
Member

The TP diff on x86 are mostly inlining decisions that would like be reduced with a new PGO profile.

Our superpmi-diffs pipeline disables native PGO (otherwise we'd never get "apples-to-apples" comparison).

I tried the small suggestion I made above here:

#100655

(after rebasing this PR on top of upstream/main, and fixing the conflicts) and it seemed to greatly reduce the x86 TP regression.

@filipnavara
Copy link
Member Author

Thanks for testing it. I did another rebase and cherry-picked your change.

@BruceForstall
Copy link
Member

Diffs

The notable win-x86 TP diffs:

image

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BruceForstall BruceForstall merged commit 41b1091 into dotnet:main Apr 5, 2024
113 of 115 checks passed
@filipnavara filipnavara deleted the jit-funclets branch April 5, 2024 18:47
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
* Replace FEATURE_EH_FUNCLETS/FEATURE_EH_CALLFINALLY_THUNKS in JIT with runtime switch

* Cache Native AOT ABI check to see if TP improves

---------

Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
@github-actions github-actions bot locked and limited conversation to collaborators May 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants