Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Add a head merging transformation alongside the tail merging transformation #90468

Merged
merged 12 commits into from
Aug 18, 2023

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Aug 12, 2023

Add a pass that does head merging to complement the existing tail merging pass. Unlike tail merging this either requires spilling or reordering the first statement with the terminator node of the predecessor, which requires some interference checking.

Fix #90017

Add a pass that does head merging to compliment the existing tail
merging pass. Unlike tail merging this requires reordering the first
statement with the terminator node of the predecessor, which requires
some interference checking.

Fix dotnet#90017
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 12, 2023
@ghost ghost assigned jakobbotsch Aug 12, 2023
@ghost
Copy link

ghost commented Aug 12, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Add a pass that does head merging to compliment the existing tail merging pass. Unlike tail merging this requires reordering the first statement with the terminator node of the predecessor, which requires some interference checking.

Fix #90017

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch jakobbotsch added this to the 9.0.0 milestone Aug 14, 2023
@jakobbotsch jakobbotsch marked this pull request as ready for review August 16, 2023 10:17
@jakobbotsch jakobbotsch marked this pull request as draft August 16, 2023 10:25
@jakobbotsch jakobbotsch changed the title JIT: Add a head merging pass JIT: Add a head merging transformation alongside the tail merging transformation Aug 16, 2023
@jakobbotsch jakobbotsch marked this pull request as ready for review August 16, 2023 13:08
@jakobbotsch
Copy link
Member Author

jakobbotsch commented Aug 16, 2023

cc @dotnet/jit-contrib PTAL @AndyAyersMS. Thoughts on this? Better to unify them more? Split them more apart?

Diffs. Quite skewed because of one particular test, but still a fair amount of improvements, in a lot of cases creating new conditional selects (which both bring size-wise improvements and regressions). The reordering also can bring regressions with it, in the same way that we see from forward sub sometimes.

There's also a consideration that tail merging and head merging can create new opportunities for each other, but this change doesn't try to rerun either of them when that potentially happens.

@jakobbotsch
Copy link
Member Author

/azp run Fuzzlyn, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

{
if (gtHasRef(tree1, dsc->lvFieldLclStart + i))
{
JITDUMP(" cannot reorder with interferring use of struct field\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JITDUMP(" cannot reorder with interferring use of struct field\n");
JITDUMP(" cannot reorder with interfering use of struct field\n");

similar elsehwere

@AndyAyersMS
Copy link
Member

cc @dotnet/jit-contrib PTAL @AndyAyersMS. Thoughts on this? Better to unify them more? Split them more apart?

Seems like a pretty natural fit.

I always worry that interference checking (particularly when repeated) can be costly. Current TP cost does not look too bad, but suggests a lot of failed checks (since if we can head merge it likely improves TP). So I wonder if there are ways to fail faster?

Do you know how often we end up moving multiple statements?

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Aug 17, 2023

The Fuzzlyn failures are #7539 (hitting this a lot after adding Unsafe.As support -- need to figure out what to do about it). Runtime and jitstress failures are #90593.

I always worry that interference checking (particularly when repeated) can be costly. Current TP cost does not look too bad, but suggests a lot of failed checks (since if we can head merge it likely improves TP). So I wonder if there are ways to fail faster?

Do you know how often we end up moving multiple statements?

For libraries.pmi the histogram of how many statements we head merge in each block (where we head merge any) looks like:

      1 ..       1 ===>    4173 count ( 56% of total)
      2 ..       2 ===>     972 count ( 70% of total)
      3 ..       3 ===>     919 count ( 82% of total)
      4 ..       4 ===>     366 count ( 87% of total)
      5 ..       5 ===>     487 count ( 94% of total)
      6 ..      10 ===>     360 count ( 99% of total)
     11 ..      15 ===>      40 count ( 99% of total)
     16 ..      20 ===>       5 count (100% of total)

I can try to see if I can improve TP a bit -- for one we shouldn't need to iterate over all BBs twice if we just call both head and tail merging from one loop.
I'm not sure that the interference checks are that costly, but certainly we could do fewer tree walks as part of it.

@jakobbotsch
Copy link
Member Author

The detailed TP breakdown is, after moving the head merging into fgHeadMerge:

Base: 115595112161, Diff: 115766296686, +0.1481%

?fgHeadMerge@Compiler@@QEAA_NPEAUBasicBlock@@AEAV?$ArrayStack@UPredSuccInfo@@@@_N@Z                                                                                                                                                : 159867209  : NA       : 20.48% : +0.1383%
`Compiler::fgHeadTailMerge'::`2'::<lambda_1>::operator()                                                                                                                                                                           : 128143167  : NA       : 16.42% : +0.1109%
?NumSucc@BasicBlock@@QEAAIPEAVCompiler@@@Z                                                                                                                                                                                         : 55787892   : +32.82%  : 7.15%  : +0.0483%
?fgHeadTailMerge@Compiler@@QEAA?AW4PhaseStatus@@_N@Z                                                                                                                                                                               : 53696551   : NA       : 6.88%  : +0.0465%
?GetSucc@BasicBlock@@QEAAPEAU1@IPEAVCompiler@@@Z                                                                                                                                                                                   : 30154807   : +13.41%  : 3.86%  : +0.0261%
?Compare@GenTree@@SA_NPEAU1@0_N@Z                                                                                                                                                                                                  : 15846220   : +21.62%  : 2.03%  : +0.0137%
`Compiler::compCompile'::`2'::<lambda_3>::operator()                                                                                                                                                                               : 12277667   : NA       : 1.57%  : +0.0106%
??$Emplace@AEAUPredSuccInfo@@@?$ArrayStack@UPredSuccInfo@@@@QEAAXAEAUPredSuccInfo@@@Z                                                                                                                                              : 5529537    : NA       : 0.71%  : +0.0048%
?compCompile@Compiler@@IEAAXPEAPEAXPEAIPEAVJitFlags@@@Z                                                                                                                                                                            : 1684987    : +1.37%   : 0.22%  : +0.0015%
GenTreeVisitor<`Compiler::gtHasLocalsWithAddrOp'::`2'::LocalsWithAddrOpVisitor>::WalkTree                                                                                                                                          : 1252917    : +1.84%   : 0.16%  : +0.0011%
?fgUpdateFlowGraph@Compiler@@QEAA_N_N0@Z                                                                                                                                                                                           : 1130065    : +0.08%   : 0.14%  : +0.0010%
BasicBlock::VisitAllSuccs<`Compiler::optReachable'::`12'::<lambda_1> >                                                                                                                                                             : 879816     : +0.47%   : 0.11%  : +0.0008%
?fgReorderBlocks@Compiler@@QEAA_N_N@Z                                                                                                                                                                                              : -790057    : -0.21%   : 0.10%  : -0.0007%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@_KI@Z                                                                                                                                      : -801471    : -0.09%   : 0.10%  : -0.0007%
?RenameDef@SsaBuilder@@AEAAXPEAUGenTree@@PEAUBasicBlock@@@Z                                                                                                                                                                        : -812236    : -0.86%   : 0.10%  : -0.0007%
?incRefCnts@LclVarDsc@@QEAAXNPEAVCompiler@@W4RefCountState@@_N@Z                                                                                                                                                                   : -814351    : -0.27%   : 0.10%  : -0.0007%
?OptimizeRangeCheck@RangeCheck@@QEAAXPEAUBasicBlock@@PEAUStatement@@PEAUGenTree@@@Z                                                                                                                                                : -825515    : -0.37%   : 0.11%  : -0.0007%
?genCodeForTreeNode@CodeGen@@IEAAXPEAUGenTree@@@Z                                                                                                                                                                                  : -825588    : -0.12%   : 0.11%  : -0.0007%
?MakeCopy@?$BitSetOps@PEA_K$00PEAVCompiler@@VTrackedVarBitSetTraits@@@@SAPEA_KPEAVCompiler@@PEA_K@Z                                                                                                                                : -835566    : -0.32%   : 0.11%  : -0.0007%
?fgValueNumberBlock@Compiler@@QEAAXPEAUBasicBlock@@@Z                                                                                                                                                                              : -846018    : -0.31%   : 0.11%  : -0.0007%
?AddPhiArg@SsaBuilder@@AEAAXPEAUBasicBlock@@PEAUStatement@@PEAUGenTreePhi@@II0@Z                                                                                                                                                   : -862799    : -1.49%   : 0.11%  : -0.0007%
?getWeight@LinearScan@@AEAANPEAVRefPosition@@@Z                                                                                                                                                                                    : -866614    : -0.29%   : 0.11%  : -0.0007%
?UnionD@?$BitSetOps@PEA_K$00PEAVCompiler@@VTrackedVarBitSetTraits@@@@SAXPEAVCompiler@@AEAPEA_KPEA_K@Z                                                                                                                              : -871931    : -0.20%   : 0.11%  : -0.0008%
?fgMorphTree@Compiler@@QEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@@Z                                                                                                                                                            : -883522    : -0.10%   : 0.11%  : -0.0008%
?RenamePushDef@SsaBuilder@@AEAAIPEAUGenTree@@PEAUBasicBlock@@I_N@Z                                                                                                                                                                 : -886870    : -1.13%   : 0.11%  : -0.0008%
?EnsureCoversInd@?$JitExpandArray@PEAUChunk@ValueNumStore@@@@IEAAXI@Z                                                                                                                                                              : -892604    : -0.31%   : 0.11%  : -0.0008%
?gtSetEvalOrder@Compiler@@QEAAIPEAUGenTree@@@Z                                                                                                                                                                                     : -905597    : -0.05%   : 0.12%  : -0.0008%
??$resolveRegisters@$00@LinearScan@@QEAAXXZ                                                                                                                                                                                        : -934671    : -0.15%   : 0.12%  : -0.0008%
?fgInterBlockLocalVarLiveness@Compiler@@QEAAXXZ                                                                                                                                                                                    : -998691    : -0.17%   : 0.13%  : -0.0009%
?optVnCopyProp@Compiler@@QEAA?AW4PhaseStatus@@XZ                                                                                                                                                                                   : -1013931   : -0.51%   : 0.13%  : -0.0009%
?RewriteNode@Rationalizer@@AEAA?AW4fgWalkResult@Compiler@@PEAPEAUGenTree@@AEAV?$ArrayStack@PEAUGenTree@@@@@Z                                                                                                                       : -1016909   : -0.13%   : 0.13%  : -0.0009%
?fgValueNumberStore@Compiler@@QEAAXPEAUGenTree@@@Z                                                                                                                                                                                 : -1027387   : -0.72%   : 0.13%  : -0.0009%
?fgComputeLifeLIR@Compiler@@QEAAXAEAPEA_KPEAUBasicBlock@@AEBQEA_K@Z                                                                                                                                                                : -1032432   : -0.25%   : 0.13%  : -0.0009%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                                                                                                        : -1036010   : -0.15%   : 0.13%  : -0.0009%
?fgPerBlockLocalVarLiveness@Compiler@@QEAAXXZ                                                                                                                                                                                      : -1052193   : -0.16%   : 0.13%  : -0.0009%
?InsertPhiFunctions@SsaBuilder@@AEAAXPEAPEAUBasicBlock@@H@Z                                                                                                                                                                        : -1106749   : -0.58%   : 0.14%  : -0.0010%
?optBlockCopyProp@Compiler@@QEAA_NPEAUBasicBlock@@PEAV?$JitHashTable@IU?$JitSmallPrimitiveKeyFuncs@I@@PEAV?$ArrayStack@VCopyPropSsaDef@Compiler@@@@VCompAllocator@@VJitHashTableBehavior@@@@@Z                                     : -1144389   : -0.32%   : 0.15%  : -0.0010%
?fgValueNumberTree@Compiler@@QEAAXPEAUGenTree@@@Z                                                                                                                                                                                  : -1149567   : -0.17%   : 0.15%  : -0.0010%
`Compiler::optCopyPropPushDef'::`2'::<lambda_1>::operator()                                                                                                                                                                        : -1166866   : -1.11%   : 0.15%  : -0.0010%
?optAssertionProp@Compiler@@QEAAPEAUGenTree@@AEBQEA_KPEAU2@PEAUStatement@@PEAUBasicBlock@@@Z                                                                                                                                       : -1194100   : -0.26%   : 0.15%  : -0.0010%
?fgMorphSmpOp@Compiler@@AEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@PEA_N@Z                                                                                                                                                      : -1206778   : -0.07%   : 0.15%  : -0.0010%
?fgComputeLifeTrackedLocalDef@Compiler@@QEAA_NAEAPEA_KAEBQEA_KAEAVLclVarDsc@@PEAUGenTreeLclVarCommon@@@Z                                                                                                                           : -1229223   : -0.67%   : 0.16%  : -0.0011%
?optAssertionPropMain@Compiler@@QEAA?AW4PhaseStatus@@XZ                                                                                                                                                                            : -1373368   : -0.34%   : 0.18%  : -0.0012%
?PerBlockAnalysis@LiveVarAnalysis@@AEAA_NPEAUBasicBlock@@_N1@Z                                                                                                                                                                     : -1390878   : -0.10%   : 0.18%  : -0.0012%
??$buildIntervals@$00@LinearScan@@QEAAXXZ                                                                                                                                                                                          : -1420236   : -0.28%   : 0.18%  : -0.0012%
?doLinearScan@LinearScan@@UEAA?AW4PhaseStatus@@XZ                                                                                                                                                                                  : -1472138   : -0.40%   : 0.19%  : -0.0013%
?WalkTree@?$GenTreeVisitor@V?$GenericTreeWalker@$0A@$00$0A@$00@@@@QEAA?AW4fgWalkResult@Compiler@@PEAPEAUGenTree@@PEAU4@@Z                                                                                                          : -1480816   : -0.34%   : 0.19%  : -0.0013%
?UpdateLifeVar@?$TreeLifeUpdater@$00@@AEAAXPEAUGenTree@@PEAUGenTreeLclVarCommon@@@Z                                                                                                                                                : -1492533   : -0.53%   : 0.19%  : -0.0013%
?resolveLocalRef@LinearScan@@AEAAXPEAUBasicBlock@@PEAUGenTreeLclVar@@PEAVRefPosition@@@Z                                                                                                                                           : -1708209   : -0.47%   : 0.22%  : -0.0015%
?optVNAssertionPropCurStmtVisitor@Compiler@@KA?AW4fgWalkResult@1@PEAPEAUGenTree@@PEAUfgWalkData@1@@Z                                                                                                                               : -1787018   : -0.34%   : 0.23%  : -0.0015%
?genCodeForBBlist@CodeGen@@IEAAXXZ                                                                                                                                                                                                 : -1863492   : -0.20%   : 0.24%  : -0.0016%
?optAssertionGen@Compiler@@QEAAXPEAUGenTree@@@Z                                                                                                                                                                                    : -1953068   : -0.28%   : 0.25%  : -0.0017%
?lvaComputeRefCounts@Compiler@@QEAAX_N0@Z                                                                                                                                                                                          : -1990712   : -0.26%   : 0.26%  : -0.0017%
jitstd::`anonymous namespace'::quick_sort<unsigned int *,LclVarDsc_BlendedCode_Less>                                                                                                                                               : -2055001   : -0.47%   : 0.26%  : -0.0018%
?allocateMemory@ArenaAllocator@@QEAAPEAX_K@Z                                                                                                                                                                                       : -2058385   : -0.08%   : 0.26%  : -0.0018%
?fgMarkUseDef@Compiler@@AEAAXPEAUGenTreeLclVarCommon@@@Z                                                                                                                                                                           : -2423637   : -0.39%   : 0.31%  : -0.0021%
?allocateRegisters@LinearScan@@QEAAXXZ                                                                                                                                                                                             : -2501343   : -0.08%   : 0.32%  : -0.0022%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z                                                                                                                                               : -3296353   : -0.12%   : 0.42%  : -0.0029%
?DoPhase@@YAXPEAVCompiler@@W4Phases@@P81@EAA?AW4PhaseStatus@@XZ@Z                                                                                                                                                                  : -3654638   : -1.32%   : 0.47%  : -0.0032%
?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z                                                                                                                                                                    : -3668164   : -0.25%   : 0.47%  : -0.0032%
ArrayStack<`Compiler::fgTailMerge'::`2'::PredInfo>::Emplace<`Compiler::fgTailMerge'::`2'::PredInfo &>                                                                                                                              : -5529385   : -100.00% : 0.71%  : -0.0048%
?optCopyProp@Compiler@@QEAA_NPEAUBasicBlock@@PEAUStatement@@PEAUGenTreeLclVarCommon@@IPEAV?$JitHashTable@IU?$JitSmallPrimitiveKeyFuncs@I@@PEAV?$ArrayStack@VCopyPropSsaDef@Compiler@@@@VCompAllocator@@VJitHashTableBehavior@@@@@Z : -10813657  : -0.93%   : 1.39%  : -0.0094%
`Compiler::compCompile'::`2'::<lambda_2>::operator()                                                                                                                                                                               : -12285906  : -100.00% : 1.57%  : -0.0106%
?fgTailMerge@Compiler@@QEAA?AW4PhaseStatus@@XZ                                                                                                                                                                                     : -28273557  : -100.00% : 3.62%  : -0.0245%
`Compiler::fgTailMerge'::`2'::<lambda_1>::operator()                                                                                                                                                                               : -128142453 : -100.00% : 16.42% : -0.1109%

(Note that fgTailMerge() was changed to fgHeadTailMerge(bool), so the tail merging lambda shows up both as appearing with the new name and disappearing with the old). So it seems like it is pretty close to the same TP cost as the tail merging.

fgCanMoveFirstStatementIntoPred is not costly enough to show up in the trace. gtHasLocalsWithAddrOp barely shows up.

I collected stats on the checks, and over libraries.pmi fgCanMoveFirstStatementIntoPred is called 18110 times and returns true 15688 times and false 2422 times.

@AndyAyersMS
Copy link
Member

What happens if you just restrict head merge to BBJ_COND -- do we see any interesting cases with switches?

@jakobbotsch
Copy link
Member Author

What happens if you just restrict head merge to BBJ_COND -- do we see any interesting cases with switches?

It over halves the TP impact with only a couple non-test regressions -- I will switch to that. I wonder how much it is about visiting switches compared to the Succs enumerator being expensive -- I am introducing a visitor based regular successor enumerator in #89328 that could probably also have helped.

@jakobbotsch
Copy link
Member Author

Failures are known according to build analysis.

@jakobbotsch jakobbotsch merged commit e9ce3aa into dotnet:main Aug 18, 2023
@jakobbotsch jakobbotsch deleted the head-merging branch August 18, 2023 09:39
@ghost ghost locked as resolved and limited conversation to collaborators Sep 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cmove is not emitted for a simple expression
2 participants