Skip to content

Commit

Permalink
Generalize loop pre-header creation and loop hoisting (#62560)
Browse files Browse the repository at this point in the history
* Generalize loop pre-header creation and loop hoisting

A loop pre-header is a block that flows directly (and only) to the loop entry
block. The loop pre-header is the only non-loop predecessor of the entry block.
Loop invariant code can be hoisted to the loop pre-header where it is
guaranteed to be executed just once (per loop entry).

Currently, a loop pre-header has a number of restrictions:
- it is only created for a "do-while" (top entry) loop, not for a mid-loop entry.
- it isn't created if the current loop head and the loop entry block are in
different EH try regions

Additionally, partially due those restrictions, loop hoisting has restrictions:
- it requires a "do-while" (top entry) loop
- it requires the existing `head` block to dominate the loop entry block
- it requires the existing `head` block to be in the same EH region as the entry block
- it won't hoist if the `entry` block is the first block of a handler

This change removes all these restrictions.

Previously, even if we did create a pre-header, the definition of a pre-header
was a little weaker: an entry predecessor could be a non-loop block and also
not the pre-header, if the predecessor was dominated by the entry block. This
is more complicated to reason about, so I change the pre-header creation to
force entry block non-loop predecessors to branch to the pre-header instead.
This case only rarely occurs, when we have what looks like an outer loop back
edge but the natural loop recognition package doesn't recognize it as an outer loop.

I added a "stress mode" to always create a loop pre-header immediately after
loop recognition. This is disabled currently because loop cloning doesn't
respect the special status and invariants of a pre-header, and so inserts
all the cloning conditions and copied blocks after the pre-header, triggering
new loop structure asserts. This should be improved in the future.

A lot more checking of the loop table and loop annotations on blocks has been
added. This revealed a number of problems with loop unrolling leaving things
in a bad state for downstream phases. Loop unrolling has been updated to fix
this, in particular, the loop table is rebuilt if it is detected that we unroll
a loop that contains nested loops, since there will now be multiple copies of
those nested loops. This is the first case where we might rebuild the loop
table, so it lays the groundwork for potentially rebuilding the loop table in
other cases, such as after loop cloning where we don't add the "slow path"
loops to the table.

There is some code refactoring to simplify the "find loops" code as well.

Some change details:
- `optSetBlockWeights` is elevated to a "phase" that runs prior to loop recognition.
- LoopFlags is simplified:
	- LPFLG_DO_WHILE is removed; call `lpIsTopEntry` instead
	- LPFLG_ONE_EXIT is removed; check `lpExitCnt == 1` instead
	- LPFLG_HOISTABLE is removed (there are no restrictions anymore)
	- LPFLG_CONST is removed: check `lpFlags & (LPFLG_CONST_INIT | LPFLG_CONST_LIMIT) == (LPFLG_CONST_INIT | LPFLG_CONST_LIMIT)` instead (only used in one place
	- bool lpContainsCall is removed and replaced by LPFLG_CONTAINS_CALL
- Added a `lpInitBlock` field to the loop table. For constant and variable
initialization loops, code assumed that these expressions existed in the
`head` block. This isn't true anymore if we insert a pre-header block.
So, capture the block where these actually exist when we determine that
they do exist, and explicitly use this block pointer where needed.
- Added `fgComputeReturnBlocks()` to extract this code out of `fgComputeReachability` into a function
- Added `optFindAndScaleGeneralLoopBlocks()` to extract this out of loop recognition to its own function.
- Added `optResetLoopInfo()` to reset the loop table and block annotations related to loops
- Added `fgDebugCheckBBNumIncreasing()` to allow asserting that the bbNum
order of blocks is increasing. This should be used in phases that depend
on this order to do bbNum comparisons.
- Add a lot more loop table validation in `fgDebugCheckLoopTable()`

* Inline fgBuildBlockNumMap to allow using _alloca

* Fix BBJ_SWITCH output

1. Change `dspSuccs()` to not call code that will call `GetDescriptorForSwitch()`
2. Change `GetDescriptorForSwitch()` to use the correct max block number while
inlining. We probably don't or shouldn't call GetDescriptorForSwitch while inlining,
especially after (1), but this change doesn't hurt.

* Remove incorrect assertion

There was an assertion when considering an existing `head` block as
a potential pre-header, that the `entry` block have the `head`
block as a predecessor. However, we early exit if we find a non-head,
non-loop edge. This could happen before we encounter the `head` block,
making the assert incorrect. We don't want to run the entire loop just
for the purpose of the assert (at least not here), so just remove the
assert.

* Formatting

* Use `_alloca` instead of `alloca` name

* Convert fgBBNumMax usage

Change:
```
compIsForInlining() ? impInlineInfo->InlinerCompiler->fgBBNumMax : fgBBNumMax
```
to:
```
impInlineRoot()->fgBBNumMax
```

* Code review feedback

1. Added loop epoch concept. Currently set but never checked.
2. Added disabled assert about return blocks always being moved
out of line of loop body.
3. Fixed bug checking `totalIter` after it was decremented to zero
as a loop variable
4. Added more comments on incremental loop block `bbNatLoopNum`
setting.

* Add EH condition when converting head to pre-header

When considering converting an existing loop `head` block to a pre-header,
verify that the block has the same EH try region that we would create
for a new pre-header. If not, we go ahead and create the new pre-header.
  • Loading branch information
BruceForstall authored Dec 14, 2021
1 parent c3683b0 commit 955a681
Show file tree
Hide file tree
Showing 20 changed files with 1,147 additions and 460 deletions.
47 changes: 39 additions & 8 deletions src/coreclr/jit/block.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -575,18 +575,49 @@ unsigned BasicBlock::dspCheapPreds()
return count;
}

/*****************************************************************************
*
* Display the basic block successors.
*/

//------------------------------------------------------------------------
// dspSuccs: Display the basic block successors.
//
// Arguments:
// compiler - compiler instance; passed to NumSucc(Compiler*) -- see that function for implications.
//
void BasicBlock::dspSuccs(Compiler* compiler)
{
bool first = true;
for (BasicBlock* const succ : Succs(compiler))

// If this is a switch, we don't want to call `Succs(Compiler*)` because it will eventually call
// `GetSwitchDescMap()`, and that will have the side-effect of allocating the unique switch descriptor map
// and/or compute this switch block's unique succ set if it is not present. Debug output functions should
// never have an effect on codegen. We also don't want to assume the unique succ set is accurate, so we
// compute it ourselves here.
if (bbJumpKind == BBJ_SWITCH)
{
// Create a set with all the successors. Don't use BlockSet, so we don't need to worry
// about the BlockSet epoch.
unsigned bbNumMax = compiler->impInlineRoot()->fgBBNumMax;
BitVecTraits bitVecTraits(bbNumMax + 1, compiler);
BitVec uniqueSuccBlocks(BitVecOps::MakeEmpty(&bitVecTraits));
for (BasicBlock* const bTarget : SwitchTargets())
{
BitVecOps::AddElemD(&bitVecTraits, uniqueSuccBlocks, bTarget->bbNum);
}
BitVecOps::Iter iter(&bitVecTraits, uniqueSuccBlocks);
unsigned bbNum = 0;
while (iter.NextElem(&bbNum))
{
// Note that we will output switch successors in increasing numerical bbNum order, which is
// not related to their order in the bbJumpSwt->bbsDstTab table.
printf("%s" FMT_BB, first ? "" : ",", bbNum);
first = false;
}
}
else
{
printf("%s" FMT_BB, first ? "" : ",", succ->bbNum);
first = false;
for (BasicBlock* const succ : Succs(compiler))
{
printf("%s" FMT_BB, first ? "" : ",", succ->bbNum);
first = false;
}
}
}

Expand Down
12 changes: 11 additions & 1 deletion src/coreclr/jit/block.h
Original file line number Diff line number Diff line change
Expand Up @@ -559,7 +559,7 @@ enum BasicBlockFlags : unsigned __int64

// Flags that relate blocks to loop structure.

BBF_LOOP_FLAGS = BBF_LOOP_PREHEADER | BBF_LOOP_HEAD | BBF_LOOP_CALL0 | BBF_LOOP_CALL1,
BBF_LOOP_FLAGS = BBF_LOOP_PREHEADER | BBF_LOOP_HEAD | BBF_LOOP_CALL0 | BBF_LOOP_CALL1 | BBF_LOOP_ALIGN,

// Flags to update when two blocks are compacted

Expand Down Expand Up @@ -1001,6 +1001,16 @@ struct BasicBlock : private LIR::Range
bbHndIndex = from->bbHndIndex;
}

void copyTryIndex(const BasicBlock* from)
{
bbTryIndex = from->bbTryIndex;
}

void copyHndIndex(const BasicBlock* from)
{
bbHndIndex = from->bbHndIndex;
}

static bool sameTryRegion(const BasicBlock* blk1, const BasicBlock* blk2)
{
return blk1->bbTryIndex == blk2->bbTryIndex;
Expand Down
1 change: 1 addition & 0 deletions src/coreclr/jit/clrjit.natvis
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Documentation for VS debugger format specifiers: https://docs.microsoft.com/en-u

<Type Name="BasicBlock">
<DisplayString Condition="bbJumpKind==BBJ_COND || bbJumpKind==BBJ_ALWAYS || bbJumpKind==BBJ_LEAVE || bbJumpKind==BBJ_EHCATCHRET || bbJumpKind==BBJ_CALLFINALLY">BB{bbNum,d}->BB{bbJumpDest->bbNum,d}; {bbJumpKind,en}</DisplayString>
<DisplayString Condition="bbJumpKind==BBJ_SWITCH">BB{bbNum,d}; {bbJumpKind,en}; {bbJumpSwt->bbsCount} cases</DisplayString>
<DisplayString>BB{bbNum,d}; {bbJumpKind,en}</DisplayString>
</Type>

Expand Down
16 changes: 8 additions & 8 deletions src/coreclr/jit/compiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4898,10 +4898,14 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl
//
DoPhase(this, PHASE_COMPUTE_REACHABILITY, &Compiler::fgComputeReachability);

// Scale block weights and mark run rarely blocks.
//
DoPhase(this, PHASE_SET_BLOCK_WEIGHTS, &Compiler::optSetBlockWeights);

// Discover and classify natural loops (e.g. mark iterative loops as such). Also marks loop blocks
// and sets bbWeight to the loop nesting levels.
//
DoPhase(this, PHASE_FIND_LOOPS, &Compiler::optFindLoops);
DoPhase(this, PHASE_FIND_LOOPS, &Compiler::optFindLoopsPhase);

// Clone loops with optimization opportunities, and choose one based on dynamic condition evaluation.
//
Expand Down Expand Up @@ -5444,20 +5448,16 @@ void Compiler::ResetOptAnnotations()
// The intent of this method is to update loop structure annotations, and those
// they depend on; these annotations may have become stale during optimization,
// and need to be up-to-date before running another iteration of optimizations.

//
void Compiler::RecomputeLoopInfo()
{
assert(opts.optRepeat);
assert(JitConfig.JitOptRepeatCount() > 0);
// Recompute reachability sets, dominators, and loops.
optLoopCount = 0;
optResetLoopInfo();
fgDomsComputed = false;
for (BasicBlock* const block : Blocks())
{
block->bbFlags &= ~BBF_LOOP_FLAGS;
block->bbNatLoopNum = BasicBlock::NOT_IN_LOOP;
}
fgComputeReachability();
optSetBlockWeights();
// Rebuild the loop tree annotations themselves
optFindLoops();
}
Expand Down
Loading

0 comments on commit 955a681

Please sign in to comment.