Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopVectorize] Teach LoopVectorizationLegality about more early exits #107004

Merged
merged 5 commits into from
Sep 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
Original file line number Diff line number Diff line change
Expand Up @@ -377,6 +377,24 @@ class LoopVectorizationLegality {
return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
}

/// Returns true if the loop has a speculative early exit, i.e. an
/// uncountable exit that isn't the latch block.
bool hasSpeculativeEarlyExit() const { return HasSpeculativeEarlyExit; }

/// Returns the speculative early exiting block.
BasicBlock *getSpeculativeEarlyExitingBlock() const {
assert(getUncountableExitingBlocks().size() == 1 &&
"Expected only a single uncountable exiting block");
return getUncountableExitingBlocks()[0];
}

/// Returns the destination of a speculative early exiting block.
BasicBlock *getSpeculativeEarlyExitBlock() const {
assert(getUncountableExitBlocks().size() == 1 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both getSpeculativeEarlyExitBloc and getUncountableExitingBlocks assert that there's exactly one entry. Would it be simpler to store the pointers directly, rather than in a 1 element vector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that, and would be happy to do so if people prefer. I wrote it this way because in future I thought it's possible we may support multiple early exits. @paulwalker-arm @huntergr-arm any preference?

"Expected only a single uncountable exit block");
return getUncountableExitBlocks()[0];
}

/// Returns true if vector representation of the instruction \p I
/// requires mask.
bool isMaskRequired(const Instruction *I) const {
Expand Down Expand Up @@ -404,6 +422,22 @@ class LoopVectorizationLegality {

DominatorTree *getDominatorTree() const { return DT; }

/// Returns all exiting blocks with a countable exit, i.e. the
/// exit-not-taken count is known exactly at compile time.
const SmallVector<BasicBlock *, 4> &getCountableExitingBlocks() const {
return CountableExitingBlocks;
}

/// Returns all the exiting blocks with an uncountable exit.
const SmallVector<BasicBlock *, 4> &getUncountableExitingBlocks() const {
return UncountableExitingBlocks;
}

/// Returns all the exit blocks from uncountable exiting blocks.
SmallVector<BasicBlock *, 4> getUncountableExitBlocks() const {
return UncountableExitBlocks;
}

private:
/// Return true if the pre-header, exiting and latch blocks of \p Lp and all
/// its nested loops are considered legal for vectorization. These legal
Expand Down Expand Up @@ -446,6 +480,23 @@ class LoopVectorizationLegality {
/// specific checks for outer loop vectorization.
bool canVectorizeOuterLoop();

/// Returns true if this is an early exit loop that can be vectorized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be helpful to summarize the required conditions here (or somewhere else)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/// Currently, a loop with an uncountable early exit is considered
/// vectorizable if:
/// 1. There are no writes to memory in the loop.
/// 2. The loop has only one early uncountable exit
/// 3. The early exit block dominates the latch block.
/// 4. The latch block has an exact exit count.
/// 5. The loop does not contain reductions or recurrences.
/// 6. We can prove at compile-time that loops will not contain faulting
/// loads.
/// 7. It is safe to speculatively execute instructions such as divide or
/// call instructions.
/// The list above is not based on theoretical limitations of vectorization,
/// but simply a statement that more work is needed to support these
/// additional cases safely.
bool isVectorizableEarlyExitLoop();

/// Return true if all of the instructions in the block can be speculatively
/// executed, and record the loads/stores that require masking.
/// \p SafePtrs is a list of addresses that are known to be legal and we know
Expand Down Expand Up @@ -551,6 +602,17 @@ class LoopVectorizationLegality {
/// (potentially) make a better decision on the maximum VF and enable
/// the use of those function variants.
bool VecCallVariantsFound = false;

/// Indicates whether this loop has a speculative early exit, i.e. an
/// uncountable exiting block that is not the latch.
bool HasSpeculativeEarlyExit = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, with SCEV terminology and the variables below, Speculative -> Uncountable?


/// Keep track of all the loop exiting blocks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment accurate? AFAICT they only keep track of a single countable and uncountable exiting block, if the backedge taken count is not computable.

SmallVector<BasicBlock *, 4> CountableExitingBlocks;
SmallVector<BasicBlock *, 4> UncountableExitingBlocks;

/// Keep track of the destinations of all uncountable exits.
SmallVector<BasicBlock *, 4> UncountableExitBlocks;
Comment on lines +610 to +615
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the three only UncountableExitingBlocks looks to be used by this patch. I'd just like confirmation this it intentional and will be built upon in a following PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these will be used in follow-on patches. However, if you prefer to keep this initial patch clean I can remove them and add them later. I was just trying to reduce the size of follow-on patches by moving it into this patch that's all.

};

} // namespace llvm
Expand Down
160 changes: 150 additions & 10 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1445,6 +1445,145 @@ bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
return Result;
}

bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
BasicBlock *LatchBB = TheLoop->getLoopLatch();
if (!LatchBB) {
reportVectorizationFailure("Loop does not have a latch",
"Cannot vectorize early exit loop",
"NoLatchEarlyExit", ORE, TheLoop);
return false;
}

if (Reductions.size() || FixedOrderRecurrences.size()) {
reportVectorizationFailure(
"Found reductions or recurrences in early-exit loop",
"Cannot vectorize early exit loop with reductions or recurrences",
"RecurrencesInEarlyExitLoop", ORE, TheLoop);
return false;
}

SmallVector<BasicBlock *, 8> ExitingBlocks;
TheLoop->getExitingBlocks(ExitingBlocks);

// Keep a record of all the exiting blocks.
SmallVector<const SCEVPredicate *, 4> Predicates;
for (BasicBlock *BB1 : ExitingBlocks) {
const SCEV *EC =
PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given Predicates is not used, can you pass nullptr here? From what I can see all the functions getPredicatedExitCount represent support not caring about the predicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's true because if the loop really does require predicates then getPredicatedExitCount will return SCEVCouldNotCompute for the exit count and we will incorrectly treat this as an uncountable exiting block. If I pass in nullptr here the simple_early_exit.ll test file starts failing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I didn't see this parameter controlling any code paths but I do see an assert that would fail without it.

What worried me was that you're clearing Predicates after iterating across all blocks and that made me assume it's contents didn't matter, but then I wondered why you need to manually clear it at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here or below why predicates is passed and dropped below?

if (isa<SCEVCouldNotCompute>(EC)) {
UncountableExitingBlocks.push_back(BB1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter if BB1 has no out-of-loop successors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an exiting block at least one successor must be outside the loop. I've added an assert for this.


SmallVector<BasicBlock *, 2> Succs(successors(BB1));
if (Succs.size() != 2) {
reportVectorizationFailure(
"Early exiting block does not have exactly two successors",
"Incorrect number of successors from early exiting block",
"EarlyExitTooManySuccessors", ORE, TheLoop);
return false;
}

BasicBlock *BB2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearer to name ExitBlock?

if (!TheLoop->contains(Succs[0]))
BB2 = Succs[0];
else {
assert(!TheLoop->contains(Succs[1]));
BB2 = Succs[1];
}
UncountableExitBlocks.push_back(BB2);
} else
CountableExitingBlocks.push_back(BB1);
}
Predicates.clear();

// We only support one uncountable early exit.
if (getUncountableExitingBlocks().size() != 1) {
reportVectorizationFailure(
"Loop has too many uncountable exits",
"Cannot vectorize early exit loop with more than one early exit",
"TooManyUncountableEarlyExits", ORE, TheLoop);
return false;
}

// The only supported early exit loops so far are ones where the early
// exiting block is a unique predecessor of the latch block.
Comment on lines +1507 to +1508
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This answers my earlier question but that means the code in canVectorizeMemory is fragile. Can an assert be added to canVectorizeMemory to ensure this is the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
if (LatchPredBB != getSpeculativeEarlyExitingBlock()) {
reportVectorizationFailure("Early exit is not the latch predecessor",
"Cannot vectorize early exit loop",
"EarlyExitNotLatchPredecessor", ORE, TheLoop);
return false;
}

// Check to see if there are instructions that could potentially generate
// exceptions or have side-effects.
auto IsSafeOperation = [](Instruction *I) -> bool {
switch (I->getOpcode()) {
case Instruction::Load:
case Instruction::Store:
case Instruction::PHI:
case Instruction::Br:
// These are checked separately.
return true;
default:
return isSafeToSpeculativelyExecute(I);
}
};

for (auto *BB : TheLoop->blocks())
for (auto &I : *BB) {
if (I.mayWriteToMemory()) {
// We don't support writes to memory.
reportVectorizationFailure(
"Writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with writes to memory",
"WritesInEarlyExitLoop", ORE, TheLoop);
return false;
} else if (!IsSafeOperation(&I)) {
reportVectorizationFailure("Early exit loop contains operations that "
"cannot be speculatively executed",
"Early exit loop contains operations that "
"cannot be speculatively executed",
"UnsafeOperationsEarlyExitLoop", ORE,
TheLoop);
return false;
}
}

// The latch block must have a countable exit.
if (isa<SCEVCouldNotCompute>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move simpler checks earlier before iterating over the whole loop? Can also check if LatchBB is in the collected countable exiting blocks?

PSE.getSE()->getPredicatedExitCount(TheLoop, LatchBB, &Predicates))) {
reportVectorizationFailure(
"Cannot determine exact exit count for latch block",
"Cannot vectorize early exit loop",
"UnknownLatchExitCountEarlyExitLoop", ORE, TheLoop);
return false;
}

// The vectoriser cannot handle loads that occur after the early exit block.
assert(LatchBB->getUniquePredecessor() == getSpeculativeEarlyExitingBlock() &&
"Expected latch predecessor to be the early exiting block");

// TODO: Handle loops that may fault.
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC)) {
reportVectorizationFailure(
"Loop may fault",
"Cannot vectorize potentially faulting early exit loop",
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
return false;
}

LLVM_DEBUG(
dbgs()
<< "LV: Found an early exit. Retrying with speculative exit count.\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to be consistent with SCEV terminology, i.e. symbolic max BTC. Or more specifically the computable exiting count for the loop latch, as with the current restrictions getSymbolicMaxBackedgeTakenCount() == >getPredicatedExitCount(TheLoop, Latch)?

const SCEV *SpecExitCount = PSE.getSymbolicMaxBackedgeTakenCount();
assert(!isa<SCEVCouldNotCompute>(SpecExitCount) &&
"Failed to get symbolic expression for backedge taken count");

LLVM_DEBUG(dbgs() << "LV: Found speculative backedge taken count: "
<< *SpecExitCount << '\n');
return true;
}

bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
// Store the result and return it at the end instead of exiting early, in case
// allowExtraAnalysis is used to report multiple reasons for not vectorizing.
Expand Down Expand Up @@ -1505,6 +1644,17 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
return false;
}

HasSpeculativeEarlyExit = false;
if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
if (!isVectorizableEarlyExitLoop()) {
if (DoExtraAnalysis)
Result = false;
else
return false;
} else
HasSpeculativeEarlyExit = true;
}

// Go over each instruction and look at memory deps.
if (!canVectorizeMemory()) {
LLVM_DEBUG(dbgs() << "LV: Can't vectorize due to memory conflicts\n");
Expand All @@ -1514,16 +1664,6 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
return false;
}

if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
reportVectorizationFailure("could not determine number of loop iterations",
"could not determine number of loop iterations",
"CantComputeNumberOfIterations", ORE, TheLoop);
if (DoExtraAnalysis)
Result = false;
else
return false;
}

if (Result) {
LLVM_DEBUG(dbgs() << "LV: We can vectorize this loop"
<< (LAI->getRuntimePointerChecking()->Need
Expand Down
8 changes: 8 additions & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9815,6 +9815,14 @@ bool LoopVectorizePass::processLoop(Loop *L) {
return false;
}

if (LVL.hasSpeculativeEarlyExit()) {
reportVectorizationFailure(
"Auto-vectorization of early exit loops is not yet supported.",
"Auto-vectorization of early exit loops is not yet supported.",
"EarlyExitLoopsUnsupported", ORE, L);
return false;
}

// Entrance to the VPlan-native vectorization path. Outer loops are processed
// here. They may require CFG and instruction level transformations before
// even evaluating whether vectorization is profitable. Since we cannot modify
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
; }
; }
; File, line, and column should match those specified in the metadata
; CHECK: remark: source.cpp:5:9: loop not vectorized: could not determine number of loop iterations
; CHECK: remark: source.cpp:5:9: loop not vectorized: Cannot vectorize early exit loop
; CHECK: remark: source.cpp:5:9: loop not vectorized

; void test_disabled(int *A, int Length) {
Expand Down Expand Up @@ -46,12 +46,12 @@

; YAML: --- !Analysis
; YAML-NEXT: Pass: loop-vectorize
; YAML-NEXT: Name: CantComputeNumberOfIterations
; YAML-NEXT: Name: EarlyExitNotLatchPredecessor
; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 5, Column: 9 }
; YAML-NEXT: Function: _Z4testPii
; YAML-NEXT: Args:
; YAML-NEXT: - String: 'loop not vectorized: '
; YAML-NEXT: - String: could not determine number of loop iterations
; YAML-NEXT: - String: Cannot vectorize early exit loop
; YAML-NEXT: ...
; YAML-NEXT: --- !Missed
; YAML-NEXT: Pass: loop-vectorize
Expand Down Expand Up @@ -117,12 +117,12 @@
; YAML-NEXT: ...
; YAML-NEXT: --- !Analysis
; YAML-NEXT: Pass: loop-vectorize
; YAML-NEXT: Name: CantComputeNumberOfIterations
; YAML-NEXT: Name: EarlyExitNotLatchPredecessor
; YAML-NEXT: DebugLoc: { File: source.cpp, Line: 27, Column: 3 }
; YAML-NEXT: Function: test_multiple_failures
; YAML-NEXT: Args:
; YAML-NEXT: - String: 'loop not vectorized: '
; YAML-NEXT: - String: could not determine number of loop iterations
; YAML-NEXT: - String: Cannot vectorize early exit loop
; YAML-NEXT: ...
; YAML: --- !Missed
; YAML-NEXT: Pass: loop-vectorize
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/Transforms/LoopVectorize/control-flow.ll
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
; return 0;
; }

; CHECK: remark: source.cpp:5:9: loop not vectorized: could not determine number of loop iterations
; CHECK: remark: source.cpp:5:9: loop not vectorized: Cannot vectorize early exit loop with writes to memory
; CHECK: remark: source.cpp:5:9: loop not vectorized

; CHECK: _Z4testPii
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
; Make sure LV does not crash when generating remarks for loops with non-unique
; exit blocks.
define i32 @test_non_unique_exit_blocks(ptr nocapture readonly align 4 dereferenceable(1024) %data, i32 %x) {
; CHECK: loop not vectorized: could not determine number of loop iterations
; CHECK: loop not vectorized: Cannot vectorize early exit loop
;
entry:
br label %for.header
Expand Down
Loading
Loading