-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoopVectorize] Teach LoopVectorizationLegality about more early exits #107004
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: David Sherwood (david-arm) ChangesThis patch is split off from PR #88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { or for (int i = 0; i < n; i++) { In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria:
Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll Patch is 82.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107004.diff 9 Files Affected:
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index 73d9c26ed6b1b7..5ea2202c4d4559 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -660,7 +660,8 @@ class LoopAccessInfo {
bool isInvariant(Value *V) const;
unsigned getNumStores() const { return NumStores; }
- unsigned getNumLoads() const { return NumLoads;}
+ unsigned getNumLoads() const { return NumLoads; }
+ unsigned getNumCalls() const { return NumCalls; }
/// The diagnostics report generated for the analysis. E.g. why we
/// couldn't analyze the loop.
@@ -753,6 +754,7 @@ class LoopAccessInfo {
unsigned NumLoads = 0;
unsigned NumStores = 0;
+ unsigned NumCalls = 0;
/// Cache the result of analyzeLoop.
bool CanVecMem = false;
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index 0f4d1355dd2bfe..bb21f97c0ae521 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -377,6 +377,24 @@ class LoopVectorizationLegality {
return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
}
+ /// Returns true if the loop has a speculative early exit, i.e. an
+ /// uncountable exit that isn't the latch block.
+ bool hasSpeculativeEarlyExit() const { return HasSpeculativeEarlyExit; }
+
+ /// Returns the speculative early exiting block.
+ BasicBlock *getSpeculativeEarlyExitingBlock() const {
+ assert(getUncountableExitingBlocks().size() == 1 &&
+ "Expected only a single uncountable exiting block");
+ return getUncountableExitingBlocks()[0];
+ }
+
+ /// Returns the destination of a speculative early exiting block.
+ BasicBlock *getSpeculativeEarlyExitBlock() const {
+ assert(getUncountableExitBlocks().size() == 1 &&
+ "Expected only a single uncountable exit block");
+ return getUncountableExitBlocks()[0];
+ }
+
/// Returns true if vector representation of the instruction \p I
/// requires mask.
bool isMaskRequired(const Instruction *I) const {
@@ -404,6 +422,22 @@ class LoopVectorizationLegality {
DominatorTree *getDominatorTree() const { return DT; }
+ /// Returns all exiting blocks with a countable exit, i.e. the
+ /// exit-not-taken count is known exactly at compile time.
+ const SmallVector<BasicBlock *, 4> &getCountableExitingBlocks() const {
+ return CountableExitingBlocks;
+ }
+
+ /// Returns all the exiting blocks with an uncountable exit.
+ const SmallVector<BasicBlock *, 4> &getUncountableExitingBlocks() const {
+ return UncountableExitingBlocks;
+ }
+
+ /// Returns all the exit blocks from uncountable exiting blocks.
+ SmallVector<BasicBlock *, 4> getUncountableExitBlocks() const {
+ return UncountableExitBlocks;
+ }
+
private:
/// Return true if the pre-header, exiting and latch blocks of \p Lp and all
/// its nested loops are considered legal for vectorization. These legal
@@ -436,7 +470,7 @@ class LoopVectorizationLegality {
/// we read and write from memory. This method checks if it is
/// legal to vectorize the code, considering only memory constrains.
/// Returns true if the loop is vectorizable
- bool canVectorizeMemory();
+ bool canVectorizeMemory(bool IsEarlyExitLoop);
/// Return true if we can vectorize this loop using the IF-conversion
/// transformation.
@@ -446,6 +480,10 @@ class LoopVectorizationLegality {
/// specific checks for outer loop vectorization.
bool canVectorizeOuterLoop();
+ /// Returns true if this is a supported early exit loop that we can
+ /// vectorize.
+ bool isVectorizableEarlyExitLoop();
+
/// Return true if all of the instructions in the block can be speculatively
/// executed, and record the loads/stores that require masking.
/// \p SafePtrs is a list of addresses that are known to be legal and we know
@@ -551,6 +589,16 @@ class LoopVectorizationLegality {
/// (potentially) make a better decision on the maximum VF and enable
/// the use of those function variants.
bool VecCallVariantsFound = false;
+
+ /// Indicates whether this loop has a speculative early exit, i.e. an
+ /// uncountable exiting block that is not the latch.
+ bool HasSpeculativeEarlyExit = false;
+
+ /// Keeps track of all the exits with known or countable exit-not-taken
+ /// counts.
+ SmallVector<BasicBlock *, 4> CountableExitingBlocks;
+ SmallVector<BasicBlock *, 4> UncountableExitingBlocks;
+ SmallVector<BasicBlock *, 4> UncountableExitBlocks;
};
} // namespace llvm
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 980f142f113265..bf367a77aae1ce 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -2445,8 +2445,11 @@ bool LoopAccessInfo::analyzeLoop(AAResults *AA, const LoopInfo *LI,
// vectorize a loop if it contains known function calls that don't set
// the flag. Therefore, it is safe to ignore this read from memory.
auto *Call = dyn_cast<CallInst>(&I);
- if (Call && getVectorIntrinsicIDForCall(Call, TLI))
- continue;
+ if (Call) {
+ NumCalls++;
+ if (getVectorIntrinsicIDForCall(Call, TLI))
+ continue;
+ }
// If this is a load, save it. If this instruction can read from memory
// but is not a load, then we quit. Notice that we don't handle function
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 66a779da8c25bc..3efdfd1fa2bc6e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1048,7 +1048,7 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
return true;
}
-bool LoopVectorizationLegality::canVectorizeMemory() {
+bool LoopVectorizationLegality::canVectorizeMemory(bool IsEarlyExitLoop) {
LAI = &LAIs.getInfo(*TheLoop);
const OptimizationRemarkAnalysis *LAR = LAI->getReport();
if (LAR) {
@@ -1070,6 +1070,50 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
return false;
}
+ // For loops with uncountable early exiting blocks that are not the latch
+ // it's necessary to perform extra checks, since the vectoriser is currently
+ // only capable of handling simple search loops.
+ if (IsEarlyExitLoop) {
+ // We don't support calls or any memory accesses that write to memory.
+ if (LAI->getNumStores()) {
+ reportVectorizationFailure(
+ "Writes to memory unsupported in early exit loops",
+ "Cannot vectorize early exit loop with writes to memory",
+ "WritesInEarlyExitLoop", ORE, TheLoop);
+ return false;
+ }
+
+ if (LAI->getNumCalls()) {
+ reportVectorizationFailure(
+ "Calls unsupported in early exit loops",
+ "Cannot vectorize early exit loop with function calls",
+ "CallsInEarlyExitLoop", ORE, TheLoop);
+ return false;
+ }
+
+ // The vectoriser cannot handle loads that occur after the early exit block.
+ BasicBlock *LatchBB = TheLoop->getLoopLatch();
+ for (Instruction &I : *LatchBB) {
+ if (I.mayReadFromMemory()) {
+ reportVectorizationFailure(
+ "Loads not permitted after early exit",
+ "Cannot vectorize early exit loop with loads after early exit",
+ "LoadsAfterEarlyExit", ORE, TheLoop);
+ return false;
+ }
+ }
+
+ // The vectoriser does not yet handle loops that may fault, but this will
+ // be improved in a follow-on patch.
+ if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC)) {
+ reportVectorizationFailure(
+ "Loop may fault",
+ "Cannot vectorize potentially faulting early exit loop",
+ "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
+ return false;
+ }
+ }
+
// We can vectorize stores to invariant address when final reduction value is
// guaranteed to be stored at the end of the loop. Also, if decision to
// vectorize loop is made, runtime checks are added so as to make sure that
@@ -1442,6 +1486,95 @@ bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
return Result;
}
+bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
+ // At least one of the exiting blocks must be the latch.
+ BasicBlock *LatchBB = TheLoop->getLoopLatch();
+ if (!LatchBB) {
+ reportVectorizationFailure("Loop does not have a latch",
+ "Cannot vectorize early exit loop",
+ "NoLatchEarlyExit", ORE, TheLoop);
+ return false;
+ }
+
+ SmallVector<BasicBlock *, 8> ExitingBlocks;
+ TheLoop->getExitingBlocks(ExitingBlocks);
+
+ // Keep a record of all the exiting blocks with exact exit counts, as well as
+ // those with inexact counts.
+ SmallVector<const SCEVPredicate *, 4> Predicates;
+ for (BasicBlock *BB1 : ExitingBlocks) {
+ const SCEV *EC =
+ PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);
+ if (isa<SCEVCouldNotCompute>(EC)) {
+ UncountableExitingBlocks.push_back(BB1);
+
+ unsigned NumExitBlocks = 0;
+ for (BasicBlock *BB2 : successors(BB1)) {
+ if (!TheLoop->contains(BB2)) {
+ UncountableExitBlocks.push_back(BB2);
+ NumExitBlocks++;
+ }
+ }
+ if (NumExitBlocks > 1) {
+ reportVectorizationFailure(
+ "Early exiting block has more than one successor outside of loop",
+ "Too many successors from early exiting block",
+ "EarlyExitTooManySuccessors", ORE, TheLoop);
+ return false;
+ }
+ } else
+ CountableExitingBlocks.push_back(BB1);
+ }
+ Predicates.clear();
+
+ // We only support one uncountable early exit.
+ if (getUncountableExitingBlocks().size() != 1) {
+ reportVectorizationFailure(
+ "Loop has too many uncountable exits",
+ "Cannot vectorize early exit loop with more than one early exit",
+ "TooManyUncountableEarlyExits", ORE, TheLoop);
+ return false;
+ }
+
+ // The only supported early exit loops so far are ones where the early
+ // exiting block is a unique predecessor of the latch block.
+ BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
+ if (!LatchPredBB || LatchPredBB != getUncountableExitingBlocks()[0]) {
+ reportVectorizationFailure("Early exit is not the latch predecessor",
+ "Cannot vectorize early exit loop",
+ "EarlyExitNotLatchPredecessor", ORE, TheLoop);
+ return false;
+ }
+
+ if (Reductions.size() || FixedOrderRecurrences.size()) {
+ reportVectorizationFailure(
+ "Found reductions or recurrences in early-exit loop",
+ "Cannot vectorize early exit loop with reductions or recurrences",
+ "RecurrencesInEarlyExitLoop", ORE, TheLoop);
+ return false;
+ }
+
+ LLVM_DEBUG(
+ dbgs()
+ << "LV: Found an early exit. Retrying with speculative exit count.\n");
+ if (isa<SCEVCouldNotCompute>(
+ PSE.getSE()->getPredicatedExitCount(TheLoop, LatchBB, &Predicates))) {
+ reportVectorizationFailure(
+ "Cannot determine exact exit count for latch block",
+ "Cannot vectorize early exit loop",
+ "UnknownLatchExitCountEarlyExitLoop", ORE, TheLoop);
+ return false;
+ }
+
+ const SCEV *SpecExitCount = PSE.getSymbolicMaxBackedgeTakenCount();
+ assert(!isa<SCEVCouldNotCompute>(SpecExitCount) &&
+ "Failed to get symbolic expression for backedge taken count");
+
+ LLVM_DEBUG(dbgs() << "LV: Found speculative backedge taken count: "
+ << *SpecExitCount << '\n');
+ return true;
+}
+
bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
// Store the result and return it at the end instead of exiting early, in case
// allowExtraAnalysis is used to report multiple reasons for not vectorizing.
@@ -1500,19 +1633,20 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
return false;
}
- // Go over each instruction and look at memory deps.
- if (!canVectorizeMemory()) {
- LLVM_DEBUG(dbgs() << "LV: Can't vectorize due to memory conflicts\n");
- if (DoExtraAnalysis)
- Result = false;
- else
- return false;
+ HasSpeculativeEarlyExit = false;
+ if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
+ if (!isVectorizableEarlyExitLoop()) {
+ if (DoExtraAnalysis)
+ Result = false;
+ else
+ return false;
+ } else
+ HasSpeculativeEarlyExit = true;
}
- if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
- reportVectorizationFailure("could not determine number of loop iterations",
- "could not determine number of loop iterations",
- "CantComputeNumberOfIterations", ORE, TheLoop);
+ // Go over each instruction and look at memory deps.
+ if (!canVectorizeMemory(HasSpeculativeEarlyExit)) {
+ LLVM_DEBUG(dbgs() << "LV: Can't vectorize due to memory conflicts\n");
if (DoExtraAnalysis)
Result = false;
else
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index a8722db654f5c9..bef7aa264157ce 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9802,6 +9802,12 @@ bool LoopVectorizePass::processLoop(Loop *L) {
return false;
}
+ if (LVL.hasSpeculativeEarlyExit()) {
+ LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Auto-vectorization of early "
+ << "exit loops is not yet supported.\n");
+ return false;
+ }
+
// Entrance to the VPlan-native vectorization path. Outer loops are processed
// here. They may require CFG and instruction level transformations before
// even evaluating whether vectorization is profitable. Since we cannot modify
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
new file mode 100644
index 00000000000000..d677598399f1c2
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
@@ -0,0 +1,1584 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S < %s -p loop-vectorize -mattr=+sve -debug-only=loop-vectorize \
+; RUN: -mtriple aarch64-linux-gnu 2>%t | FileCheck %s --check-prefixes=CHECK
+; RUN: cat %t | FileCheck %s --check-prefix=DEBUG
+
+declare void @init_mem(ptr, i64);
+
+define i64 @same_exit_block_pre_inc_use1() {
+; DEBUG-LABEL: LV: Checking a loop in 'same_exit_block_pre_inc_use1'
+; DEBUG: LV: Found an early exit. Retrying with speculative exit count.
+; DEBUG-NEXT: LV: Found speculative backedge taken count: 63
+; DEBUG-NEXT: LV: We can vectorize this loop!
+; DEBUG-NEXT: LV: Not vectorizing: Auto-vectorization of early exit loops is not yet supported.
+; CHECK-LABEL: define i64 @same_exit_block_pre_inc_use1(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[P1:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT: [[P2:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT: call void @init_mem(ptr [[P1]], i64 1024)
+; CHECK-NEXT: call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT: br label [[LAND_RHS:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[INDEX]]
+; CHECK-NEXT: [[TMP38:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[INDEX]]
+; CHECK-NEXT: [[TMP39:%.*]] = load i8, ptr [[ARRAYIDX1]], align 1
+; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP38]], [[TMP39]]
+; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_INC]], label [[FOR_END_LOOPEXIT:%.*]]
+; CHECK: loop.inc:
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDEX_NEXT]], 67
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[LAND_RHS]], label [[FOR_END_LOOPEXIT]]
+; CHECK: loop.end:
+; CHECK-NEXT: [[START_0_LCSSA:%.*]] = phi i64 [ [[INDEX]], [[LAND_RHS]] ], [ 67, [[FOR_INC]] ]
+; CHECK-NEXT: ret i64 [[START_0_LCSSA]]
+;
+entry:
+ %p1 = alloca [1024 x i8]
+ %p2 = alloca [1024 x i8]
+ call void @init_mem(ptr %p1, i64 1024)
+ call void @init_mem(ptr %p2, i64 1024)
+ br label %loop
+
+loop:
+ %index = phi i64 [ %index.next, %loop.inc ], [ 3, %entry ]
+ %arrayidx = getelementptr inbounds i8, ptr %p1, i64 %index
+ %ld1 = load i8, ptr %arrayidx, align 1
+ %arrayidx1 = getelementptr inbounds i8, ptr %p2, i64 %index
+ %ld2 = load i8, ptr %arrayidx1, align 1
+ %cmp3 = icmp eq i8 %ld1, %ld2
+ br i1 %cmp3, label %loop.inc, label %loop.end
+
+loop.inc:
+ %index.next = add i64 %index, 1
+ %exitcond = icmp ne i64 %index.next, 67
+ br i1 %exitcond, label %loop, label %loop.end
+
+loop.end:
+ %retval = phi i64 [ %index, %loop ], [ 67, %loop.inc ]
+ ret i64 %retval
+}
+
+
+define i64 @same_exit_block_pre_inc_use1_gep_two_indices() {
+; CHECK-LABEL: define i64 @same_exit_block_pre_inc_use1_gep_two_indices(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[P1:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT: [[P2:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT: call void @init_mem(ptr [[P1]], i64 1024)
+; CHECK-NEXT: call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[LOOP_INC:%.*]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i8], ptr [[P1]], i64 0, i64 [[INDEX]]
+; CHECK-NEXT: [[LD1:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [1024 x i8], ptr [[P2]], i64 0, i64 [[INDEX]]
+; CHECK-NEXT: [[LD2:%.*]] = load i8, ptr [[ARRAYIDX1]], align 1
+; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[LD1]], [[LD2]]
+; CHECK-NEXT: br i1 [[CMP3]], label [[LOOP_INC]], label [[LOOP_END:%.*]]
+; CHECK: loop.inc:
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDEX_NEXT]], 67
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[LOOP_END]]
+; CHECK: loop.end:
+; CHECK-NEXT: [[RETVAL:%.*]] = phi i64 [ [[INDEX]], [[LOOP]] ], [ 67, [[LOOP_INC]] ]
+; CHECK-NEXT: ret i64 [[RETVAL]]
+;
+entry:
+ %p1 = alloca [1024 x i8]
+ %p2 = alloca [1024 x i8]
+ call void @init_mem(ptr %p1, i64 1024)
+ call void @init_mem(ptr %p2, i64 1024)
+ br label %loop
+
+loop:
+ %index = phi i64 [ %index.next, %loop.inc ], [ 3, %entry ]
+ %arrayidx = getelementptr inbounds [1024 x i8], ptr %p1, i64 0, i64 %index
+ %ld1 = load i8, ptr %arrayidx, align 1
+ %arrayidx1 = getelementptr inbounds [1024 x i8], ptr %p2, i64 0, i64 %index
+ %ld2 = load i8, ptr %arrayidx1, align 1
+ %cmp3 = icmp eq i8 %ld1, %ld2
+ br i1 %cmp3, label %loop.inc, label %loop.end
+
+loop.inc:
+ %index.next = add i64 %index, 1
+ %exitcond = icmp ne i64 %index.next, 67
+ br i1 %exitcond, label %loop, label %loop.end
+
+loop.end:
+ %retval = phi i64 [ %index, %loop ], [ 67, %loop.inc ]
+ ret i64 %retval
+}
+
+
+define i64 @same_exit_block_pre_inc_use1_alloca_diff_type() {
+; CHECK-LABEL: define i64 @same_exit_block_pre_inc_use1_alloca_diff_type(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[P1:%.*]] = alloca [40 x i32], align 4
+; CHECK-NEXT: [[P2:%.*]] = alloca [40 x i32], align 4
+; CHECK-NEXT: call void @init_mem(ptr [[P1]], i64 1024)
+; CHECK-NEXT: call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT: br label [[LAND_RHS:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[INDEX]]
+; CHECK-NEXT: [[TMP38:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ...
[truncated]
|
NumExitBlocks++; | ||
} | ||
} | ||
if (NumExitBlocks > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If BB1
ends with conditional BranchInst, it has two jumps. Both jumps can't go outside of the loop; otherwise, BB1
wouldn't belong to TheLoop
. So, the only way BB1
can have multiple Exit Blocks is when it ends with SwitchInst. So, couldn't we check if BB1 ends with a switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may be right at the moment that the only way this can happen is with a switch statement, but I can't 100% convince myself of that. :) That's why I wrote this in a more paranoid way by checking the number of blocks manually. What I could do is bail out early if the number of successors > 2, then add the single successor outside of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch support has only be added recently, so would probably be good to add a test if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I already have with @early_exit_has_multiple_outside_successors
, unless you were thinking of a different test?
/// Returns true if this is a supported early exit loop that we can | ||
/// vectorize. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "supported" mean in this context? Would it suffice to say "Returns true if this is an early exit loop that can be vectorized".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/// uncountable exiting block that is not the latch. | ||
bool HasSpeculativeEarlyExit = false; | ||
|
||
/// Keeps track of all the exits with known or countable exit-not-taken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "known or uncountable"? Although if true then doesn't that cover the entire space and thus "Keeps track of all loop exits." would suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted the comment - hopefully makes more sense!
@@ -1070,6 +1070,50 @@ bool LoopVectorizationLegality::canVectorizeMemory() { | |||
return false; | |||
} | |||
|
|||
// For loops with uncountable early exiting blocks that are not the latch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"that are not the latch" can be dropped because by definition the latch block cannot be early exiting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is true, because you can have loops like this:
while.cond: ; preds = %while.body, %entry
%i.0 = phi i32 [ %start, %entry ], [ %inc, %while.body ]
%inc = add i32 %i.0, 1
%cmp.not = icmp eq i32 %inc, %end
br i1 %cmp.not, label %cleanup, label %while.body
while.body: ; preds = %while.cond
%idxprom = zext i32 %inc to i64
%arrayidx = getelementptr inbounds i32, ptr %p, i64 %idxprom
%0 = load i32, ptr %arrayidx, align 4, !tbaa !6
%cmp1 = icmp eq i32 %0, 7
br i1 %cmp1, label %cleanup, label %while.cond, !llvm.loop !10
cleanup: ; preds = %while.cond, %while.body
%retval.0 = phi i32 [ 1, %while.body ], [ 0, %while.cond ]
ret i32 %retval.0
We do see code like this being generated from C:
unsigned foo(int *p, unsigned start, unsigned end) {
unsigned i = start;
while (++i != end)
if (p[i] == 7)
return 1;
return 0;
}
In this case the latch block either jumps to the beginning of the loop or takes an early exit.
if (Call && getVectorIntrinsicIDForCall(Call, TLI)) | ||
continue; | ||
if (Call) { | ||
NumCalls++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the use, is this too broad? For example, would it prevent loops that call sqrt intrinsics from being vectorisable? What about keeping a count of non-store instructions that can modify memory? This can be added to the code below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I was probably being overly cautious here, but I'm worried about more than just stores. The function call could have side-effects, such as changing system state or generating exceptions. I might be able to relax this to permit calls to intrinsics that have no side-effects, and do not write to memory. I'll see how easy this is to do - it would require adding new tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now makes me wonder about operations such as integer divides that may generate exceptions and whether or not such operations should be considered legal in an early exit block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can allow any instruction that may trigger UB or a trap in the header if it would be executed unconditionally in the vectorized loop, but this should probably be checked outside of LAA, as it is irrelevant for memory dependence analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. It might make more sense to analyse all instructions in the early exiting block by checking for I.mayHaveSideEffects(), I.mayThrow(), I.mayReadOrWriteMemory(), etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted this code in favour of calling isSafeToSpeculativelyExecute
from LoopVectorizationLegality. This covers more cases such as divides, while at the same time permitting calls to intrinsics such as sqrt.
// The vectoriser does not yet handle loops that may fault, but this will | ||
// be improved in a follow-on patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest "TODO: Handle loops that may fault."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you but my suggestion was to replace the original comment with the TODO.
BasicBlock *LatchBB = TheLoop->getLoopLatch(); | ||
for (Instruction &I : *LatchBB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a naive question but is the latch block the only one that needs to be check? Are there no other blocks between the existing an latch blocks?
I answered my own question but leaving comment here because it is referenced below.
// Keep a record of all the exiting blocks with exact exit counts, as well as | ||
// those with inexact counts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, isn't this just all exiting blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
const SCEV *EC = | ||
PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates); | ||
if (isa<SCEVCouldNotCompute>(EC)) { | ||
UncountableExitingBlocks.push_back(BB1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter if BB1 has no out-of-loop successors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's an exiting block at least one successor must be outside the loop. I've added an assert for this.
// The only supported early exit loops so far are ones where the early | ||
// exiting block is a unique predecessor of the latch block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answers my earlier question but that means the code in canVectorizeMemory
is fragile. Can an assert be added to canVectorizeMemory
to ensure this is the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (Reductions.size() || FixedOrderRecurrences.size()) { | ||
reportVectorizationFailure( | ||
"Found reductions or recurrences in early-exit loop", | ||
"Cannot vectorize early exit loop with reductions or recurrences", | ||
"RecurrencesInEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a quick test that's worth having early before having to iterate across basic blocks etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// The vectoriser cannot handle loads that occur after the early exit block. | ||
BasicBlock *LatchBB = TheLoop->getLoopLatch(); | ||
assert(LatchBB->getUniquePredecessor() == | ||
getUncountableExitingBlocks()[0] && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getSpeculativeEarlyExitingBlock()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// The vectoriser does not yet handle loops that may fault, but this will | ||
// be improved in a follow-on patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you but my suggestion was to replace the original comment with the TODO.
// The only supported early exit loops so far are ones where the early | ||
// exiting block is a unique predecessor of the latch block. | ||
BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor(); | ||
if (!LatchPredBB || LatchPredBB != getUncountableExitingBlocks()[0]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use getUncountableExitingBlocks()
? Also, perhaps !LatchPredBB
is redundant given it'll fail the !=
test anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Auto-vectorization of early " | ||
<< "exit loops is not yet supported.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a reportVectorizationFailure
required here until vectorisation is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
SmallVector<const SCEVPredicate *, 4> Predicates; | ||
for (BasicBlock *BB1 : ExitingBlocks) { | ||
const SCEV *EC = | ||
PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given Predicates
is not used, can you pass nullptr
here? From what I can see all the functions getPredicatedExitCount
represent support not caring about the predicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true because if the loop really does require predicates then getPredicatedExitCount
will return SCEVCouldNotCompute for the exit count and we will incorrectly treat this as an uncountable exiting block. If I pass in nullptr
here the simple_early_exit.ll
test file starts failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I didn't see this parameter controlling any code paths but I do see an assert that would fail without it.
What worried me was that you're clearing Predicates after iterating across all blocks and that made me assume it's contents didn't matter, but then I wondered why you need to manually clear it at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment here or below why predicates is passed and dropped below?
// For loops with uncountable early exiting blocks that are not the latch | ||
// it's necessary to perform extra checks, since the vectoriser is currently | ||
// only capable of handling simple search loops. | ||
if (IsEarlyExitLoop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what others think but I see canVectorizeMemory
as being about load/store address analysis rather than "the only place to analyse load/store instructions". In this case you don't care about the addresses, you just want to guard against introducing side effects which I think can all be contained in isVectorizableEarlyExitLoop
?
It kind of feels odd for canVectorizeMemory
to depend on work done by isVectorizableEarlyExitLoop
, when the "reason to reject an instruction" is largely the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would make sense to keep the checks together in isVectorizableEarlyExitLoop
in particular with the extra checks recently added there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
for (auto *BB : TheLoop->blocks()) | ||
for (auto &I : *BB) | ||
if (!IsSafeOperation(&I)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do early exiting loops introduce any special requirement here? I initially thought you'd restrict based on safe-to-speculate for only the blocks between the exit block and the latch block. Which, given the current restriction, means just the latch block along the same lines as why you disallow loads in the latch block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have to restrict for all blocks up to and including the early exit too, because any operation after the vector lane that triggers the exit could be unsafe. For example, see test @loop_contains_unsafe_div
where the early exit block looks like this:
loop:
%index = phi i64 [ %index.next, %loop.inc ], [ 3, %entry ]
%arrayidx = getelementptr inbounds i8, ptr %p1, i64 %index
%ld1 = load i32, ptr %arrayidx, align 1
%div = udiv i32 20000, %ld1
%cmp = icmp eq i32 %div, 1
br i1 %cmp, label %loop.inc, label %loop.end
It could be lane 1 of the comparison that triggers the early exit, but lane 3 could trap with divide-by-zero. If this was a normal loop where the divide was conditionally executed, i.e. something like
for (int i = 0; i < n; i++) {
if (mask[i]) {
dst[i] = 20000 / src[i];
}
}
then the vectoriser can at least create a vector expression for the mask and select between src
and a vector of ones to avoid unnecessarily creating an exception. However, in the test above the vectoriser has to perform the divide first in order to know what the mask should be.
I suppose in future improvements if we can prove that the udiv
can be safely moved after the vector comparison (i.e. because the comparison does not depend upon the udiv
) then we might be able to create a mask and select between src and 1.
For now, I thought it best to avoid over-complicating the initial early exit vectorisation work by bailing out on edge cases like this.
/// Keep track of all the loop exiting blocks. | ||
SmallVector<BasicBlock *, 4> CountableExitingBlocks; | ||
SmallVector<BasicBlock *, 4> UncountableExitingBlocks; | ||
|
||
/// Keep track of the destinations of all uncountable exits. | ||
SmallVector<BasicBlock *, 4> UncountableExitBlocks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of the three only UncountableExitingBlocks
looks to be used by this patch. I'd just like confirmation this it intentional and will be built upon in a following PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these will be used in follow-on patches. However, if you prefer to keep this initial patch clean I can remove them and add them later. I was just trying to reduce the size of follow-on patches by moving it into this patch that's all.
|
||
/// Returns the destination of a speculative early exiting block. | ||
BasicBlock *getSpeculativeEarlyExitBlock() const { | ||
assert(getUncountableExitBlocks().size() == 1 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both getSpeculativeEarlyExitBloc
and getUncountableExitingBlocks
assert that there's exactly one entry. Would it be simpler to store the pointers directly, rather than in a 1 element vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do that, and would be happy to do so if people prefer. I wrote it this way because in future I thought it's possible we may support multiple early exits. @paulwalker-arm @huntergr-arm any preference?
@@ -436,7 +470,7 @@ class LoopVectorizationLegality { | |||
/// we read and write from memory. This method checks if it is | |||
/// legal to vectorize the code, considering only memory constrains. | |||
/// Returns true if the loop is vectorizable | |||
bool canVectorizeMemory(); | |||
bool canVectorizeMemory(bool IsEarlyExitLoop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// For loops with uncountable early exiting blocks that are not the latch | ||
// it's necessary to perform extra checks, since the vectoriser is currently | ||
// only capable of handling simple search loops. | ||
if (IsEarlyExitLoop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would make sense to keep the checks together in isVectorizableEarlyExitLoop
in particular with the extra checks recently added there
@@ -446,6 +480,9 @@ class LoopVectorizationLegality { | |||
/// specific checks for outer loop vectorization. | |||
bool canVectorizeOuterLoop(); | |||
|
|||
/// Returns true if this is an early exit loop that can be vectorized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be helpful to summarize the required conditions here (or somewhere else)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Check all instructions in the loop to see if they could potentially | ||
// generate exceptions or have side-effects. | ||
auto IsSafeOperation = [](Instruction *I) -> bool { | ||
// Is this a divide? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment doesn't seem to match the code below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this reads better now!
getUncountableExitingBlocks()[0] && | ||
"Expected latch predecessor to be the early exiting block"); | ||
|
||
for (Instruction &I : *LatchBB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any restrictions that there is only the latch after the early exit? If not, all blocks between early exit and latch would need to be check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if you look in isVectorizableEarlyExitLoop
you'll see this code:
// The only supported early exit loops so far are ones where the early
// exiting block is a unique predecessor of the latch block.
BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
if (LatchPredBB != getUncountableExitingBlocks()[0]) {
reportVectorizationFailure("Early exit is not the latch predecessor",
"Cannot vectorize early exit loop",
"EarlyExitNotLatchPredecessor", ORE, TheLoop);
return false;
}
which is why a few lines above I am able to add this assert:
assert(LatchBB->getUniquePredecessor() ==
getUncountableExitingBlocks()[0] &&
"Expected latch predecessor to be the early exiting block");
Although if I move the new code from canVectorizeMemory
to isVectorizableEarlyExitLoop
I can get rid of the assert.
I added this restriction as a simple optimisation to avoid messier CFGs, such as conditionally executed blocks after the early exit:
loop:
...
br i1 %cmp, label %early.exit, label %loop.bb1
loop.bb1:
...
br i1 %cmp2, label %do.stuff, label %loop.latch
do.stuff:
...
br label %loop.latch
loop.latch:
...
br i1 %cmp3, label %normal.exit, label %loop
In practice we can't yet permit any useful work after the early exit anyway until we can teach the vectoriser that subsequent blocks are effectively if-converted and require a mask, and the operations would need to occur before the early exit.
@@ -1442,6 +1487,126 @@ bool LoopVectorizationLegality::canVectorizeLoopNestCFG( | |||
return Result; | |||
} | |||
|
|||
bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |||
// At least one of the exiting blocks must be the latch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below just checks if there's a latch, not if it is exiting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I should move the comment lower down to the code where it's checked:
if (isa<SCEVCouldNotCompute>(
PSE.getSE()->getPredicatedExitCount(TheLoop, LatchBB, &Predicates))) {
reportVectorizationFailure(
"Cannot determine exact exit count for latch block",
"Cannot vectorize early exit loop",
"UnknownLatchExitCountEarlyExitLoop", ORE, TheLoop);
return false;
}
There is a test for this - see @early_exit_infinite_loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few final comments but otherwise this looks good to me.
SmallVector<const SCEVPredicate *, 4> Predicates; | ||
for (BasicBlock *BB1 : ExitingBlocks) { | ||
const SCEV *EC = | ||
PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I didn't see this parameter controlling any code paths but I do see an assert that would fail without it.
What worried me was that you're clearing Predicates after iterating across all blocks and that made me assume it's contents didn't matter, but then I wondered why you need to manually clear it at all?
This patch is split off from PR llvm#88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 5. There are no loads after the early exit block. 6. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet. 7. We must be able to prove at compile-time that loops will not contain faulting loads. Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll
* Reverted LoopAccessAnalysis changes in favour of new code in LoopVectorizationLegality.cpp that checks if instructions are safe to speculatively execute. * Updated some comments and done minor code refactoring. * Added new tests for demonstrating under which circumstances we can speculatively execute calls and divides. * Added new test where the early exit is in a conditional block. * Added new test containing store instructions. * Added new test containing loads after early exit.
* Combined changes from canVectorizeMemory into isVectorizableEarlyExitLoop. * Documented restrictions for isVectorizableEarlyExitLoop in header file. * Added reportVectorizationFailure call in LoopVectorize when we bail out. * General code refactor, clean-up.
Rebased to fix the conflict (that wasn't a conflict) warning |
* Remove code to check loads after early exit, since at the moment this is redundant. We only vectorise if we can prove that none of the loads will fault, so there is no harm in speculatively executing. Any outside users of a load in the latch block must have come from a normal exit and so we will extract the last lane of the vectorised in the same way as a normal loop. * Re-wrote the test loop_contains_load_after_early_exit to show it is legal to vectorise when the load in the latch block is dereferenceable. The test also contains an outside use of that load. At least once a patch lands to start vectorising this loop it will be obvious whether the code is correct or not. * Updated a comment.
@@ -0,0 +1,1969 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 | |||
; RUN: opt -S < %s -p loop-vectorize -mattr=+sve -debug-only=loop-vectorize \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing target specific in this test at the moment, can it be moved out so we have test coverage for the legality checks independent of the target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, seems reasonable. We will need a few SVE specific tests once we start vectorising these loops just to ensure the cost model decides it's worth vectorising, but that's fine!
* Move simple_early_exit.ll test file to top level directory and changed command line to be target-independent. As a result of this I had to update the tests due to some alignment differences.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/5918 Here is the relevant piece of the build log for the reference
|
@@ -0,0 +1,1941 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need REQUIRES:asserts I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, you're right! Good spot. I'll fix asap.
|
||
/// Indicates whether this loop has a speculative early exit, i.e. an | ||
/// uncountable exiting block that is not the latch. | ||
bool HasSpeculativeEarlyExit = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, with SCEV terminology and the variables below, Speculative
-> Uncountable
?
/// uncountable exiting block that is not the latch. | ||
bool HasSpeculativeEarlyExit = false; | ||
|
||
/// Keep track of all the loop exiting blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment accurate? AFAICT they only keep track of a single countable and uncountable exiting block, if the backedge taken count is not computable.
SmallVector<const SCEVPredicate *, 4> Predicates; | ||
for (BasicBlock *BB1 : ExitingBlocks) { | ||
const SCEV *EC = | ||
PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment here or below why predicates is passed and dropped below?
} | ||
|
||
// The latch block must have a countable exit. | ||
if (isa<SCEVCouldNotCompute>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move simpler checks earlier before iterating over the whole loop? Can also check if LatchBB is in the collected countable exiting blocks?
|
||
LLVM_DEBUG( | ||
dbgs() | ||
<< "LV: Found an early exit. Retrying with speculative exit count.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to be consistent with SCEV terminology, i.e. symbolic max BTC. Or more specifically the computable exiting count for the loop latch, as with the current restrictions getSymbolicMaxBackedgeTakenCount() == >getPredicatedExitCount(TheLoop, Latch)
?
return false; | ||
} | ||
|
||
BasicBlock *BB2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clearer to name ExitBlock?
Hi @fhahn, I'll take a look at addressing your comments in a follow-up patch. I had used the term speculative for things like |
* Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.
llvm#107004) This patch is split off from PR llvm#88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 5. There are no loads after the early exit block. 6. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet. 7. We must be able to prove at compile-time that loops will not contain faulting loads. Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll
This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll
* Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.
This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll
…lvm#109300) * Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.
This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll
…lvm#109300) * Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.
This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll
This patch is split off from PR #88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e.
or
In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria:
Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll