[LoopVectorize] Teach LoopVectorizationLegality about more early exits #107004

david-arm · 2024-09-02T15:42:53Z

This patch is split off from PR #88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria:

There are no stores in the loop.
The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time.
The early exit block dominates the latch block.
The latch block must have an exact exit count.
There are no loads after the early exit block.
The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet.
We must be able to prove at compile-time that loops will not contain faulting loads.

Tests have been added here:

Transforms/LoopVectorize/AArch64/simple_early_exit.ll

llvmbot · 2024-09-02T15:43:26Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: David Sherwood (david-arm)

Changes

This patch is split off from PR #88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e.

for (int i = 0; i < n; i++) {
if (p[i] == val)
return i;
}
return n;

or

for (int i = 0; i < n; i++) {
if (p1[i] != p2[i])
return i;
}
return n;

In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria:

There are no stores in the loop.
The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time.
The early exit block dominates the latch block.
The latch block must have an exact exit count.
There are no loads after the early exit block.
The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet.
We must be able to prove at compile-time that loops will not contain faulting loads.

Tests have been added here:

Transforms/LoopVectorize/AArch64/simple_early_exit.ll

Patch is 82.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107004.diff

9 Files Affected:

(modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+3-1)
(modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+49-1)
(modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+5-2)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+146-12)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+6)
(added) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+1584)
(modified) llvm/test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/control-flow.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/remarks-multi-exit-loops.ll (+1-1)

diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index 73d9c26ed6b1b7..5ea2202c4d4559 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -660,7 +660,8 @@ class LoopAccessInfo {
   bool isInvariant(Value *V) const;
 
   unsigned getNumStores() const { return NumStores; }
-  unsigned getNumLoads() const { return NumLoads;}
+  unsigned getNumLoads() const { return NumLoads; }
+  unsigned getNumCalls() const { return NumCalls; }
 
   /// The diagnostics report generated for the analysis.  E.g. why we
   /// couldn't analyze the loop.
@@ -753,6 +754,7 @@ class LoopAccessInfo {
 
   unsigned NumLoads = 0;
   unsigned NumStores = 0;
+  unsigned NumCalls = 0;
 
   /// Cache the result of analyzeLoop.
   bool CanVecMem = false;
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index 0f4d1355dd2bfe..bb21f97c0ae521 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -377,6 +377,24 @@ class LoopVectorizationLegality {
     return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
   }
 
+  /// Returns true if the loop has a speculative early exit, i.e. an
+  /// uncountable exit that isn't the latch block.
+  bool hasSpeculativeEarlyExit() const { return HasSpeculativeEarlyExit; }
+
+  /// Returns the speculative early exiting block.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    assert(getUncountableExitingBlocks().size() == 1 &&
+           "Expected only a single uncountable exiting block");
+    return getUncountableExitingBlocks()[0];
+  }
+
+  /// Returns the destination of a speculative early exiting block.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    assert(getUncountableExitBlocks().size() == 1 &&
+           "Expected only a single uncountable exit block");
+    return getUncountableExitBlocks()[0];
+  }
+
   /// Returns true if vector representation of the instruction \p I
   /// requires mask.
   bool isMaskRequired(const Instruction *I) const {
@@ -404,6 +422,22 @@ class LoopVectorizationLegality {
 
   DominatorTree *getDominatorTree() const { return DT; }
 
+  /// Returns all exiting blocks with a countable exit, i.e. the
+  /// exit-not-taken count is known exactly at compile time.
+  const SmallVector<BasicBlock *, 4> &getCountableExitingBlocks() const {
+    return CountableExitingBlocks;
+  }
+
+  /// Returns all the exiting blocks with an uncountable exit.
+  const SmallVector<BasicBlock *, 4> &getUncountableExitingBlocks() const {
+    return UncountableExitingBlocks;
+  }
+
+  /// Returns all the exit blocks from uncountable exiting blocks.
+  SmallVector<BasicBlock *, 4> getUncountableExitBlocks() const {
+    return UncountableExitBlocks;
+  }
+
 private:
   /// Return true if the pre-header, exiting and latch blocks of \p Lp and all
   /// its nested loops are considered legal for vectorization. These legal
@@ -436,7 +470,7 @@ class LoopVectorizationLegality {
   /// we read and write from memory. This method checks if it is
   /// legal to vectorize the code, considering only memory constrains.
   /// Returns true if the loop is vectorizable
-  bool canVectorizeMemory();
+  bool canVectorizeMemory(bool IsEarlyExitLoop);
 
   /// Return true if we can vectorize this loop using the IF-conversion
   /// transformation.
@@ -446,6 +480,10 @@ class LoopVectorizationLegality {
   /// specific checks for outer loop vectorization.
   bool canVectorizeOuterLoop();
 
+  /// Returns true if this is a supported early exit loop that we can
+  /// vectorize.
+  bool isVectorizableEarlyExitLoop();
+
   /// Return true if all of the instructions in the block can be speculatively
   /// executed, and record the loads/stores that require masking.
   /// \p SafePtrs is a list of addresses that are known to be legal and we know
@@ -551,6 +589,16 @@ class LoopVectorizationLegality {
   /// (potentially) make a better decision on the maximum VF and enable
   /// the use of those function variants.
   bool VecCallVariantsFound = false;
+
+  /// Indicates whether this loop has a speculative early exit, i.e. an
+  /// uncountable exiting block that is not the latch.
+  bool HasSpeculativeEarlyExit = false;
+
+  /// Keeps track of all the exits with known or countable exit-not-taken
+  /// counts.
+  SmallVector<BasicBlock *, 4> CountableExitingBlocks;
+  SmallVector<BasicBlock *, 4> UncountableExitingBlocks;
+  SmallVector<BasicBlock *, 4> UncountableExitBlocks;
 };
 
 } // namespace llvm
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 980f142f113265..bf367a77aae1ce 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -2445,8 +2445,11 @@ bool LoopAccessInfo::analyzeLoop(AAResults *AA, const LoopInfo *LI,
       // vectorize a loop if it contains known function calls that don't set
       // the flag. Therefore, it is safe to ignore this read from memory.
       auto *Call = dyn_cast<CallInst>(&I);
-      if (Call && getVectorIntrinsicIDForCall(Call, TLI))
-        continue;
+      if (Call) {
+        NumCalls++;
+        if (getVectorIntrinsicIDForCall(Call, TLI))
+          continue;
+      }
 
       // If this is a load, save it. If this instruction can read from memory
       // but is not a load, then we quit. Notice that we don't handle function
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 66a779da8c25bc..3efdfd1fa2bc6e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1048,7 +1048,7 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
   return true;
 }
 
-bool LoopVectorizationLegality::canVectorizeMemory() {
+bool LoopVectorizationLegality::canVectorizeMemory(bool IsEarlyExitLoop) {
   LAI = &LAIs.getInfo(*TheLoop);
   const OptimizationRemarkAnalysis *LAR = LAI->getReport();
   if (LAR) {
@@ -1070,6 +1070,50 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
     return false;
   }
 
+  // For loops with uncountable early exiting blocks that are not the latch
+  // it's necessary to perform extra checks, since the vectoriser is currently
+  // only capable of handling simple search loops.
+  if (IsEarlyExitLoop) {
+    // We don't support calls or any memory accesses that write to memory.
+    if (LAI->getNumStores()) {
+      reportVectorizationFailure(
+          "Writes to memory unsupported in early exit loops",
+          "Cannot vectorize early exit loop with writes to memory",
+          "WritesInEarlyExitLoop", ORE, TheLoop);
+      return false;
+    }
+
+    if (LAI->getNumCalls()) {
+      reportVectorizationFailure(
+          "Calls unsupported in early exit loops",
+          "Cannot vectorize early exit loop with function calls",
+          "CallsInEarlyExitLoop", ORE, TheLoop);
+      return false;
+    }
+
+    // The vectoriser cannot handle loads that occur after the early exit block.
+    BasicBlock *LatchBB = TheLoop->getLoopLatch();
+    for (Instruction &I : *LatchBB) {
+      if (I.mayReadFromMemory()) {
+        reportVectorizationFailure(
+            "Loads not permitted after early exit",
+            "Cannot vectorize early exit loop with loads after early exit",
+            "LoadsAfterEarlyExit", ORE, TheLoop);
+        return false;
+      }
+    }
+
+    // The vectoriser does not yet handle loops that may fault, but this will
+    // be improved in a follow-on patch.
+    if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC)) {
+      reportVectorizationFailure(
+          "Loop may fault",
+          "Cannot vectorize potentially faulting early exit loop",
+          "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
+      return false;
+    }
+  }
+
   // We can vectorize stores to invariant address when final reduction value is
   // guaranteed to be stored at the end of the loop. Also, if decision to
   // vectorize loop is made, runtime checks are added so as to make sure that
@@ -1442,6 +1486,95 @@ bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
   return Result;
 }
 
+bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
+  // At least one of the exiting blocks must be the latch.
+  BasicBlock *LatchBB = TheLoop->getLoopLatch();
+  if (!LatchBB) {
+    reportVectorizationFailure("Loop does not have a latch",
+                               "Cannot vectorize early exit loop",
+                               "NoLatchEarlyExit", ORE, TheLoop);
+    return false;
+  }
+
+  SmallVector<BasicBlock *, 8> ExitingBlocks;
+  TheLoop->getExitingBlocks(ExitingBlocks);
+
+  // Keep a record of all the exiting blocks with exact exit counts, as well as
+  // those with inexact counts.
+  SmallVector<const SCEVPredicate *, 4> Predicates;
+  for (BasicBlock *BB1 : ExitingBlocks) {
+    const SCEV *EC =
+        PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);
+    if (isa<SCEVCouldNotCompute>(EC)) {
+      UncountableExitingBlocks.push_back(BB1);
+
+      unsigned NumExitBlocks = 0;
+      for (BasicBlock *BB2 : successors(BB1)) {
+        if (!TheLoop->contains(BB2)) {
+          UncountableExitBlocks.push_back(BB2);
+          NumExitBlocks++;
+        }
+      }
+      if (NumExitBlocks > 1) {
+        reportVectorizationFailure(
+            "Early exiting block has more than one successor outside of loop",
+            "Too many successors from early exiting block",
+            "EarlyExitTooManySuccessors", ORE, TheLoop);
+        return false;
+      }
+    } else
+      CountableExitingBlocks.push_back(BB1);
+  }
+  Predicates.clear();
+
+  // We only support one uncountable early exit.
+  if (getUncountableExitingBlocks().size() != 1) {
+    reportVectorizationFailure(
+        "Loop has too many uncountable exits",
+        "Cannot vectorize early exit loop with more than one early exit",
+        "TooManyUncountableEarlyExits", ORE, TheLoop);
+    return false;
+  }
+
+  // The only supported early exit loops so far are ones where the early
+  // exiting block is a unique predecessor of the latch block.
+  BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
+  if (!LatchPredBB || LatchPredBB != getUncountableExitingBlocks()[0]) {
+    reportVectorizationFailure("Early exit is not the latch predecessor",
+                               "Cannot vectorize early exit loop",
+                               "EarlyExitNotLatchPredecessor", ORE, TheLoop);
+    return false;
+  }
+
+  if (Reductions.size() || FixedOrderRecurrences.size()) {
+    reportVectorizationFailure(
+        "Found reductions or recurrences in early-exit loop",
+        "Cannot vectorize early exit loop with reductions or recurrences",
+        "RecurrencesInEarlyExitLoop", ORE, TheLoop);
+    return false;
+  }
+
+  LLVM_DEBUG(
+      dbgs()
+      << "LV: Found an early exit. Retrying with speculative exit count.\n");
+  if (isa<SCEVCouldNotCompute>(
+          PSE.getSE()->getPredicatedExitCount(TheLoop, LatchBB, &Predicates))) {
+    reportVectorizationFailure(
+        "Cannot determine exact exit count for latch block",
+        "Cannot vectorize early exit loop",
+        "UnknownLatchExitCountEarlyExitLoop", ORE, TheLoop);
+    return false;
+  }
+
+  const SCEV *SpecExitCount = PSE.getSymbolicMaxBackedgeTakenCount();
+  assert(!isa<SCEVCouldNotCompute>(SpecExitCount) &&
+         "Failed to get symbolic expression for backedge taken count");
+
+  LLVM_DEBUG(dbgs() << "LV: Found speculative backedge taken count: "
+                    << *SpecExitCount << '\n');
+  return true;
+}
+
 bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
   // Store the result and return it at the end instead of exiting early, in case
   // allowExtraAnalysis is used to report multiple reasons for not vectorizing.
@@ -1500,19 +1633,20 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
       return false;
   }
 
-  // Go over each instruction and look at memory deps.
-  if (!canVectorizeMemory()) {
-    LLVM_DEBUG(dbgs() << "LV: Can't vectorize due to memory conflicts\n");
-    if (DoExtraAnalysis)
-      Result = false;
-    else
-      return false;
+  HasSpeculativeEarlyExit = false;
+  if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
+    if (!isVectorizableEarlyExitLoop()) {
+      if (DoExtraAnalysis)
+        Result = false;
+      else
+        return false;
+    } else
+      HasSpeculativeEarlyExit = true;
   }
 
-  if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
-    reportVectorizationFailure("could not determine number of loop iterations",
-                               "could not determine number of loop iterations",
-                               "CantComputeNumberOfIterations", ORE, TheLoop);
+  // Go over each instruction and look at memory deps.
+  if (!canVectorizeMemory(HasSpeculativeEarlyExit)) {
+    LLVM_DEBUG(dbgs() << "LV: Can't vectorize due to memory conflicts\n");
     if (DoExtraAnalysis)
       Result = false;
     else
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index a8722db654f5c9..bef7aa264157ce 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9802,6 +9802,12 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     return false;
   }
 
+  if (LVL.hasSpeculativeEarlyExit()) {
+    LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Auto-vectorization of early "
+                      << "exit loops is not yet supported.\n");
+    return false;
+  }
+
   // Entrance to the VPlan-native vectorization path. Outer loops are processed
   // here. They may require CFG and instruction level transformations before
   // even evaluating whether vectorization is profitable. Since we cannot modify
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
new file mode 100644
index 00000000000000..d677598399f1c2
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
@@ -0,0 +1,1584 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S < %s -p loop-vectorize -mattr=+sve -debug-only=loop-vectorize \
+; RUN:    -mtriple aarch64-linux-gnu 2>%t | FileCheck %s --check-prefixes=CHECK
+; RUN: cat %t | FileCheck %s --check-prefix=DEBUG
+
+declare void @init_mem(ptr, i64);
+
+define i64 @same_exit_block_pre_inc_use1() {
+; DEBUG-LABEL: LV: Checking a loop in 'same_exit_block_pre_inc_use1'
+; DEBUG:       LV: Found an early exit. Retrying with speculative exit count.
+; DEBUG-NEXT:  LV: Found speculative backedge taken count: 63
+; DEBUG-NEXT:  LV: We can vectorize this loop!
+; DEBUG-NEXT:  LV: Not vectorizing: Auto-vectorization of early exit loops is not yet supported.
+; CHECK-LABEL: define i64 @same_exit_block_pre_inc_use1(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[P1:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT:    [[P2:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
+; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT:    br label [[LAND_RHS:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[INDEX]]
+; CHECK-NEXT:    [[TMP38:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[INDEX]]
+; CHECK-NEXT:    [[TMP39:%.*]] = load i8, ptr [[ARRAYIDX1]], align 1
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i8 [[TMP38]], [[TMP39]]
+; CHECK-NEXT:    br i1 [[CMP3]], label [[FOR_INC]], label [[FOR_END_LOOPEXIT:%.*]]
+; CHECK:       loop.inc:
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDEX_NEXT]], 67
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[LAND_RHS]], label [[FOR_END_LOOPEXIT]]
+; CHECK:       loop.end:
+; CHECK-NEXT:    [[START_0_LCSSA:%.*]] = phi i64 [ [[INDEX]], [[LAND_RHS]] ], [ 67, [[FOR_INC]] ]
+; CHECK-NEXT:    ret i64 [[START_0_LCSSA]]
+;
+entry:
+  %p1 = alloca [1024 x i8]
+  %p2 = alloca [1024 x i8]
+  call void @init_mem(ptr %p1, i64 1024)
+  call void @init_mem(ptr %p2, i64 1024)
+  br label %loop
+
+loop:
+  %index = phi i64 [ %index.next, %loop.inc ], [ 3, %entry ]
+  %arrayidx = getelementptr inbounds i8, ptr %p1, i64 %index
+  %ld1 = load i8, ptr %arrayidx, align 1
+  %arrayidx1 = getelementptr inbounds i8, ptr %p2, i64 %index
+  %ld2 = load i8, ptr %arrayidx1, align 1
+  %cmp3 = icmp eq i8 %ld1, %ld2
+  br i1 %cmp3, label %loop.inc, label %loop.end
+
+loop.inc:
+  %index.next = add i64 %index, 1
+  %exitcond = icmp ne i64 %index.next, 67
+  br i1 %exitcond, label %loop, label %loop.end
+
+loop.end:
+  %retval = phi i64 [ %index, %loop ], [ 67, %loop.inc ]
+  ret i64 %retval
+}
+
+
+define i64 @same_exit_block_pre_inc_use1_gep_two_indices() {
+; CHECK-LABEL: define i64 @same_exit_block_pre_inc_use1_gep_two_indices(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[P1:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT:    [[P2:%.*]] = alloca [1024 x i8], align 4
+; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
+; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[LOOP_INC:%.*]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i8], ptr [[P1]], i64 0, i64 [[INDEX]]
+; CHECK-NEXT:    [[LD1:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds [1024 x i8], ptr [[P2]], i64 0, i64 [[INDEX]]
+; CHECK-NEXT:    [[LD2:%.*]] = load i8, ptr [[ARRAYIDX1]], align 1
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i8 [[LD1]], [[LD2]]
+; CHECK-NEXT:    br i1 [[CMP3]], label [[LOOP_INC]], label [[LOOP_END:%.*]]
+; CHECK:       loop.inc:
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDEX_NEXT]], 67
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[LOOP]], label [[LOOP_END]]
+; CHECK:       loop.end:
+; CHECK-NEXT:    [[RETVAL:%.*]] = phi i64 [ [[INDEX]], [[LOOP]] ], [ 67, [[LOOP_INC]] ]
+; CHECK-NEXT:    ret i64 [[RETVAL]]
+;
+entry:
+  %p1 = alloca [1024 x i8]
+  %p2 = alloca [1024 x i8]
+  call void @init_mem(ptr %p1, i64 1024)
+  call void @init_mem(ptr %p2, i64 1024)
+  br label %loop
+
+loop:
+  %index = phi i64 [ %index.next, %loop.inc ], [ 3, %entry ]
+  %arrayidx = getelementptr inbounds [1024 x i8], ptr %p1, i64 0, i64 %index
+  %ld1 = load i8, ptr %arrayidx, align 1
+  %arrayidx1 = getelementptr inbounds [1024 x i8], ptr %p2, i64 0, i64 %index
+  %ld2 = load i8, ptr %arrayidx1, align 1
+  %cmp3 = icmp eq i8 %ld1, %ld2
+  br i1 %cmp3, label %loop.inc, label %loop.end
+
+loop.inc:
+  %index.next = add i64 %index, 1
+  %exitcond = icmp ne i64 %index.next, 67
+  br i1 %exitcond, label %loop, label %loop.end
+
+loop.end:
+  %retval = phi i64 [ %index, %loop ], [ 67, %loop.inc ]
+  ret i64 %retval
+}
+
+
+define i64 @same_exit_block_pre_inc_use1_alloca_diff_type() {
+; CHECK-LABEL: define i64 @same_exit_block_pre_inc_use1_alloca_diff_type(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[P1:%.*]] = alloca [40 x i32], align 4
+; CHECK-NEXT:    [[P2:%.*]] = alloca [40 x i32], align 4
+; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
+; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT:    br label [[LAND_RHS:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[INDEX]]
+; CHECK-NEXT:    [[TMP38:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ...
[truncated]

igogo-x86 · 2024-09-03T12:27:07Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+          NumExitBlocks++;
+        }
+      }
+      if (NumExitBlocks > 1) {


If BB1 ends with conditional BranchInst, it has two jumps. Both jumps can't go outside of the loop; otherwise, BB1 wouldn't belong to TheLoop. So, the only way BB1 can have multiple Exit Blocks is when it ends with SwitchInst. So, couldn't we check if BB1 ends with a switch?

You may be right at the moment that the only way this can happen is with a switch statement, but I can't 100% convince myself of that. :) That's why I wrote this in a more paranoid way by checking the number of blocks manually. What I could do is bail out early if the number of successors > 2, then add the single successor outside of the loop.

Switch support has only be added recently, so would probably be good to add a test if possible?

I think I already have with @early_exit_has_multiple_outside_successors, unless you were thinking of a different test?

paulwalker-arm · 2024-09-05T10:48:52Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

+  /// Returns true if this is a supported early exit loop that we can
+  /// vectorize.


What does "supported" mean in this context? Would it suffice to say "Returns true if this is an early exit loop that can be vectorized".

paulwalker-arm · 2024-09-05T10:55:31Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

+  /// uncountable exiting block that is not the latch.
+  bool HasSpeculativeEarlyExit = false;
+
+  /// Keeps track of all the exits with known or countable exit-not-taken


Should this be "known or uncountable"? Although if true then doesn't that cover the entire space and thus "Keeps track of all loop exits." would suffice?

Adjusted the comment - hopefully makes more sense!

paulwalker-arm · 2024-09-05T11:03:58Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

@@ -1070,6 +1070,50 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
    return false;
  }

+  // For loops with uncountable early exiting blocks that are not the latch


"that are not the latch" can be dropped because by definition the latch block cannot be early exiting?

I don't think this is true, because you can have loops like this:

while.cond: ; preds = %while.body, %entry %i.0 = phi i32 [ %start, %entry ], [ %inc, %while.body ] %inc = add i32 %i.0, 1 %cmp.not = icmp eq i32 %inc, %end br i1 %cmp.not, label %cleanup, label %while.body while.body: ; preds = %while.cond %idxprom = zext i32 %inc to i64 %arrayidx = getelementptr inbounds i32, ptr %p, i64 %idxprom %0 = load i32, ptr %arrayidx, align 4, !tbaa !6 %cmp1 = icmp eq i32 %0, 7 br i1 %cmp1, label %cleanup, label %while.cond, !llvm.loop !10 cleanup: ; preds = %while.cond, %while.body %retval.0 = phi i32 [ 1, %while.body ], [ 0, %while.cond ] ret i32 %retval.0

We do see code like this being generated from C:

unsigned foo(int *p, unsigned start, unsigned end) { unsigned i = start; while (++i != end) if (p[i] == 7) return 1; return 0; }

In this case the latch block either jumps to the beginning of the loop or takes an early exit.

paulwalker-arm · 2024-09-05T12:07:04Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

-      if (Call && getVectorIntrinsicIDForCall(Call, TLI))
-        continue;
+      if (Call) {
+        NumCalls++;


Based on the use, is this too broad? For example, would it prevent loops that call sqrt intrinsics from being vectorisable? What about keeping a count of non-store instructions that can modify memory? This can be added to the code below.

Hmm, I was probably being overly cautious here, but I'm worried about more than just stores. The function call could have side-effects, such as changing system state or generating exceptions. I might be able to relax this to permit calls to intrinsics that have no side-effects, and do not write to memory. I'll see how easy this is to do - it would require adding new tests.

This now makes me wonder about operations such as integer divides that may generate exceptions and whether or not such operations should be considered legal in an early exit block.

I don't think we can allow any instruction that may trigger UB or a trap in the header if it would be executed unconditionally in the vectorized loop, but this should probably be checked outside of LAA, as it is irrelevant for memory dependence analysis.

Agreed. It might make more sense to analyse all instructions in the early exiting block by checking for I.mayHaveSideEffects(), I.mayThrow(), I.mayReadOrWriteMemory(), etc.

I've reverted this code in favour of calling isSafeToSpeculativelyExecute from LoopVectorizationLegality. This covers more cases such as divides, while at the same time permitting calls to intrinsics such as sqrt.

paulwalker-arm · 2024-09-05T12:12:00Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+    // The vectoriser does not yet handle loops that may fault, but this will
+    // be improved in a follow-on patch.


Suggest "TODO: Handle loops that may fault."

Up to you but my suggestion was to replace the original comment with the TODO.

paulwalker-arm · 2024-09-05T12:18:04Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+    BasicBlock *LatchBB = TheLoop->getLoopLatch();
+    for (Instruction &I : *LatchBB) {


~~This seems like a naive question but is the latch block the only one that needs to be check? Are there no other blocks between the existing an latch blocks?~~

I answered my own question but leaving comment here because it is referenced below.

paulwalker-arm · 2024-09-05T12:24:47Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  // Keep a record of all the exiting blocks with exact exit counts, as well as
+  // those with inexact counts.


As above, isn't this just all exiting blocks?

paulwalker-arm · 2024-09-05T12:29:13Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+    const SCEV *EC =
+        PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);
+    if (isa<SCEVCouldNotCompute>(EC)) {
+      UncountableExitingBlocks.push_back(BB1);


Does it matter if BB1 has no out-of-loop successors?

If it's an exiting block at least one successor must be outside the loop. I've added an assert for this.

paulwalker-arm · 2024-09-05T12:32:14Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  // The only supported early exit loops so far are ones where the early
+  // exiting block is a unique predecessor of the latch block.


This answers my earlier question but that means the code in canVectorizeMemory is fragile. Can an assert be added to canVectorizeMemory to ensure this is the case?

paulwalker-arm · 2024-09-05T12:36:10Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  if (Reductions.size() || FixedOrderRecurrences.size()) {
+    reportVectorizationFailure(
+        "Found reductions or recurrences in early-exit loop",
+        "Cannot vectorize early exit loop with reductions or recurrences",
+        "RecurrencesInEarlyExitLoop", ORE, TheLoop);
+    return false;
+  }


This seems like a quick test that's worth having early before having to iterate across basic blocks etc...

paulwalker-arm · 2024-09-12T16:28:47Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+    // The vectoriser cannot handle loads that occur after the early exit block.
+    BasicBlock *LatchBB = TheLoop->getLoopLatch();
+    assert(LatchBB->getUniquePredecessor() ==
+               getUncountableExitingBlocks()[0] &&


getSpeculativeEarlyExitingBlock()?

paulwalker-arm · 2024-09-12T16:32:02Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+    // The vectoriser does not yet handle loops that may fault, but this will
+    // be improved in a follow-on patch.


Up to you but my suggestion was to replace the original comment with the TODO.

paulwalker-arm · 2024-09-12T16:37:18Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  // The only supported early exit loops so far are ones where the early
+  // exiting block is a unique predecessor of the latch block.
+  BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
+  if (!LatchPredBB || LatchPredBB != getUncountableExitingBlocks()[0]) {


This could use getUncountableExitingBlocks()? Also, perhaps !LatchPredBB is redundant given it'll fail the != test anyway?

paulwalker-arm · 2024-09-12T16:41:02Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Auto-vectorization of early "
+                      << "exit loops is not yet supported.\n");


Is a reportVectorizationFailure required here until vectorisation is enabled?

paulwalker-arm · 2024-09-12T17:01:05Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  SmallVector<const SCEVPredicate *, 4> Predicates;
+  for (BasicBlock *BB1 : ExitingBlocks) {
+    const SCEV *EC =
+        PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);


Given Predicates is not used, can you pass nullptr here? From what I can see all the functions getPredicatedExitCount represent support not caring about the predicates.

I don't think that's true because if the loop really does require predicates then getPredicatedExitCount will return SCEVCouldNotCompute for the exit count and we will incorrectly treat this as an uncountable exiting block. If I pass in nullptr here the simple_early_exit.ll test file starts failing.

OK, I didn't see this parameter controlling any code paths but I do see an assert that would fail without it.

What worried me was that you're clearing Predicates after iterating across all blocks and that made me assume it's contents didn't matter, but then I wondered why you need to manually clear it at all?

Could you add a comment here or below why predicates is passed and dropped below?

paulwalker-arm · 2024-09-12T17:18:05Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  // For loops with uncountable early exiting blocks that are not the latch
+  // it's necessary to perform extra checks, since the vectoriser is currently
+  // only capable of handling simple search loops.
+  if (IsEarlyExitLoop) {


Not sure what others think but I see canVectorizeMemory as being about load/store address analysis rather than "the only place to analyse load/store instructions". In this case you don't care about the addresses, you just want to guard against introducing side effects which I think can all be contained in isVectorizableEarlyExitLoop?

It kind of feels odd for canVectorizeMemory to depend on work done by isVectorizableEarlyExitLoop, when the "reason to reject an instruction" is largely the same.

IMO it would make sense to keep the checks together in isVectorizableEarlyExitLoop in particular with the extra checks recently added there

paulwalker-arm · 2024-09-12T17:21:44Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+
+  for (auto *BB : TheLoop->blocks())
+    for (auto &I : *BB)
+      if (!IsSafeOperation(&I)) {


Do early exiting loops introduce any special requirement here? I initially thought you'd restrict based on safe-to-speculate for only the blocks between the exit block and the latch block. Which, given the current restriction, means just the latch block along the same lines as why you disallow loads in the latch block.

I think you have to restrict for all blocks up to and including the early exit too, because any operation after the vector lane that triggers the exit could be unsafe. For example, see test @loop_contains_unsafe_div where the early exit block looks like this:

loop: %index = phi i64 [ %index.next, %loop.inc ], [ 3, %entry ] %arrayidx = getelementptr inbounds i8, ptr %p1, i64 %index %ld1 = load i32, ptr %arrayidx, align 1 %div = udiv i32 20000, %ld1 %cmp = icmp eq i32 %div, 1 br i1 %cmp, label %loop.inc, label %loop.end

It could be lane 1 of the comparison that triggers the early exit, but lane 3 could trap with divide-by-zero. If this was a normal loop where the divide was conditionally executed, i.e. something like

for (int i = 0; i < n; i++) { if (mask[i]) { dst[i] = 20000 / src[i]; } }

then the vectoriser can at least create a vector expression for the mask and select between src and a vector of ones to avoid unnecessarily creating an exception. However, in the test above the vectoriser has to perform the divide first in order to know what the mask should be.

I suppose in future improvements if we can prove that the udiv can be safely moved after the vector comparison (i.e. because the comparison does not depend upon the udiv) then we might be able to create a mask and select between src and 1.

For now, I thought it best to avoid over-complicating the initial early exit vectorisation work by bailing out on edge cases like this.

paulwalker-arm · 2024-09-12T17:28:27Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

+  /// Keep track of all the loop exiting blocks.
+  SmallVector<BasicBlock *, 4> CountableExitingBlocks;
+  SmallVector<BasicBlock *, 4> UncountableExitingBlocks;
+
+  /// Keep track of the destinations of all uncountable exits.
+  SmallVector<BasicBlock *, 4> UncountableExitBlocks;


Of the three only UncountableExitingBlocks looks to be used by this patch. I'd just like confirmation this it intentional and will be built upon in a following PR.

Yeah, these will be used in follow-on patches. However, if you prefer to keep this initial patch clean I can remove them and add them later. I was just trying to reduce the size of follow-on patches by moving it into this patch that's all.

fhahn · 2024-09-13T08:34:36Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

+
+  /// Returns the destination of a speculative early exiting block.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    assert(getUncountableExitBlocks().size() == 1 &&


Both getSpeculativeEarlyExitBloc and getUncountableExitingBlocks assert that there's exactly one entry. Would it be simpler to store the pointers directly, rather than in a 1 element vector?

I could do that, and would be happy to do so if people prefer. I wrote it this way because in future I thought it's possible we may support multiple early exits. @paulwalker-arm @huntergr-arm any preference?

fhahn · 2024-09-13T08:34:47Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

@@ -436,7 +470,7 @@ class LoopVectorizationLegality {
  /// we read and write from memory. This method checks if it is
  /// legal to vectorize the code, considering only memory constrains.
  /// Returns true if the loop is vectorizable
-  bool canVectorizeMemory();
+  bool canVectorizeMemory(bool IsEarlyExitLoop);


Document argument

fhahn · 2024-09-13T08:37:44Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  // For loops with uncountable early exiting blocks that are not the latch
+  // it's necessary to perform extra checks, since the vectoriser is currently
+  // only capable of handling simple search loops.
+  if (IsEarlyExitLoop) {


IMO it would make sense to keep the checks together in isVectorizableEarlyExitLoop in particular with the extra checks recently added there

fhahn · 2024-09-13T08:38:28Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

@@ -446,6 +480,9 @@ class LoopVectorizationLegality {
  /// specific checks for outer loop vectorization.
  bool canVectorizeOuterLoop();

+  /// Returns true if this is an early exit loop that can be vectorized.


It would probably be helpful to summarize the required conditions here (or somewhere else)

fhahn · 2024-09-13T08:39:12Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  // Check all instructions in the loop to see if they could potentially
+  // generate exceptions or have side-effects.
+  auto IsSafeOperation = [](Instruction *I) -> bool {
+    // Is this a divide?


The comment doesn't seem to match the code below?

Hopefully this reads better now!

fhahn · 2024-09-13T08:40:10Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+               getUncountableExitingBlocks()[0] &&
+           "Expected latch predecessor to be the early exiting block");
+
+    for (Instruction &I : *LatchBB) {


Do we have any restrictions that there is only the latch after the early exit? If not, all blocks between early exit and latch would need to be check?

Yeah, if you look in isVectorizableEarlyExitLoop you'll see this code:

// The only supported early exit loops so far are ones where the early // exiting block is a unique predecessor of the latch block. BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor(); if (LatchPredBB != getUncountableExitingBlocks()[0]) { reportVectorizationFailure("Early exit is not the latch predecessor", "Cannot vectorize early exit loop", "EarlyExitNotLatchPredecessor", ORE, TheLoop); return false; }

which is why a few lines above I am able to add this assert:

assert(LatchBB->getUniquePredecessor() == getUncountableExitingBlocks()[0] && "Expected latch predecessor to be the early exiting block");

Although if I move the new code from canVectorizeMemory to isVectorizableEarlyExitLoop I can get rid of the assert.

I added this restriction as a simple optimisation to avoid messier CFGs, such as conditionally executed blocks after the early exit:

loop: ... br i1 %cmp, label %early.exit, label %loop.bb1 loop.bb1: ... br i1 %cmp2, label %do.stuff, label %loop.latch do.stuff: ... br label %loop.latch loop.latch: ... br i1 %cmp3, label %normal.exit, label %loop

In practice we can't yet permit any useful work after the early exit anyway until we can teach the vectoriser that subsequent blocks are effectively if-converted and require a mask, and the operations would need to occur before the early exit.

fhahn · 2024-09-13T08:41:26Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

@@ -1442,6 +1487,126 @@ bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
  return Result;
 }

+bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
+  // At least one of the exiting blocks must be the latch.


Below just checks if there's a latch, not if it is exiting?

Hmm, I should move the comment lower down to the code where it's checked:

if (isa<SCEVCouldNotCompute>( PSE.getSE()->getPredicatedExitCount(TheLoop, LatchBB, &Predicates))) { reportVectorizationFailure( "Cannot determine exact exit count for latch block", "Cannot vectorize early exit loop", "UnknownLatchExitCountEarlyExitLoop", ORE, TheLoop); return false; }

There is a test for this - see @early_exit_infinite_loop

paulwalker-arm

A few final comments but otherwise this looks good to me.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

paulwalker-arm · 2024-09-17T12:46:40Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  SmallVector<const SCEVPredicate *, 4> Predicates;
+  for (BasicBlock *BB1 : ExitingBlocks) {
+    const SCEV *EC =
+        PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);


OK, I didn't see this parameter controlling any code paths but I do see an assert that would fail without it.

What worried me was that you're clearing Predicates after iterating across all blocks and that made me assume it's contents didn't matter, but then I wondered why you need to manually clear it at all?

This patch is split off from PR llvm#88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 5. There are no loads after the early exit block. 6. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet. 7. We must be able to prove at compile-time that loops will not contain faulting loads. Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll

* Reverted LoopAccessAnalysis changes in favour of new code in LoopVectorizationLegality.cpp that checks if instructions are safe to speculatively execute. * Updated some comments and done minor code refactoring. * Added new tests for demonstrating under which circumstances we can speculatively execute calls and divides. * Added new test where the early exit is in a conditional block. * Added new test containing store instructions. * Added new test containing loads after early exit.

* Combined changes from canVectorizeMemory into isVectorizableEarlyExitLoop. * Documented restrictions for isVectorizableEarlyExitLoop in header file. * Added reportVectorizationFailure call in LoopVectorize when we bail out. * General code refactor, clean-up.

david-arm · 2024-09-17T14:48:37Z

Rebased to fix the conflict (that wasn't a conflict) warning

* Remove code to check loads after early exit, since at the moment this is redundant. We only vectorise if we can prove that none of the loads will fault, so there is no harm in speculatively executing. Any outside users of a load in the latch block must have come from a normal exit and so we will extract the last lane of the vectorised in the same way as a normal loop. * Re-wrote the test loop_contains_load_after_early_exit to show it is legal to vectorise when the load in the latch block is dereferenceable. The test also contains an outside use of that load. At least once a patch lands to start vectorising this loop it will be obvious whether the code is correct or not. * Updated a comment.

fhahn · 2024-09-18T11:20:06Z

llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll

@@ -0,0 +1,1969 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S < %s -p loop-vectorize -mattr=+sve -debug-only=loop-vectorize \


There's nothing target specific in this test at the moment, can it be moved out so we have test coverage for the legality checks independent of the target?

Sure, seems reasonable. We will need a few SVE specific tests once we start vectorising these loops just to ensure the cost model decides it's worth vectorising, but that's fine!

* Move simple_early_exit.ll test file to top level directory and changed command line to be target-independent. As a result of this I had to update the tests due to some alignment differences.

llvm-ci · 2024-09-19T08:51:54Z

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/5918

Here is the relevant piece of the build log for the reference

Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[38/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/memmove-hip-6.0.2.dir/memmove.hip.o -o External/HIP/memmove-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/memmove.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/memmove.reference_output-hip-6.0.2
[39/40] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[40/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

Input 1:
Memory access fault by GPU node-1 (Agent handle: 0x561b1190eac0) on address (nil). Reason: Page not present or supervisor privilege.
exit 134

Input 2:
image width = 1200 height = 675
block size = (16, 16) grid size = (75, 43)
Start rendering by GPU.
Done.
gpu.ppm and ref.ppm are the same.
exit 0

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 336.24s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

Input 1:
Memory access fault by GPU node-1 (Agent handle: 0x561b1190eac0) on address (nil). Reason: Page not present or supervisor privilege.
exit 134

Input 2:
image width = 1200 height = 675
block size = (16, 16) grid size = (75, 43)
Start rendering by GPU.
Done.
gpu.ppm and ref.ppm are the same.
exit 0

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 336.24s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
program finished with exit code 1
elapsedTime=454.757562

fhahn · 2024-09-19T08:57:18Z

llvm/test/Transforms/LoopVectorize/simple_early_exit.ll

@@ -0,0 +1,1941 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4


This will need REQUIRES:asserts I think

Ah yes, you're right! Good spot. I'll fix asap.

fhahn · 2024-09-19T09:00:43Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

+
+  /// Indicates whether this loop has a speculative early exit, i.e. an
+  /// uncountable exiting block that is not the latch.
+  bool HasSpeculativeEarlyExit = false;


For consistency, with SCEV terminology and the variables below, Speculative -> Uncountable?

fhahn · 2024-09-19T09:02:56Z

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

+  /// uncountable exiting block that is not the latch.
+  bool HasSpeculativeEarlyExit = false;
+
+  /// Keep track of all the loop exiting blocks.


Is this comment accurate? AFAICT they only keep track of a single countable and uncountable exiting block, if the backedge taken count is not computable.

fhahn · 2024-09-19T09:04:51Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+  SmallVector<const SCEVPredicate *, 4> Predicates;
+  for (BasicBlock *BB1 : ExitingBlocks) {
+    const SCEV *EC =
+        PSE.getSE()->getPredicatedExitCount(TheLoop, BB1, &Predicates);


Could you add a comment here or below why predicates is passed and dropped below?

fhahn · 2024-09-19T09:07:28Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+    }
+
+  // The latch block must have a countable exit.
+  if (isa<SCEVCouldNotCompute>(


Move simpler checks earlier before iterating over the whole loop? Can also check if LatchBB is in the collected countable exiting blocks?

fhahn · 2024-09-19T09:11:05Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+
+  LLVM_DEBUG(
+      dbgs()
+      << "LV: Found an early exit. Retrying with speculative exit count.\n");


Would be good to be consistent with SCEV terminology, i.e. symbolic max BTC. Or more specifically the computable exiting count for the loop latch, as with the current restrictions getSymbolicMaxBackedgeTakenCount() == >getPredicatedExitCount(TheLoop, Latch)?

fhahn · 2024-09-19T09:13:00Z

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

+        return false;
+      }
+
+      BasicBlock *BB2;


clearer to name ExitBlock?

david-arm · 2024-09-19T12:21:11Z

Hi @fhahn, I'll take a look at addressing your comments in a follow-up patch. I had used the term speculative for things like HasSpeculativeEarlyExit to try and indicate that even the loop has to have more than just an uncountable early exit, because the exit count from the latch needs to be countable too in order for this flag to be true. But perhaps that's unnecessary and a bit pedantic and I think I'm happy to rename for consistency.

* Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.

llvm#107004) This patch is split off from PR llvm#88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish from existing support for early exits where the exit-not-taken count is known exactly at compile time. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 5. There are no loads after the early exit block. 6. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet. 7. We must be able to prove at compile-time that loops will not contain faulting loads. Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll

This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll

* Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.

) * Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.

This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll

…lvm#109300) * Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.

This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll

…lvm#109300) * Rename Speculative -> Uncountable and update tests. * Add comments explaining why it's safe to ignore the predicates when building up a list of exiting blocks. * Reshuffle some code to do (hopefully) cheaper checks first.

This patch follows on from PR llvm#107004 by adding support for vectorisation of a simple class of loops that typically involves searching for something, i.e. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit we will only vectorise early exit loops legal if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early uncountable exit like those shown in the above example. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 6. The loop must not contain reductions or recurrences. 7. We must be able to prove at compile-time that loops will not contain faulting loads. For point 7 once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes and since the VF is a power of 2, any vector loads <= 32/64 bytes in size will always fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem to be any worse at least for loops with only one load. This patch makes use of the existing experimental_cttz_elems intrinsic that's required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support so it's guaranteed to work for all targets. Tests have been updated here: Transforms/LoopVectorize/simple_early_exit.ll

david-arm requested review from fhahn, preames, davemgreen and huntergr-arm September 2, 2024 15:42

llvmbot added vectorizers llvm:analysis llvm:transforms labels Sep 2, 2024

david-arm requested a review from paulwalker-arm September 2, 2024 15:50

igogo-x86 reviewed Sep 3, 2024

View reviewed changes

paulwalker-arm reviewed Sep 5, 2024

View reviewed changes

paulwalker-arm reviewed Sep 12, 2024

View reviewed changes

fhahn reviewed Sep 13, 2024

View reviewed changes

paulwalker-arm approved these changes Sep 17, 2024

View reviewed changes

david-arm added 3 commits September 17, 2024 14:47

Address review comments

0087f54

* Combined changes from canVectorizeMemory into isVectorizableEarlyExitLoop. * Documented restrictions for isVectorizableEarlyExitLoop in header file. * Added reportVectorizationFailure call in LoopVectorize when we bail out. * General code refactor, clean-up.

david-arm force-pushed the ee_legal branch from 432e1dd to 0087f54 Compare September 17, 2024 14:48

fhahn reviewed Sep 18, 2024

View reviewed changes

Address review comments

2ee84a1

* Move simple_early_exit.ll test file to top level directory and changed command line to be target-independent. As a result of this I had to update the tests due to some alignment differences.

david-arm merged commit e762d4d into llvm:main Sep 19, 2024
8 checks passed

fhahn reviewed Sep 19, 2024

View reviewed changes

david-arm mentioned this pull request Sep 20, 2024

[LoopVectorize] Add support for vectorisation of more early exit loops #88385

Open

david-arm deleted the ee_legal branch October 3, 2024 08:53

david-arm mentioned this pull request Nov 7, 2024

[VPlan] Dispatch to multiple exit blocks via middle blocks. #112138

Merged

		/// Returns true if this is a supported early exit loop that we can
		/// vectorize.

		// The vectoriser does not yet handle loops that may fault, but this will
		// be improved in a follow-on patch.

		BasicBlock *LatchBB = TheLoop->getLoopLatch();
		for (Instruction &I : *LatchBB) {

		// Keep a record of all the exiting blocks with exact exit counts, as well as
		// those with inexact counts.

		// The only supported early exit loops so far are ones where the early
		// exiting block is a unique predecessor of the latch block.

		LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Auto-vectorization of early "
		<< "exit loops is not yet supported.\n");

		@@ -0,0 +1,1969 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
		; RUN: opt -S < %s -p loop-vectorize -mattr=+sve -debug-only=loop-vectorize \

		@@ -0,0 +1,1941 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4

[LoopVectorize] Teach LoopVectorizationLegality about more early exits #107004

[LoopVectorize] Teach LoopVectorizationLegality about more early exits #107004

Conversation

david-arm commented Sep 2, 2024 • edited Loading

llvmbot commented Sep 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paulwalker-arm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

david-arm commented Sep 17, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvm-ci commented Sep 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

david-arm commented Sep 2, 2024 •

edited

Loading

llvmbot commented Sep 2, 2024 •

edited

Loading