[LoopVectorize] Add support for vectorisation of more early exit loops
This patch follows on from PR llvm#107004 by adding support for
vectorisation of a simple class of loops that typically involve
searching for something, e.g.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit we only consider an early exit loop legal to
vectorise if it meets all of the following criteria:

1. There are no stores in the loop.
2. The loop has exactly one uncountable early exit, like those shown
in the examples above.
3. The early exiting block dominates the latch block.
4. The latch block has an exact exit count.
5. The loop does not contain reductions or recurrences.
6. We can prove at compile time that the loop contains no faulting
loads.
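As a rough illustration, the criteria above amount to a single conjunction over facts about the loop. The sketch below is hypothetical: the `LoopFacts` struct and `isLegalEarlyExitLoop` function are invented for illustration and are not part of LLVM's `LoopVectorizationLegality`. Computing each fact (dominance, exit counts, dereferenceability) is where the real analysis happens.

```cpp
#include <cassert>

// Hypothetical summary of what a legality check needs to know about a loop.
// Field names are illustrative only; they are not LLVM data structures.
struct LoopFacts {
  bool HasStores;                     // criterion 1
  unsigned NumUncountableEarlyExits;  // criterion 2 (the latch exit is countable)
  bool EarlyExitingBlockDominatesLatch; // criterion 3
  bool LatchHasExactExitCount;        // criterion 4
  bool HasReductionsOrRecurrences;    // criterion 5
  bool ProvablyNoFaultingLoads;       // criterion 6
};

// Returns true only if every criterion in the list holds.
bool isLegalEarlyExitLoop(const LoopFacts &F) {
  return !F.HasStores &&
         F.NumUncountableEarlyExits == 1 &&
         F.EarlyExitingBlockDominatesLatch &&
         F.LatchHasExactExitCount &&
         !F.HasReductionsOrRecurrences &&
         F.ProvablyNoFaultingLoads;
}
```

A read-only search loop like the two examples above would set every field favourably; a single store, or a second uncountable exit, is enough to reject it.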

For the last criterion (no faulting loads), once this patch lands I
intend to follow up by supporting some limited cases of potentially
faulting loops, where we can version the loop based on pointer
alignment. For example, the SPEC2017 benchmark xalancbmk contains a
std::find loop that we can vectorise provided we add SCEV checks that
the initial pointer is aligned to a multiple of the VF. In practice,
the pointer is regularly aligned to at least 32 or 64 bytes, and since
the VF is a power of 2, any vector load of at most 32/64 bytes cannot
cross a page boundary. If such a load faults at all, it therefore
faults on the first lane, which matches the behaviour of the scalar
loop. Given that we already perform similar speculative versioning for
loops with unknown strides, alignment-based versioning seems no worse,
at least for loops containing only one load.
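The alignment argument can be checked with a little address arithmetic. This sketch assumes 4 KiB pages and power-of-two sizes no larger than a page; `crossesPage`, `isAligned` and `alignedAccessesStayInOnePage` are hypothetical helpers, not LLVM APIs. It shows that a load whose start address is aligned to at least its own size never straddles a page, so either all lanes are on a mapped page or the first lane already faults, whereas a 32-byte load that is only 16-byte aligned can straddle one.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [Addr, Addr + Bytes) touches more than one page.
bool crossesPage(std::uint64_t Addr, std::uint64_t Bytes,
                 std::uint64_t PageSize) {
  std::uint64_t FirstPage = Addr / PageSize;
  std::uint64_t LastPage = (Addr + Bytes - 1) / PageSize;
  return FirstPage != LastPage;
}

// True if Addr is a multiple of Align (Align must be a power of two).
bool isAligned(std::uint64_t Addr, std::uint64_t Align) {
  return (Addr & (Align - 1)) == 0;
}

// Exhaustively checks every start address below Limit: returns false if any
// AlignBytes-aligned load of VecBytes bytes would straddle a page boundary.
bool alignedAccessesStayInOnePage(std::uint64_t AlignBytes,
                                  std::uint64_t VecBytes,
                                  std::uint64_t PageSize,
                                  std::uint64_t Limit) {
  for (std::uint64_t Addr = 0; Addr < Limit; ++Addr)
    if (isAligned(Addr, AlignBytes) && crossesPage(Addr, VecBytes, PageSize))
      return false;
  return true;
}
```

For instance, a 32-byte load at address 4080 is 16-byte aligned but spans bytes 4080 to 4111, crossing the 4096 boundary; requiring 32-byte alignment rules that start address out.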

This patch uses the existing experimental_cttz_elts intrinsic in the
vectorised early exit block to determine the first lane that triggered
the exit. The intrinsic has generic lowering support, so it works on
all targets.
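To show what the vectorised loop computes, here is a scalar C++ emulation of the first search loop above. It is a sketch under stated assumptions, not the IR the vectoriser emits: `cttzElts` models `llvm.experimental.cttz.elts` with `is_zero_poison` false (returning the lane count when no lane is set), `findFirstVectorised` is a hypothetical name, and the remaining elements are handled by a plain scalar loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Models llvm.experimental.cttz.elts: index of the first set lane, or the
// number of lanes if no lane is set.
template <std::size_t VF>
std::size_t cttzElts(const std::array<bool, VF> &Mask) {
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    if (Mask[Lane])
      return Lane;
  return VF;
}

// Emulates a vectorised form of:
//   for (int i = 0; i < n; i++) if (p[i] == val) return i; return n;
template <std::size_t VF>
int findFirstVectorised(const int *P, int N, int Val) {
  const int Step = static_cast<int>(VF);
  int I = 0;
  // Vector body: compare VF consecutive elements per iteration.
  for (; I + Step <= N; I += Step) {
    std::array<bool, VF> Mask{};
    bool AnyLaneSet = false;
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      Mask[Lane] = P[I + static_cast<int>(Lane)] == Val;
      AnyLaneSet = AnyLaneSet || Mask[Lane];
    }
    // Early exit taken: recover the first matching lane from the mask.
    if (AnyLaneSet)
      return I + static_cast<int>(cttzElts(Mask));
  }
  // Scalar loop for the remaining N % VF elements.
  for (; I < N; ++I)
    if (P[I] == Val)
      return I;
  return N;
}
```

With VF = 4 and the array {5, 3, 7, 9, 7, 1, 2, 8, 4, 6}, searching for 7 exits in the first vector iteration with mask {0, 0, 1, 0}, so the result is 0 + 2 = 2; the second 7 at index 4 is never reported.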

Tests have been updated here:

Transforms/LoopVectorize/simple_early_exit.ll
david-arm committed Oct 8, 2024
1 parent 66b2820 commit 39185b4
Showing 11 changed files with 1,979 additions and 363 deletions.
4 changes: 4 additions & 0 deletions llvm/include/llvm/Support/GenericLoopInfo.h
@@ -294,6 +294,10 @@ template <class BlockT, class LoopT> class LoopBase {
   /// Otherwise return null.
   BlockT *getUniqueExitBlock() const;
 
+  /// Return the unique exit block for the latch, or null if there are multiple
+  /// different exit blocks.
+  BlockT *getUniqueLatchExitBlock() const;
+
   /// Return true if this loop does not have any exit blocks.
   bool hasNoExitBlocks() const;
 
10 changes: 10 additions & 0 deletions llvm/include/llvm/Support/GenericLoopInfoImpl.h
@@ -159,6 +159,16 @@ BlockT *LoopBase<BlockT, LoopT>::getUniqueExitBlock() const {
   return getExitBlockHelper(this, true).first;
 }
 
+template <class BlockT, class LoopT>
+BlockT *LoopBase<BlockT, LoopT>::getUniqueLatchExitBlock() const {
+  const BlockT *Latch = getLoopLatch();
+  assert(Latch && "Latch block must exists");
+  SmallVector<BlockT *, 4> ExitBlocks;
+  getUniqueExitBlocksHelper(this, ExitBlocks,
+                            [Latch](const BlockT *BB) { return BB == Latch; });
+  return ExitBlocks.size() == 1 ? ExitBlocks[0] : nullptr;
+}
+
 /// getExitEdges - Return all pairs of (_inside_block_,_outside_block_).
 template <class BlockT, class LoopT>
 void LoopBase<BlockT, LoopT>::getExitEdges(
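In an early exit loop like the search loops in the commit message there are two distinct exit blocks, so getUniqueExitBlock() returns null even though the latch itself leaves the loop through a single block. The toy model below contrasts the two queries; it uses a hypothetical string-keyed CFG rather than LLVM's LoopBase, and an empty string stands in for nullptr.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CFG: block name -> successor names (hypothetical, not LLVM types).
using CFG = std::map<std::string, std::vector<std::string>>;

// Distinct out-of-loop successors of the latch; returns the block if there
// is exactly one, otherwise "" (standing in for nullptr).
std::string uniqueLatchExitBlock(const CFG &G,
                                 const std::set<std::string> &LoopBlocks,
                                 const std::string &Latch) {
  std::set<std::string> Exits;
  for (const std::string &Succ : G.at(Latch))
    if (!LoopBlocks.count(Succ))
      Exits.insert(Succ);
  return Exits.size() == 1 ? *Exits.begin() : std::string();
}

// Same query over every block in the loop, like getUniqueExitBlock().
std::string uniqueExitBlock(const CFG &G,
                            const std::set<std::string> &LoopBlocks) {
  std::set<std::string> Exits;
  for (const std::string &BB : LoopBlocks)
    for (const std::string &Succ : G.at(BB))
      if (!LoopBlocks.count(Succ))
        Exits.insert(Succ);
  return Exits.size() == 1 ? *Exits.begin() : std::string();
}
```

For a loop whose body branches to either "early.exit" or "latch", and whose latch branches back to "loop" or to "exit", the latch query yields "exit" while the whole-loop query finds two exits and returns "".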
20 changes: 15 additions & 5 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -83,6 +83,12 @@ static cl::opt<bool> EnableHistogramVectorization(
     "enable-histogram-loop-vectorization", cl::init(false), cl::Hidden,
     cl::desc("Enables autovectorization of some loops containing histograms"));
 
+static cl::opt<bool> AssumeNoMemFault(
+    "vectorizer-no-mem-fault", cl::init(false), cl::Hidden,
+    cl::desc("Assume vectorized loops will not have memory faults, which is "
+             "potentially unsafe but can be useful for testing vectorization "
+             "of early exit loops."));
+
 /// Maximum vectorization interleave count.
 static const unsigned MaxInterleaveFactor = 16;
 
@@ -1710,11 +1716,15 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
   Predicates.clear();
   if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
                                      &Predicates)) {
-    reportVectorizationFailure(
-        "Loop may fault",
-        "Cannot vectorize potentially faulting early exit loop",
-        "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
-    return false;
+    if (!AssumeNoMemFault) {
+      reportVectorizationFailure(
+          "Loop may fault",
+          "Cannot vectorize potentially faulting early exit loop",
+          "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
+      return false;
+    } else
+      LLVM_DEBUG(dbgs() << "LV: Assuming early exit vector loop will not "
+                        << "fault\n");
   }
 
   [[maybe_unused]] const SCEV *SymbolicMaxBTC =
