LAA: be less conservative in isNoWrap #112553

Merged: 1 commit into llvm:main on Oct 22, 2024

Conversation

@artagnon (Contributor) commented Oct 16, 2024

isNoWrap has exactly one caller, which handles Assume = true separately, but too conservatively. Instead, pass Assume through to isNoWrap, so that it is threaded into getPtrStride, which already handles the Assume flag correctly. Note also that the Stride == 1 check in isNoWrap is too conservative: when ShouldCheckWrap is true, getPtrStride returns a Stride of 1 or -1, except when isNoWrapAddRec or Assume is true. We can therefore also accept a Stride of -1, and the case where isNoWrapAddRec is true. With this change, getPtrStride called with Assume = true can return a non-unit stride, and that case is now handled correctly as well.
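
To make the stride contract concrete, here is a small self-contained sketch. It is a toy model of getPtrStride's behavior as described above, not LLVM's actual API; modelGetPtrStride and its parameters are invented for illustration.

#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for getPtrStride under ShouldCheckWrap (illustrative only):
// it yields 1 or -1 for unit strides, and may yield any computable stride
// when Assume or isNoWrapAddRec justifies adding a no-wrap predicate.
std::optional<int64_t> modelGetPtrStride(int64_t RawStride, bool Assume,
                                         bool IsNoWrapAddRec) {
  if (RawStride == 1 || RawStride == -1)
    return RawStride;
  if (Assume || IsNoWrapAddRec)
    return RawStride; // non-unit stride, backed by a wrap predicate
  return std::nullopt; // no-wrap cannot be established
}

int main() {
  // The old `Stride == 1` test wrongly rejects these provable cases:
  assert(modelGetPtrStride(-1, false, false) == -1);
  assert(modelGetPtrStride(4, /*Assume=*/true, false) == 4);
  // The new `has_value()` test accepts exactly what the model can prove:
  assert(!modelGetPtrStride(4, false, false).has_value());
  return 0;
}

Under this contract, has_value() is the precise no-wrap test, while Stride == 1 discards both the -1 case and the predicated non-unit strides.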

@llvmbot (Member) commented Oct 16, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

isNoWrap has exactly one caller, which handles Assume = true separately, but too conservatively. Instead, pass Assume through to isNoWrap, so that it is threaded into getPtrStride, which already handles the Assume flag correctly. Note also that the Stride == 1 check in isNoWrap is too conservative: when ShouldCheckWrap is true, getPtrStride returns a Stride of 1 or -1, except when isNoWrapAddRec or Assume is true. We can therefore also accept a Stride of -1, and the case where isNoWrapAddRec is true. With this change, getPtrStride called with Assume = true can return a non-unit stride, and that case is now handled correctly as well.

-- 8< --
Based on #112544.


Full diff: https://github.com/llvm/llvm-project/pull/112553.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+4-9)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/offset-range-known-via-assume.ll (-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll (+5-43)
  • (modified) llvm/test/Transforms/LoopVersioning/wrapping-pointer-non-integral-addrspace.ll (+86-33)
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index d35bf6818d4379..9a7b8d2ac28bf1 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -826,13 +826,12 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE, Value *Ptr,
 /// Check whether a pointer address cannot wrap.
 static bool isNoWrap(PredicatedScalarEvolution &PSE,
                      const DenseMap<Value *, const SCEV *> &Strides, Value *Ptr, Type *AccessTy,
-                     Loop *L) {
+                     Loop *L, bool Assume) {
   const SCEV *PtrScev = PSE.getSCEV(Ptr);
   if (PSE.getSE()->isLoopInvariant(PtrScev, L))
     return true;
 
-  int64_t Stride = getPtrStride(PSE, AccessTy, Ptr, L, Strides).value_or(0);
-  return Stride == 1 ||
+  return getPtrStride(PSE, AccessTy, Ptr, L, Strides, Assume).has_value() ||
          PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
 }
 
@@ -1079,12 +1078,8 @@ bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,
       if (TranslatedPtrs.size() > 1)
         return false;
 
-      if (!isNoWrap(PSE, StridesMap, Ptr, AccessTy, TheLoop)) {
-        const SCEV *Expr = PSE.getSCEV(Ptr);
-        if (!Assume || !isa<SCEVAddRecExpr>(Expr))
-          return false;
-        PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
-      }
+      if (!isNoWrap(PSE, StridesMap, Ptr, AccessTy, TheLoop, Assume))
+        return false;
     }
     // If there's only one option for Ptr, look it up after bounds and wrap
     // checking, because assumptions might have been added to PSE.
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/offset-range-known-via-assume.ll b/llvm/test/Analysis/LoopAccessAnalysis/offset-range-known-via-assume.ll
index c358b00dad22a6..202f975190209a 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/offset-range-known-via-assume.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/offset-range-known-via-assume.ll
@@ -69,7 +69,6 @@ define void @offset_i32_known_positive_via_assume_forward_dep_1(ptr %A, i64 %off
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
-; CHECK-NEXT:      {((4 * %offset)<nsw> + %A),+,4}<nw><%loop> Added Flags: <nusw>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Expressions re-written:
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll b/llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll
index 09933ad4255ed6..49e9abcd9f9192 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/induction-costs.ll
@@ -22,47 +22,9 @@ define void @skip_free_iv_truncate(i16 %x, ptr %A) #0 {
 ; CHECK-NEXT:    [[TMP5:%.*]] = add i64 [[TMP4]], 1
 ; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP6]], 8
-; CHECK-NEXT:    [[TMP8:%.*]] = call i64 @llvm.umax.i64(i64 288, i64 [[TMP7]])
+; CHECK-NEXT:    [[TMP8:%.*]] = call i64 @llvm.umax.i64(i64 128, i64 [[TMP7]])
 ; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ule i64 [[TMP5]], [[TMP8]]
-; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_SCEVCHECK:.*]]
-; CHECK:       [[VECTOR_SCEVCHECK]]:
-; CHECK-NEXT:    [[SMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[X_I64]], i64 99)
-; CHECK-NEXT:    [[TMP9:%.*]] = sub i64 [[SMAX]], [[X_I64]]
-; CHECK-NEXT:    [[UMIN:%.*]] = call i64 @llvm.umin.i64(i64 [[TMP9]], i64 1)
-; CHECK-NEXT:    [[TMP10:%.*]] = sub i64 [[SMAX]], [[UMIN]]
-; CHECK-NEXT:    [[TMP11:%.*]] = sub i64 [[TMP10]], [[X_I64]]
-; CHECK-NEXT:    [[TMP12:%.*]] = udiv i64 [[TMP11]], 3
-; CHECK-NEXT:    [[TMP13:%.*]] = add i64 [[UMIN]], [[TMP12]]
-; CHECK-NEXT:    [[TMP14:%.*]] = shl nsw i64 [[X_I64]], 1
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP14]]
-; CHECK-NEXT:    [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 6, i64 [[TMP13]])
-; CHECK-NEXT:    [[MUL_RESULT:%.*]] = extractvalue { i64, i1 } [[MUL]], 0
-; CHECK-NEXT:    [[MUL_OVERFLOW:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
-; CHECK-NEXT:    [[TMP15:%.*]] = sub i64 0, [[MUL_RESULT]]
-; CHECK-NEXT:    [[TMP16:%.*]] = getelementptr i8, ptr [[SCEVGEP]], i64 [[MUL_RESULT]]
-; CHECK-NEXT:    [[TMP17:%.*]] = icmp ult ptr [[TMP16]], [[SCEVGEP]]
-; CHECK-NEXT:    [[TMP18:%.*]] = or i1 [[TMP17]], [[MUL_OVERFLOW]]
-; CHECK-NEXT:    [[TMP19:%.*]] = shl nsw i64 [[X_I64]], 3
-; CHECK-NEXT:    [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP19]]
-; CHECK-NEXT:    [[MUL2:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 24, i64 [[TMP13]])
-; CHECK-NEXT:    [[MUL_RESULT3:%.*]] = extractvalue { i64, i1 } [[MUL2]], 0
-; CHECK-NEXT:    [[MUL_OVERFLOW4:%.*]] = extractvalue { i64, i1 } [[MUL2]], 1
-; CHECK-NEXT:    [[TMP20:%.*]] = sub i64 0, [[MUL_RESULT3]]
-; CHECK-NEXT:    [[TMP21:%.*]] = getelementptr i8, ptr [[SCEVGEP1]], i64 [[MUL_RESULT3]]
-; CHECK-NEXT:    [[TMP22:%.*]] = icmp ult ptr [[TMP21]], [[SCEVGEP1]]
-; CHECK-NEXT:    [[TMP23:%.*]] = or i1 [[TMP22]], [[MUL_OVERFLOW4]]
-; CHECK-NEXT:    [[TMP24:%.*]] = add nsw i64 [[TMP19]], -8
-; CHECK-NEXT:    [[SCEVGEP5:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP24]]
-; CHECK-NEXT:    [[MUL6:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 24, i64 [[TMP13]])
-; CHECK-NEXT:    [[MUL_RESULT7:%.*]] = extractvalue { i64, i1 } [[MUL6]], 0
-; CHECK-NEXT:    [[MUL_OVERFLOW8:%.*]] = extractvalue { i64, i1 } [[MUL6]], 1
-; CHECK-NEXT:    [[TMP25:%.*]] = sub i64 0, [[MUL_RESULT7]]
-; CHECK-NEXT:    [[TMP26:%.*]] = getelementptr i8, ptr [[SCEVGEP5]], i64 [[MUL_RESULT7]]
-; CHECK-NEXT:    [[TMP27:%.*]] = icmp ult ptr [[TMP26]], [[SCEVGEP5]]
-; CHECK-NEXT:    [[TMP28:%.*]] = or i1 [[TMP27]], [[MUL_OVERFLOW8]]
-; CHECK-NEXT:    [[TMP29:%.*]] = or i1 [[TMP18]], [[TMP23]]
-; CHECK-NEXT:    [[TMP30:%.*]] = or i1 [[TMP29]], [[TMP28]]
-; CHECK-NEXT:    br i1 [[TMP30]], label %[[SCALAR_PH]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
 ; CHECK:       [[VECTOR_MEMCHECK]]:
 ; CHECK-NEXT:    [[TMP31:%.*]] = shl nsw i64 [[X_I64]], 1
 ; CHECK-NEXT:    [[SCEVGEP9:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP31]]
@@ -130,12 +92,12 @@ define void @skip_free_iv_truncate(i16 %x, ptr %A) #0 {
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[X_I64]], %[[ENTRY]] ], [ [[X_I64]], %[[VECTOR_SCEVCHECK]] ], [ [[X_I64]], %[[VECTOR_MEMCHECK]] ]
-; CHECK-NEXT:    [[BC_RESUME_VAL23:%.*]] = phi i32 [ [[IND_END22]], %[[MIDDLE_BLOCK]] ], [ [[X_I32]], %[[ENTRY]] ], [ [[X_I32]], %[[VECTOR_SCEVCHECK]] ], [ [[X_I32]], %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[X_I64]], %[[ENTRY]] ], [ [[X_I64]], %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL12:%.*]] = phi i32 [ [[IND_END22]], %[[MIDDLE_BLOCK]] ], [ [[X_I32]], %[[ENTRY]] ], [ [[X_I32]], %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
-; CHECK-NEXT:    [[IV_CONV:%.*]] = phi i32 [ [[BC_RESUME_VAL23]], %[[SCALAR_PH]] ], [ [[TMP64:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[IV_CONV:%.*]] = phi i32 [ [[BC_RESUME_VAL12]], %[[SCALAR_PH]] ], [ [[TMP64:%.*]], %[[LOOP]] ]
 ; CHECK-NEXT:    [[GEP_I64:%.*]] = getelementptr i64, ptr [[A]], i64 [[IV]]
 ; CHECK-NEXT:    [[TMP61:%.*]] = load i64, ptr [[GEP_I64]], align 8
 ; CHECK-NEXT:    [[TMP62:%.*]] = sext i32 [[IV_CONV]] to i64
diff --git a/llvm/test/Transforms/LoopVersioning/wrapping-pointer-non-integral-addrspace.ll b/llvm/test/Transforms/LoopVersioning/wrapping-pointer-non-integral-addrspace.ll
index 430baa1cb4f8c1..e3cb3831d7938e 100644
--- a/llvm/test/Transforms/LoopVersioning/wrapping-pointer-non-integral-addrspace.ll
+++ b/llvm/test/Transforms/LoopVersioning/wrapping-pointer-non-integral-addrspace.ll
@@ -1,4 +1,5 @@
-; RUN: opt -passes=loop-versioning -S < %s | FileCheck %s -check-prefix=LV
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=loop-versioning -S < %s | FileCheck %s
 
 ; NB: addrspaces 10-13 are non-integral
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
@@ -12,40 +13,92 @@ target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
 
 declare i64 @julia_steprange_last_4949()
 
-define void @"japi1_align!_9477"(ptr %arg) {
-; LV-LAVEL: L26.lver.check
-; LV: [[OFMul:%[^ ]*]]  = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 4, i64 [[Step:%[^ ]*]])
-; LV-NEXT: [[OFMulResult:%[^ ]*]] = extractvalue { i64, i1 } [[OFMul]], 0
-; LV-NEXT: [[OFMulOverflow:%[^ ]*]] = extractvalue { i64, i1 } [[OFMul]], 1
-; LV: [[OFNegMulResult:%[^ ]*]] = sub i64 0, [[OFMulResult]]
-; LV-NEXT: [[NegGEP:%[^ ]*]] = getelementptr i8, ptr addrspace(13) [[Base:%[^ ]*]], i64 [[OFNegMulResult]]
-; LV-NEXT: icmp ugt ptr addrspace(13) [[NegGEP]], [[Base]]
-; LV-NOT: inttoptr
-; LV-NOT: ptrtoint
+define void @wrapping_ptr_nonint_addrspace(ptr %arg) {
+; CHECK-LABEL: define void @wrapping_ptr_nonint_addrspace(
+; CHECK-SAME: ptr [[ARG:%.*]]) {
+; CHECK-NEXT:  [[LOOP_LVER_CHECK:.*:]]
+; CHECK-NEXT:    [[LOAD0:%.*]] = load ptr addrspace(10), ptr [[ARG]], align 8
+; CHECK-NEXT:    [[LOAD1:%.*]] = load i32, ptr inttoptr (i64 12 to ptr), align 4
+; CHECK-NEXT:    [[SUB:%.*]] = sub i32 0, [[LOAD1]]
+; CHECK-NEXT:    [[CALL:%.*]] = call i64 @julia_steprange_last_4949()
+; CHECK-NEXT:    [[CAST0:%.*]] = addrspacecast ptr addrspace(10) [[LOAD0]] to ptr addrspace(11)
+; CHECK-NEXT:    [[LOAD2:%.*]] = load ptr addrspace(10), ptr addrspace(11) [[CAST0]], align 8
+; CHECK-NEXT:    [[CAST1:%.*]] = addrspacecast ptr addrspace(10) [[LOAD2]] to ptr addrspace(11)
+; CHECK-NEXT:    [[LOAD3:%.*]] = load ptr addrspace(13), ptr addrspace(11) [[CAST1]], align 8
+; CHECK-NEXT:    [[SEXT:%.*]] = sext i32 [[SUB]] to i64
+; CHECK-NEXT:    [[TMP0:%.*]] = shl i64 [[CALL]], 2
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nsw i64 [[SEXT]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[TMP0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = add i64 [[TMP2]], -4
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(13) [[LOAD3]], i64 [[TMP3]]
+; CHECK-NEXT:    [[SCEVGEP1:%.*]] = getelementptr i8, ptr addrspace(13) [[LOAD3]], i64 [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = add i64 [[TMP0]], -4
+; CHECK-NEXT:    [[SCEVGEP2:%.*]] = getelementptr i8, ptr addrspace(13) [[LOAD3]], i64 [[TMP4]]
+; CHECK-NEXT:    [[BOUND0:%.*]] = icmp ult ptr addrspace(13) [[SCEVGEP]], [[LOAD3]]
+; CHECK-NEXT:    [[BOUND1:%.*]] = icmp ult ptr addrspace(13) [[SCEVGEP2]], [[SCEVGEP1]]
+; CHECK-NEXT:    [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
+; CHECK-NEXT:    br i1 [[FOUND_CONFLICT]], label %[[LOOP_PH_LVER_ORIG:.*]], label %[[LOOP_PH:.*]]
+; CHECK:       [[LOOP_PH_LVER_ORIG]]:
+; CHECK-NEXT:    br label %[[LOOP_LVER_ORIG:.*]]
+; CHECK:       [[LOOP_LVER_ORIG]]:
+; CHECK-NEXT:    [[VALUE_PHI3_LVER_ORIG:%.*]] = phi i64 [ 0, %[[LOOP_PH_LVER_ORIG]] ], [ [[ADD0_LVER_ORIG:%.*]], %[[LOOP_LVER_ORIG]] ]
+; CHECK-NEXT:    [[ADD0_LVER_ORIG]] = add i64 [[VALUE_PHI3_LVER_ORIG]], -1
+; CHECK-NEXT:    [[GEP0_LVER_ORIG:%.*]] = getelementptr inbounds i32, ptr addrspace(13) [[LOAD3]], i64 [[ADD0_LVER_ORIG]]
+; CHECK-NEXT:    [[LOAD4_LVER_ORIG:%.*]] = load i32, ptr addrspace(13) [[GEP0_LVER_ORIG]], align 4
+; CHECK-NEXT:    [[ADD1_LVER_ORIG:%.*]] = add i64 [[ADD0_LVER_ORIG]], [[SEXT]]
+; CHECK-NEXT:    [[GEP1_LVER_ORIG:%.*]] = getelementptr inbounds i32, ptr addrspace(13) [[LOAD3]], i64 [[ADD1_LVER_ORIG]]
+; CHECK-NEXT:    store i32 [[LOAD4_LVER_ORIG]], ptr addrspace(13) [[GEP1_LVER_ORIG]], align 4
+; CHECK-NEXT:    [[CMP_LVER_ORIG:%.*]] = icmp eq i64 [[VALUE_PHI3_LVER_ORIG]], [[CALL]]
+; CHECK-NEXT:    br i1 [[CMP_LVER_ORIG]], label %[[EXIT_LOOPEXIT:.*]], label %[[LOOP_LVER_ORIG]]
+; CHECK:       [[LOOP_PH]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[VALUE_PHI3:%.*]] = phi i64 [ 0, %[[LOOP_PH]] ], [ [[ADD0:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[ADD0]] = add i64 [[VALUE_PHI3]], -1
+; CHECK-NEXT:    [[GEP0:%.*]] = getelementptr inbounds i32, ptr addrspace(13) [[LOAD3]], i64 [[ADD0]]
+; CHECK-NEXT:    [[LOAD4:%.*]] = load i32, ptr addrspace(13) [[GEP0]], align 4, !alias.scope [[META0:![0-9]+]]
+; CHECK-NEXT:    [[ADD1:%.*]] = add i64 [[ADD0]], [[SEXT]]
+; CHECK-NEXT:    [[GEP1:%.*]] = getelementptr inbounds i32, ptr addrspace(13) [[LOAD3]], i64 [[ADD1]]
+; CHECK-NEXT:    store i32 [[LOAD4]], ptr addrspace(13) [[GEP1]], align 4, !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i64 [[VALUE_PHI3]], [[CALL]]
+; CHECK-NEXT:    br i1 [[CMP]], label %[[EXIT_LOOPEXIT3:.*]], label %[[LOOP]]
+; CHECK:       [[EXIT_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT_LOOPEXIT3]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret void
+;
 top:
-  %tmp = load ptr addrspace(10), ptr %arg, align 8
-  %tmp1 = load i32, ptr inttoptr (i64 12 to ptr), align 4
-  %tmp2 = sub i32 0, %tmp1
-  %tmp3 = call i64 @julia_steprange_last_4949()
-  %tmp4 = addrspacecast ptr addrspace(10) %tmp to ptr addrspace(11)
-  %tmp6 = load ptr addrspace(10), ptr addrspace(11) %tmp4, align 8
-  %tmp7 = addrspacecast ptr addrspace(10) %tmp6 to ptr addrspace(11)
-  %tmp9 = load ptr addrspace(13), ptr addrspace(11) %tmp7, align 8
-  %tmp10 = sext i32 %tmp2 to i64
-  br label %L26
+  %load0 = load ptr addrspace(10), ptr %arg, align 8
+  %load1 = load i32, ptr inttoptr (i64 12 to ptr), align 4
+  %sub = sub i32 0, %load1
+  %call = call i64 @julia_steprange_last_4949()
+  %cast0 = addrspacecast ptr addrspace(10) %load0 to ptr addrspace(11)
+  %load2 = load ptr addrspace(10), ptr addrspace(11) %cast0, align 8
+  %cast1 = addrspacecast ptr addrspace(10) %load2 to ptr addrspace(11)
+  %load3 = load ptr addrspace(13), ptr addrspace(11) %cast1, align 8
+  %sext = sext i32 %sub to i64
+  br label %loop
 
-L26:
-  %value_phi3 = phi i64 [ 0, %top ], [ %tmp11, %L26 ]
-  %tmp11 = add i64 %value_phi3, -1
-  %tmp12 = getelementptr inbounds i32, ptr addrspace(13) %tmp9, i64 %tmp11
-  %tmp13 = load i32, ptr addrspace(13) %tmp12, align 4
-  %tmp14 = add i64 %tmp11, %tmp10
-  %tmp15 = getelementptr inbounds i32, ptr addrspace(13) %tmp9, i64 %tmp14
-  store i32 %tmp13, ptr addrspace(13) %tmp15, align 4
-  %tmp16 = icmp eq i64 %value_phi3, %tmp3
-  br i1 %tmp16, label %L45, label %L26
+loop:
+  %value_phi3 = phi i64 [ 0, %top ], [ %add0, %loop ]
+  %add0 = add i64 %value_phi3, -1
+  %gep0 = getelementptr inbounds i32, ptr addrspace(13) %load3, i64 %add0
+  %load4 = load i32, ptr addrspace(13) %gep0, align 4
+  %add1 = add i64 %add0, %sext
+  %gep1 = getelementptr inbounds i32, ptr addrspace(13) %load3, i64 %add1
+  store i32 %load4, ptr addrspace(13) %gep1, align 4
+  %cmp = icmp eq i64 %value_phi3, %call
+  br i1 %cmp, label %exit, label %loop
 
-L45:
+exit:
   ret void
 }
-
+;.
+; CHECK: [[META0]] = !{[[META1:![0-9]+]]}
+; CHECK: [[META1]] = distinct !{[[META1]], [[META2:![0-9]+]]}
+; CHECK: [[META2]] = distinct !{[[META2]], !"LVerDomain"}
+; CHECK: [[META3]] = !{[[META4:![0-9]+]]}
+; CHECK: [[META4]] = distinct !{[[META4]], [[META2]]}
+;.

@llvmbot (Member) commented Oct 16, 2024

@llvm/pr-subscribers-llvm-analysis

(Same author, changes summary, and full diff as in the @llvm/pr-subscribers-llvm-transforms comment above.)

github-actions bot commented Oct 16, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

isNoWrap has exactly one caller, which handles Assume = true
separately, but too conservatively. Instead, pass Assume through to
isNoWrap, so that it is threaded into getPtrStride, which already
handles the Assume flag correctly. Note also that the Stride == 1 check
in isNoWrap is too conservative: when ShouldCheckWrap is true,
getPtrStride returns a Stride of 1 or -1, except when isNoWrapAddRec or
Assume is true. We can therefore also accept a Stride of -1, and the
case where isNoWrapAddRec is true. With this change, getPtrStride
called with Assume = true can return a non-unit stride, and that case
is now handled correctly as well.
@fhahn (Contributor) left a comment

LGTM, thanks!

@artagnon merged commit f719cfa into llvm:main on Oct 22, 2024
8 checks passed
@artagnon deleted the laa-nowrap-refine branch on October 22, 2024 at 08:55
EricWF pushed a commit to efcs/llvm-project that referenced this pull request Oct 22, 2024
isNoWrap has exactly one caller, which handles Assume = true
separately, but too conservatively. Instead, pass Assume through to
isNoWrap, so that it is threaded into getPtrStride, which already
handles the Assume flag correctly. Note also that the Stride == 1 check
in isNoWrap is too conservative: when ShouldCheckWrap is true,
getPtrStride returns a Stride of 1 or -1, except when isNoWrapAddRec or
Assume is true. We can therefore also accept a Stride of -1, and the
case where isNoWrapAddRec is true. With this change, getPtrStride
called with Assume = true can return a non-unit stride, and that case
is now handled correctly as well.