-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ValueTracking] Add dominating condition support in computeKnownBits() #73662
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang Author: Nikita Popov (nikic) ChangesThis adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change has some non-trivial impact on compile-time. The first-order impact is about 0.2%. This is the impact if we compute the KnownBits, but don't use them. The second-order impact is about 0.1% -- however, on stage2-O0-g it's actually a -0.2% improvement, which indicates that this change results in additional optimizations inside clang itself. Patch is 89.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73662.diff 41 Files Affected:
diff --git a/clang/test/CodeGenCXX/RelativeVTablesABI/member-function-pointer.cpp b/clang/test/CodeGenCXX/RelativeVTablesABI/member-function-pointer.cpp
index 24f884a8d23befb..000568b3b6bf09c 100644
--- a/clang/test/CodeGenCXX/RelativeVTablesABI/member-function-pointer.cpp
+++ b/clang/test/CodeGenCXX/RelativeVTablesABI/member-function-pointer.cpp
@@ -14,7 +14,7 @@
// The loading of the virtual function here should be replaced with a llvm.load.relative() call.
// CHECK-NEXT: [[vtable:%.+]] = load ptr, ptr [[this_adj]], align 8
-// CHECK-NEXT: [[offset:%.+]] = add i64 [[fn_ptr]], -1
+// CHECK-NEXT: [[offset:%.+]] = add nsw i64 [[fn_ptr]], -1
// CHECK-NEXT: [[ptr:%.+]] = tail call ptr @llvm.load.relative.i64(ptr [[vtable]], i64 [[offset]])
// CHECK-NEXT: br label %[[memptr_end:.+]]
// CHECK: [[nonvirt]]:
diff --git a/llvm/include/llvm/Analysis/DomConditionCache.h b/llvm/include/llvm/Analysis/DomConditionCache.h
new file mode 100644
index 000000000000000..68e34f773af8ae8
--- /dev/null
+++ b/llvm/include/llvm/Analysis/DomConditionCache.h
@@ -0,0 +1,56 @@
+//===- llvm/Analysis/DomConditionCache.h ------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Cache for branch conditions that affect a certain value for use by
+// ValueTracking. Unlike AssumptionCache, this class does not perform any
+// automatic analysis or invalidation. The caller is responsible for registering
+// all relevant branches (and re-registering them if they change), and for
+// removing invalidated values from the cache.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_DOMCONDITIONCACHE_H
+#define LLVM_ANALYSIS_DOMCONDITIONCACHE_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/IR/ValueHandle.h"
+
+namespace llvm {
+
+class Value;
+class BranchInst;
+
+class DomConditionCache {
+private:
+ /// A map of values about which a branch might be providing information.
+ using AffectedValuesMap = DenseMap<Value *, SmallVector<BranchInst *, 1>>;
+ AffectedValuesMap AffectedValues;
+
+public:
+ /// Add a branch condition to the cache.
+ void registerBranch(BranchInst *BI);
+
+ /// Remove a value from the cache, e.g. because it will be erased.
+ void removeValue(Value *V) { AffectedValues.erase(V); }
+
+ /// Access the list of branches which affect this value.
+ ArrayRef<BranchInst *> conditionsFor(const Value *V) const {
+ auto AVI = AffectedValues.find_as(const_cast<Value *>(V));
+ if (AVI == AffectedValues.end())
+ return ArrayRef<BranchInst *>();
+
+ return AVI->second;
+ }
+};
+
+} // end namespace llvm
+
+#endif // LLVM_ANALYSIS_DOMCONDITIONCACHE_H
diff --git a/llvm/include/llvm/Analysis/SimplifyQuery.h b/llvm/include/llvm/Analysis/SimplifyQuery.h
index f9cc3029221d679..e5e6ae0d3d8e3e8 100644
--- a/llvm/include/llvm/Analysis/SimplifyQuery.h
+++ b/llvm/include/llvm/Analysis/SimplifyQuery.h
@@ -14,6 +14,7 @@
namespace llvm {
class AssumptionCache;
+class DomConditionCache;
class DominatorTree;
class TargetLibraryInfo;
@@ -62,6 +63,7 @@ struct SimplifyQuery {
const DominatorTree *DT = nullptr;
AssumptionCache *AC = nullptr;
const Instruction *CxtI = nullptr;
+ const DomConditionCache *DC = nullptr;
// Wrapper to query additional information for instructions like metadata or
// keywords like nsw, which provides conservative results if those cannot
@@ -80,8 +82,8 @@ struct SimplifyQuery {
const DominatorTree *DT = nullptr,
AssumptionCache *AC = nullptr,
const Instruction *CXTI = nullptr, bool UseInstrInfo = true,
- bool CanUseUndef = true)
- : DL(DL), TLI(TLI), DT(DT), AC(AC), CxtI(CXTI), IIQ(UseInstrInfo),
+ bool CanUseUndef = true, const DomConditionCache *DC = nullptr)
+ : DL(DL), TLI(TLI), DT(DT), AC(AC), CxtI(CXTI), DC(DC), IIQ(UseInstrInfo),
CanUseUndef(CanUseUndef) {}
SimplifyQuery(const DataLayout &DL, const DominatorTree *DT,
diff --git a/llvm/include/llvm/Transforms/InstCombine/InstCombiner.h b/llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
index 160fc2ebe493521..7ba4b49932c1a89 100644
--- a/llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
+++ b/llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
@@ -18,6 +18,7 @@
#ifndef LLVM_TRANSFORMS_INSTCOMBINE_INSTCOMBINER_H
#define LLVM_TRANSFORMS_INSTCOMBINE_INSTCOMBINER_H
+#include "llvm/Analysis/DomConditionCache.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/TargetFolder.h"
#include "llvm/Analysis/ValueTracking.h"
@@ -76,6 +77,7 @@ class LLVM_LIBRARY_VISIBILITY InstCombiner {
OptimizationRemarkEmitter &ORE;
BlockFrequencyInfo *BFI;
ProfileSummaryInfo *PSI;
+ DomConditionCache DC;
// Optional analyses. When non-null, these can both be used to do better
// combining and will be updated to reflect any changes.
@@ -98,7 +100,9 @@ class LLVM_LIBRARY_VISIBILITY InstCombiner {
const DataLayout &DL, LoopInfo *LI)
: TTI(TTI), Builder(Builder), Worklist(Worklist),
MinimizeSize(MinimizeSize), AA(AA), AC(AC), TLI(TLI), DT(DT), DL(DL),
- SQ(DL, &TLI, &DT, &AC), ORE(ORE), BFI(BFI), PSI(PSI), LI(LI) {}
+ SQ(DL, &TLI, &DT, &AC, nullptr, /*UseInstrInfo*/ true,
+ /*CanUseUndef*/ true, &DC),
+ ORE(ORE), BFI(BFI), PSI(PSI), LI(LI) {}
virtual ~InstCombiner() = default;
diff --git a/llvm/lib/Analysis/CMakeLists.txt b/llvm/lib/Analysis/CMakeLists.txt
index 9d8c9cfda66c921..34ff6bb74c106f4 100644
--- a/llvm/lib/Analysis/CMakeLists.txt
+++ b/llvm/lib/Analysis/CMakeLists.txt
@@ -55,6 +55,7 @@ add_llvm_component_library(LLVMAnalysis
DependenceAnalysis.cpp
DependenceGraphBuilder.cpp
DevelopmentModeInlineAdvisor.cpp
+ DomConditionCache.cpp
DomPrinter.cpp
DomTreeUpdater.cpp
DominanceFrontier.cpp
diff --git a/llvm/lib/Analysis/DomConditionCache.cpp b/llvm/lib/Analysis/DomConditionCache.cpp
new file mode 100644
index 000000000000000..c47dcf6767a8ede
--- /dev/null
+++ b/llvm/lib/Analysis/DomConditionCache.cpp
@@ -0,0 +1,74 @@
+//===- DomConditionCache.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Analysis/DomConditionCache.h"
+#include "llvm/IR/PatternMatch.h"
+
+using namespace llvm;
+using namespace llvm::PatternMatch;
+
+// TODO: This code is very similar to findAffectedValues() in
+// AssumptionCache, but currently specialized to just the patterns that
+// computeKnownBits() supports, and without the notion of result elem indices
+// that are AC specific. Deduplicate this code once we have a clearer picture
+// of how much they can be shared.
+static void findAffectedValues(Value *Cond,
+ SmallVectorImpl<Value *> &Affected) {
+ auto AddAffected = [&Affected](Value *V) {
+ if (isa<Argument>(V) || isa<GlobalValue>(V)) {
+ Affected.push_back(V);
+ } else if (auto *I = dyn_cast<Instruction>(V)) {
+ Affected.push_back(I);
+
+ // Peek through unary operators to find the source of the condition.
+ Value *Op;
+ if (match(I, m_PtrToInt(m_Value(Op)))) {
+ if (isa<Instruction>(Op) || isa<Argument>(Op))
+ Affected.push_back(Op);
+ }
+ }
+ };
+
+ ICmpInst::Predicate Pred;
+ Value *A;
+ Constant *C;
+ if (match(Cond, m_ICmp(Pred, m_Value(A), m_Constant(C)))) {
+ AddAffected(A);
+
+ if (Pred == ICmpInst::ICMP_EQ) {
+ Value *X;
+ // (X & C) or (X | C) or (X ^ C).
+ // (X << C) or (X >>_s C) or (X >>_u C).
+ if (match(A, m_BitwiseLogic(m_Value(X), m_ConstantInt())) ||
+ match(A, m_Shift(m_Value(X), m_ConstantInt())))
+ AddAffected(X);
+ } else if (Pred == ICmpInst::ICMP_NE) {
+ Value *X;
+ // Handle (X & pow2 != 0).
+ if (match(A, m_And(m_Value(X), m_Power2())) && match(C, m_Zero()))
+ AddAffected(X);
+ } else if (Pred == ICmpInst::ICMP_ULT) {
+ Value *X;
+ // Handle (A + C1) u< C2, which is the canonical form of A > C3 && A < C4,
+ // and recognized by LVI at least.
+ if (match(A, m_Add(m_Value(X), m_ConstantInt())))
+ AddAffected(X);
+ }
+ }
+}
+
+void DomConditionCache::registerBranch(BranchInst *BI) {
+ assert(BI->isConditional() && "Must be conditional branch");
+ SmallVector<Value *, 16> Affected;
+ findAffectedValues(BI->getCondition(), Affected);
+ for (Value *V : Affected) {
+ auto &AV = AffectedValues[V];
+ if (!is_contained(AV, BI))
+ AV.push_back(BI);
+ }
+}
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index ccb5ae3ba0a11dd..57b29a92ff61a43 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -26,6 +26,7 @@
#include "llvm/Analysis/AssumeBundleQueries.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/Analysis/DomConditionCache.h"
#include "llvm/Analysis/GuardUtils.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/Loads.h"
@@ -713,9 +714,33 @@ static void computeKnownBitsFromCmp(const Value *V, CmpInst::Predicate Pred,
void llvm::computeKnownBitsFromContext(const Value *V, KnownBits &Known,
unsigned Depth, const SimplifyQuery &Q) {
- // Use of assumptions is context-sensitive. If we don't have a context, we
- // cannot use them!
- if (!Q.AC || !Q.CxtI)
+ if (!Q.CxtI)
+ return;
+
+ if (Q.DC && Q.DT) {
+ // Handle dominating conditions.
+ for (BranchInst *BI : Q.DC->conditionsFor(V)) {
+ auto *Cmp = dyn_cast<ICmpInst>(BI->getCondition());
+ if (!Cmp)
+ continue;
+
+ BasicBlockEdge Edge0(BI->getParent(), BI->getSuccessor(0));
+ if (Q.DT->dominates(Edge0, Q.CxtI->getParent()))
+ computeKnownBitsFromCmp(V, Cmp->getPredicate(), Cmp->getOperand(0),
+ Cmp->getOperand(1), Known, Depth, Q);
+
+ BasicBlockEdge Edge1(BI->getParent(), BI->getSuccessor(1));
+ if (Q.DT->dominates(Edge1, Q.CxtI->getParent()))
+ computeKnownBitsFromCmp(V, Cmp->getInversePredicate(),
+ Cmp->getOperand(0), Cmp->getOperand(1), Known,
+ Depth, Q);
+ }
+
+ if (Known.hasConflict())
+ Known.resetAll();
+ }
+
+ if (!Q.AC)
return;
unsigned BitWidth = Known.getBitWidth();
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
index 0bbb22be71569f6..aecb33db1031918 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
+++ b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
@@ -459,6 +459,7 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
// use counts.
SmallVector<Value *> Ops(I.operands());
Worklist.remove(&I);
+ DC.removeValue(&I);
I.eraseFromParent();
for (Value *Op : Ops)
Worklist.handleUseCountDecrement(Op);
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index e6088541349784b..aadebcb598c383c 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3070,6 +3070,7 @@ Instruction *InstCombinerImpl::visitBranchInst(BranchInst &BI) {
return nullptr;
}
+ DC.registerBranch(&BI);
return nullptr;
}
diff --git a/llvm/test/CodeGen/BPF/loop-exit-cond.ll b/llvm/test/CodeGen/BPF/loop-exit-cond.ll
index 7666d961753ac40..ff34432009a1117 100644
--- a/llvm/test/CodeGen/BPF/loop-exit-cond.ll
+++ b/llvm/test/CodeGen/BPF/loop-exit-cond.ll
@@ -43,7 +43,7 @@ define dso_local i32 @test(i32 %len, ptr %data) #0 {
; CHECK-NEXT: store i64 [[CONV2]], ptr [[D]], align 8, !tbaa [[TBAA6:![0-9]+]]
; CHECK-NEXT: call void @foo(ptr nonnull @.str, i32 [[I_05]], ptr nonnull [[D]]) #[[ATTR3]]
; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 8, ptr nonnull [[D]]) #[[ATTR3]]
-; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_05]], 1
+; CHECK-NEXT: [[INC]] = add nuw i32 [[I_05]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[LEN]]
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[IF_END]], label [[FOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: if.end:
diff --git a/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll b/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll
index f2548f6f8c8c8c3..8b472aa5af09024 100644
--- a/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll
+++ b/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll
@@ -1,5 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt < %s -S -passes=instcombine | FileCheck %s
+; RUN: opt < %s -S -passes='instcombine<no-verify-fixpoint>' | FileCheck %s
+
+; We do not reach a fixpoint, because we first have to infer nsw on the IV add,
+; and could eliminate the icmp slt afterwards, but don't revisit it.
target datalayout = "E-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f128:64:128"
@@ -10,7 +13,7 @@ define i32 @test() {
; CHECK: bb.i:
; CHECK-NEXT: br label [[BB51_I_I:%.*]]
; CHECK: bb27.i.i:
-; CHECK-NEXT: [[TMP50_I_I:%.*]] = add i32 [[X_0_I_I:%.*]], 2
+; CHECK-NEXT: [[TMP50_I_I:%.*]] = add nsw i32 [[X_0_I_I:%.*]], 2
; CHECK-NEXT: br label [[BB51_I_I]]
; CHECK: bb51.i.i:
; CHECK-NEXT: [[X_0_I_I]] = phi i32 [ [[TMP50_I_I]], [[BB27_I_I:%.*]] ], [ 0, [[BB_I]] ]
diff --git a/llvm/test/Transforms/InstCombine/2009-02-20-InstCombine-SROA.ll b/llvm/test/Transforms/InstCombine/2009-02-20-InstCombine-SROA.ll
index 13eca92f5535906..9f69d67f61e115f 100644
--- a/llvm/test/Transforms/InstCombine/2009-02-20-InstCombine-SROA.ll
+++ b/llvm/test/Transforms/InstCombine/2009-02-20-InstCombine-SROA.ll
@@ -80,7 +80,7 @@ define ptr @_Z3fooRSt6vectorIiSaIiEE(ptr %X) {
; IC-NEXT: [[TMP31:%.*]] = load ptr, ptr [[__FIRST_ADDR_I_I]], align 4
; IC-NEXT: [[TMP32:%.*]] = getelementptr i32, ptr [[TMP31]], i32 1
; IC-NEXT: store ptr [[TMP32]], ptr [[__FIRST_ADDR_I_I]], align 4
-; IC-NEXT: [[TMP33:%.*]] = add i32 [[__TRIP_COUNT_0_I_I:%.*]], -1
+; IC-NEXT: [[TMP33:%.*]] = add nsw i32 [[__TRIP_COUNT_0_I_I:%.*]], -1
; IC-NEXT: br label [[BB12_I_I]]
; IC: bb12.i.i:
; IC-NEXT: [[__TRIP_COUNT_0_I_I]] = phi i32 [ [[TMP7]], [[ENTRY:%.*]] ], [ [[TMP33]], [[BB11_I_I]] ]
@@ -188,7 +188,7 @@ define ptr @_Z3fooRSt6vectorIiSaIiEE(ptr %X) {
; IC_SROA-NEXT: br label [[_ZST4FINDIN9__GNU_CXX17__NORMAL_ITERATORIPIST6VECTORIISAIIEEEEIET_S7_S7_RKT0__EXIT]]
; IC_SROA: bb11.i.i:
; IC_SROA-NEXT: [[TMP18:%.*]] = getelementptr i32, ptr [[TMP15]], i32 1
-; IC_SROA-NEXT: [[TMP19:%.*]] = add i32 [[__TRIP_COUNT_0_I_I:%.*]], -1
+; IC_SROA-NEXT: [[TMP19:%.*]] = add nsw i32 [[__TRIP_COUNT_0_I_I:%.*]], -1
; IC_SROA-NEXT: br label [[BB12_I_I]]
; IC_SROA: bb12.i.i:
; IC_SROA-NEXT: [[__FIRST_ADDR_I_I_SROA_0_0]] = phi ptr [ [[TMP2]], [[ENTRY:%.*]] ], [ [[TMP18]], [[BB11_I_I]] ]
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index feeee16e27f23c8..5b4425b4d83082c 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -319,7 +319,7 @@ define i8 @trunc_in_loop_exit_block() {
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[IV]], 100
; CHECK-NEXT: br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
; CHECK: loop.latch:
-; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT: br label [[LOOP]]
; CHECK: exit:
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[PHI]] to i8
diff --git a/llvm/test/Transforms/InstCombine/icmp-binop.ll b/llvm/test/Transforms/InstCombine/icmp-binop.ll
index 60a12411ee910d2..878f39bb7c9a564 100644
--- a/llvm/test/Transforms/InstCombine/icmp-binop.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-binop.ll
@@ -132,8 +132,7 @@ define i1 @mul_broddV_unkV_eq(i16 %v, i16 %v2) {
; CHECK-NEXT: [[ODD_NOT:%.*]] = icmp eq i16 [[LB]], 0
; CHECK-NEXT: br i1 [[ODD_NOT]], label [[FALSE:%.*]], label [[TRUE:%.*]]
; CHECK: true:
-; CHECK-NEXT: [[MUL:%.*]] = mul i16 [[V:%.*]], [[V2]]
-; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[MUL]], 0
+; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[V:%.*]], 0
; CHECK-NEXT: ret i1 [[CMP]]
; CHECK: false:
; CHECK-NEXT: call void @use64(i16 [[V]])
diff --git a/llvm/test/Transforms/InstCombine/icmp-dom.ll b/llvm/test/Transforms/InstCombine/icmp-dom.ll
index f4b9022d14349b2..c30b2d724afbd59 100644
--- a/llvm/test/Transforms/InstCombine/icmp-dom.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-dom.ll
@@ -75,7 +75,8 @@ define void @idom_sign_bit_check_edge_dominates_select(i64 %a, i64 %b) {
; CHECK: land.lhs.true:
; CHECK-NEXT: br label [[LOR_END:%.*]]
; CHECK: lor.rhs:
-; CHECK-NEXT: [[CMP3_NOT:%.*]] = icmp eq i64 [[A]], [[B:%.*]]
+; CHECK-NEXT: [[SELECT:%.*]] = call i64 @llvm.umax.i64(i64 [[A]], i64 5)
+; CHECK-NEXT: [[CMP3_NOT:%.*]] = icmp eq i64 [[SELECT]], [[B:%.*]]
; CHECK-NEXT: br i1 [[CMP3_NOT]], label [[LOR_END]], label [[LAND_RHS:%.*]]
; CHECK: land.rhs:
; CHECK-NEXT: br label [[LOR_END]]
@@ -385,7 +386,7 @@ define i8 @PR48900_alt(i8 %i, ptr %p) {
; CHECK-NEXT: [[I4:%.*]] = icmp ugt i8 [[SMAX]], -128
; CHECK-NEXT: br i1 [[I4]], label [[TRUELABEL:%.*]], label [[FALSELABEL:%.*]]
; CHECK: truelabel:
-; CHECK-NEXT: [[UMIN:%.*]] = call i8 @llvm.smin.i8(i8 [[SMAX]], i8 -126)
+; CHECK-NEXT: [[UMIN:%.*]] = call i8 @llvm.umin.i8(i8 [[SMAX]], i8 -126)
; CHECK-NEXT: ret i8 [[UMIN]]
; CHECK: falselabel:
; CHECK-NEXT: ret i8 0
diff --git a/llvm/test/Transforms/InstCombine/icmp-mul-zext.ll b/llvm/test/Transforms/InstCombine/icmp-mul-zext.ll
index 095ac5b27f59635..adf78723b1302aa 100644
--- a/llvm/test/Transforms/InstCombine/icmp-mul-zext.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-mul-zext.ll
@@ -17,8 +17,7 @@ define i32 @sterix(i32, i8, i64) {
; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[LOR_RHS:%.*]], label [[LOR_END:%.*]]
; CHECK: lor.rhs:
; CHECK-NEXT: [[AND:%.*]] = and i64 [[MUL3]], [[TMP2]]
-; CHECK-NEXT: [[CONV4:%.*]] = trunc i64 [[AND]] to i32
-; CHECK-NEXT: [[TOBOOL7_NOT:%.*]] = icmp eq i32 [[CONV4]], 0
+; CHECK-NEXT: [[TOBOOL7_NOT:%.*]] = icmp eq i64 [[AND]], 0
; CHECK-NEXT: [[TMP3:%.*]] = zext i1 [[TOBOOL7_NOT]] to i32
; CHECK-NEXT: br label [[LOR_END]]
; CHECK: lor.end:
diff --git a/llvm/test/Transforms/InstCombine/icmp-ne-pow2.ll b/llvm/test/Transforms/InstCombine/icmp-ne-pow2.ll
index 224ea3cd76cc6d1..70a2b33d17dd7ca 100644
--- a/llvm/test/Transforms/InstCombine/icmp-ne-pow2.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-ne-pow2.ll
@@ -125,8 +125,7 @@ define i32 @pow2_32_br(i32 %x) {
; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: br i1 [[CMP_NOT]], label [[FALSE:%.*]], label [[TRUE:%.*]]
; CHECK: True:
-; CHECK-NEXT: [[OR:%.*]] = or i32 [[X]], 4
-; CHECK-NEXT: ret i32 [[OR]]
+; CHECK-NEXT: ret i32 [[X]]
; CHECK: False:
; CHECK-NEXT: ret i32 0
;
@@ -167,8 +166,7 @@ define i64 @pow2_64_br(i64 %x) {
; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i64 [[AND]], 0
; CHECK-NEXT: br i1 [[CMP_NOT]], label [[FALSE:%.*]], label [[TRUE:%.*]]
; CHECK: True:
-; CHECK-NEXT: [[AND2:%.*]] = and i64 [[X]], 1
-; CHECK-NEXT: ret i64 [[AND2]]
+; CHECK-NEXT: ret i64 1
; CHECK: False:
; CHECK-NEXT: ret i64 0
;
@@ -209,8 +207,7 @@ define i16 @pow2_16_br(i16 %x) {
; CHECK-NEXT: [[CMP_NO...
[truncated]
|
Just realized that this doesn't cover uses of isKnownNonNegative() in InstCombine yet, as it currently doesn't go through SimplifyQuery. I'll see about migrating those APIs tomorrow. |
@@ -16,11 +16,15 @@ define i32 @test_asr(i32 %a, i32 %b) { | |||
; CHECK-NEXT: [[C:%.*]] = icmp slt i32 [[A]], 0 | |||
; CHECK-NEXT: br i1 [[C]], label [[BB2:%.*]], label [[BB3:%.*]] | |||
; CHECK: bb2: | |||
; CHECK-NEXT: [[NOT:%.*]] = xor i32 [[A]], -1 | |||
; CHECK-NEXT: [[D:%.*]] = lshr i32 [[NOT]], [[B]] | |||
; CHECK-NEXT: [[NOT2:%.*]] = xor i32 [[D]], -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by faebb1b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I saw that the output changed and thought that it was fixed, but this wasn't sufficient. We now have ashr in one branch and lshr in the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the remaining pattern is already handled by an extra simplifycfg+instcombine run, because we have this select fold:
static Value *foldSelectICmpLshrAshr(const ICmpInst *IC, Value *TrueVal, |
I've adjusted the test to check for the combination of the three passes, to show that it folds in that case. But I'd also be open to add a variant of this fold for phi nodes rather than selects, if you think it's necessary.
This is done now, but introduced an additional regression in idioms.ll. |
b51bd83
to
9a3689f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me. Waiting for the result of my CI.
https://github.com/dtcxzyw/llvm-ci/actions/runs/7066692655
@goldsteinn Any comments?
My CI detected some significant regressions caused by this patch: |
@dtcxzyw Could you please run another test with this branch? https://github.com/nikic/llvm-project/tree/perf/dom-cond-3 I believe the issue with DILATE is that we have some icmps that get canonicalized from signed to unsigned predicate, and IndVars is not able to perform some reasoning anymore. I think a change like 7fa42da may help, but possibly not completely and it has compile-time impact. So I'm considering disabling the use of dominating conditions for the icmp known bits simplification to not go down this rabbit hole. (The proper solution is probably to have something similar to zext nneg for icmp.) |
I went ahead and pushed a clean up version of that change to this PR, so you can just re-test this PR. I've found that dropping the icmp case also removes most of the second-order compile-time regressions, and also drops the (very minor) regression in the BPF test, so those are also points in favor of doing this. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Looks like the regression in |
The problem for mpeg2decode seems to be that we do more |
Hi, @nikic, Note that bpf verification failure here does not mean that generated code is incorrect. It means llvm generates 'more complex' code and the current verifier cannot handle that. I created a unit test like below:
The compilation flag: For a llvm whose top commit is the patch, the assembly looks like
In the above, I added a few comments to show why verification failure. The more detail can be found in For a llvm whose top commit is the patch below this commit, the assembly looks like
With '-mllvm -print-after-all', I am able to find why this patch caused the problem.
You can see that idxprom is sign extended and the variable is used to construct the array index. The current For without this patch, the IR eventually removed sext since it knows the value range at CorrelatedValuePropagationPass
So my question is that we would like to undo this transformation in llvm so we can please the verifier. Any suggestion on how to do this? Thanks! cc @4ast cc @eddyz87 @anakryiko @jemarch |
@yonghong-song I think it may be possible to improve CVP to handle this better, in which case we won't need BPF workarounds. I'll look into it. |
@nikic , thanks! |
@nikic and @bjope. Indeed, #75039 didn't resolve the issue and the generated bpf code is the same as without this patch, so some more works are needed. Thank you both for looking at the issue! |
Some additional details on BPF verifier failure. @a = global i32 0, align 4
@g = global i32 0, align 4
define i64 @foo() {
entry:
%a = load i32, ptr @a
%a.cmp = icmp ugt i32 %a, 32
br i1 %a.cmp, label %end, label %l1 ; establish %a in range [0, 32]
l1:
%g = load i32, ptr @g
%g.cmp.1 = icmp sgt i32 %g, -1
%g.cmp.a = icmp slt i32 %g, %a
%cond = and i1 %g.cmp.a, %g.cmp.1
br i1 %cond, label %l2, label %end ; establish %g in range [0, %a]
l2:
%g.64 = sext i32 %g to i64
ret i64 %g.64
end:
ret i64 0
} Also consider the following opt command:
Before this MR generated code looked as follows: ...
entry:
%a = load i32, ptr @a, align 4
%a.cmp = icmp ugt i32 %a, 32
br i1 %a.cmp, label %end, label %l1
l1:
%g = load i32, ptr @g, align 4
%g.cmp.1 = icmp sgt i32 %g, -1
%g.cmp.a = icmp slt i32 %g, %a
%cond = and i1 %g.cmp.a, %g.cmp.1
br i1 %cond, label %l2, label %end
l2:
%g.64 = zext nneg i32 %g to i64 ; <--- note zext nneg
ret i64 %g.64
... After this MR generated code looks as follows: ...
entry:
%a = load i32, ptr @a, align 4
%a.cmp = icmp ugt i32 %a, 32
br i1 %a.cmp, label %end, label %l1
l1:
%g = load i32, ptr @g, align 4
%cond = icmp ult i32 %g, %a ; <--- conditions merged by instcombine
br i1 %cond, label %l2, label %end
l2:
%g.64 = sext i32 %g to i64 ; <--- note sext
ret i64 %g.64
... Updated I think that there are no issues with transformation applied by updated |
@eddyz87 Right, this is exactly the change I have been working on, see #75311. Unfortunately, it doesn't actually fix the BPF case, because that one involves a loop, and LVI is currently terrible at handling those. (Basically, even though the relevant condition is outside the loop, just querying something in the loop will form a cycle, which will be resolved to overdefined.) |
heads-up, we are seeing some performance regressions due to this patch 8about 6-7%! We will add more details about this soon. |
Here is another thing that I noticed after this patch: https://godbolt.org/z/1P7bnKGjh So early instcombine is eliminating an Maybe I should file a separate issue for that? Not sure if it should be consider as a phase ordering problem. Had perhaps been nice if simplifycfg had been able to understand that it basically could match an |
I don't think it's a profitable transformation in simplifycfg with larger amount of instructions. In x86, backend generates much larger code for the example with collapsed cfg: https://godbolt.org/z/YP9GjjP8c. |
Optimization pipeline is doing simplifications and canonicalizations. If you for example use |
You're right, it depends on the backend. For GPU, it sounds good to hoist common operations as selects to realize it. But I'm not sure whether such transformation should happen in simplifycfg pass with specified target info(and option), or just in backend. If in simplifycfg, it may resist other optimizations between basic blocks. |
Currently, LVI will only use conditions like "X < C" to constrain the value of X on the relevant edge. This patch extends it to handle conditions like "X < Y" by querying the known range of Y. This means that getValueFromCondition() and various related APIs can now return nullopt to indicate that they have pushed to the worklist, and need to be called again later. This behavior is currently controlled by a UseBlockValue option, and only enabled for actual edge value handling. All other places deriving constraints from conditions keep using the previous logic for now. This change was originally motivated as a fix for the regression reported in #73662 (comment). Unfortunately, it doesn't actually fix it, because we run into another issue there (LVI currently is really bad at handling values used in loops). This change has some compile-time impact, but it's fairly small, in the 0.05% range.
…mily (#80657) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by #73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.
…mily (llvm#80657) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by llvm#73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.
Apologies for resurrecting this old thread, but I found a problem with this patch. It seems like it makes a comparator violate strict weak ordering requirements. This manifests as an assertion failure when Clang is compiled with the corresponding libc++ check:
|
Filed #108618 |
This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches.
This change has some non-trivial impact on compile-time. The first-order impact is about 0.2%. This is the impact if we compute the KnownBits, but don't use them. The second-order impact is about 0.1% -- however, on stage2-O0-g it's actually a -0.2% improvement, which indicates that this change results in additional optimizations inside clang itself.
Fixes #74242.