Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVP] Improve the value solving of select at use #76700

Merged
merged 2 commits into from
Jan 5, 2024

Conversation

dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented Jan 2, 2024

This patch improves the value solving of select at use if the condition is an icmp and we know the result of comparison from LVI->getPredicateAt.

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=7e405eb722e40c79b7726201d0f76b5dab34ba0f&to=3c315b1ddcb0ad82554b33f08b9356679fae4bb7&stat=instructions:u

stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g stage2-O3 stage2-O0-g stage2-clang
-0.01% +0.01% -0.00% -0.00% -0.08% +0.02% -0.01%

@dtcxzyw dtcxzyw requested a review from preames January 2, 2024 07:03
@dtcxzyw dtcxzyw requested a review from nikic as a code owner January 2, 2024 07:03
@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch improves the value solving of select at use if the condition is an icmp and we know the result of comparison from LVI->getPredicateAt.

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=7e405eb722e40c79b7726201d0f76b5dab34ba0f&to=3c315b1ddcb0ad82554b33f08b9356679fae4bb7&stat=instructions:u

stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g stage2-O3 stage2-O0-g stage2-clang
-0.01% +0.01% -0.00% -0.00% -0.08% +0.02% -0.01%

Full diff: https://github.com/llvm/llvm-project/pull/76700.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp (+26-25)
  • (modified) llvm/test/Transforms/CorrelatedValuePropagation/select.ll (+23)
diff --git a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
index c44d3748a80d8b..9235850de92f3e 100644
--- a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -94,6 +94,31 @@ STATISTIC(NumUDivURemsNarrowedExpanded,
           "Number of bound udiv's/urem's expanded");
 STATISTIC(NumZExt, "Number of non-negative deductions");
 
+static Constant *getConstantAt(Value *V, Instruction *At, LazyValueInfo *LVI) {
+  if (Constant *C = LVI->getConstant(V, At))
+    return C;
+
+  // TODO: The following really should be sunk inside LVI's core algorithm, or
+  // at least the outer shims around such.
+  auto *C = dyn_cast<CmpInst>(V);
+  if (!C)
+    return nullptr;
+
+  Value *Op0 = C->getOperand(0);
+  Constant *Op1 = dyn_cast<Constant>(C->getOperand(1));
+  if (!Op1)
+    return nullptr;
+
+  LazyValueInfo::Tristate Result = LVI->getPredicateAt(
+      C->getPredicate(), Op0, Op1, At, /*UseBlockValue=*/false);
+  if (Result == LazyValueInfo::Unknown)
+    return nullptr;
+
+  return (Result == LazyValueInfo::True)
+             ? ConstantInt::getTrue(C->getContext())
+             : ConstantInt::getFalse(C->getContext());
+}
+
 static bool processSelect(SelectInst *S, LazyValueInfo *LVI) {
   if (S->getType()->isVectorTy() || isa<Constant>(S->getCondition()))
     return false;
@@ -106,7 +131,7 @@ static bool processSelect(SelectInst *S, LazyValueInfo *LVI) {
       C = LVI->getConstantOnEdge(S->getCondition(), PN->getIncomingBlock(U),
                                  I->getParent(), I);
     else
-      C = LVI->getConstant(S->getCondition(), I);
+      C = getConstantAt(S->getCondition(), I, LVI);
 
     auto *CI = dyn_cast_or_null<ConstantInt>(C);
     if (!CI)
@@ -1109,30 +1134,6 @@ static bool processAnd(BinaryOperator *BinOp, LazyValueInfo *LVI) {
   return true;
 }
 
-
-static Constant *getConstantAt(Value *V, Instruction *At, LazyValueInfo *LVI) {
-  if (Constant *C = LVI->getConstant(V, At))
-    return C;
-
-  // TODO: The following really should be sunk inside LVI's core algorithm, or
-  // at least the outer shims around such.
-  auto *C = dyn_cast<CmpInst>(V);
-  if (!C) return nullptr;
-
-  Value *Op0 = C->getOperand(0);
-  Constant *Op1 = dyn_cast<Constant>(C->getOperand(1));
-  if (!Op1) return nullptr;
-
-  LazyValueInfo::Tristate Result = LVI->getPredicateAt(
-      C->getPredicate(), Op0, Op1, At, /*UseBlockValue=*/false);
-  if (Result == LazyValueInfo::Unknown)
-    return nullptr;
-
-  return (Result == LazyValueInfo::True) ?
-    ConstantInt::getTrue(C->getContext()) :
-    ConstantInt::getFalse(C->getContext());
-}
-
 static bool runImpl(Function &F, LazyValueInfo *LVI, DominatorTree *DT,
                     const SimplifyQuery &SQ) {
   bool FnChanged = false;
diff --git a/llvm/test/Transforms/CorrelatedValuePropagation/select.ll b/llvm/test/Transforms/CorrelatedValuePropagation/select.ll
index 28a8516a910280..9842328db70209 100644
--- a/llvm/test/Transforms/CorrelatedValuePropagation/select.ll
+++ b/llvm/test/Transforms/CorrelatedValuePropagation/select.ll
@@ -372,4 +372,27 @@ define i64 @select_cond_may_undef(i32 %a) {
   ret i64 %max
 }
 
+define i32 @test_solve_select_at_use(i32 %a, i32 %b, i32 %c) {
+; CHECK-LABEL: define i32 @test_solve_select_at_use
+; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]], i32 [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[A]], 0
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[A]], -1
+; CHECK-NEXT:    br i1 [[COND]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    ret i32 [[C]]
+; CHECK:       if.else:
+; CHECK-NEXT:    ret i32 [[B]]
+;
+entry:
+  %cmp = icmp slt i32 %a, 0
+  %retval = select i1 %cmp, i32 %b, i32 %c
+  %cond = icmp sgt i32 %a, -1
+  br i1 %cond, label %if.then, label %if.else
+if.then:
+  ret i32 %retval
+if.else:
+  ret i32 %retval
+}
+
 !0 = !{}

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jan 2, 2024
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dtcxzyw dtcxzyw merged commit 848d7af into llvm:main Jan 5, 2024
5 checks passed
@dtcxzyw dtcxzyw deleted the perf/cvp-select-at-use branch January 5, 2024 18:32
@dtcxzyw
Copy link
Member Author

dtcxzyw commented Jan 6, 2024

Hi @nikic! I think I have figured out how to fix the regression caused by this patch.

Here is the reduced IR:

define void @test_icmp_with_abs_loop(i16 %a, i16 %b) {
  %abs = call i16 @llvm.abs.i16(i16 %a, i1 false)
  %cmp1 = icmp slt i16 %a, 0
  %cmp2 = icmp eq i16 %abs, 1
  %or = select i1 %cmp1, i1 true, i1 %cmp2
  call void @use(i1 %or)
  br i1 %or, label %exit, label %if.then
if.then:
  %cmp3 = icmp ult i16 %a, %b
  call void @use(i1 %cmp3)
  br i1 %cmp3, label %for.body, label %exit
for.body:
  %indvar = phi i32 [0, %if.then], [%indvar.next, %for.body]
  %indvar.next = add i32 %indvar, 1
  %cmp5 = icmp ult i32 %indvar.next, 10
  br i1 %cmp5, label %for.body, label %if.then2
if.then2:
  %cmp4 = icmp ult i16 %abs, %b
  call void @use(i1 %cmp4)
  br label %exit
exit:
  ret void
}
declare i16 @llvm.abs.i16(i16, i1)
declare void @use(i1)

%cmp4 = icmp ult i16 %abs, %b always evaluates to true because %abs == %a and %a <u %b.
I tried to replace the abs at use in CVP. But it only worked with the cases without loops:

define void @test_icmp_with_abs(i16 %a, i16 %b) {
  %abs = call i16 @llvm.abs.i16(i16 %a, i1 false)
  %cmp1 = icmp slt i16 %a, 0
  %cmp2 = icmp eq i16 %abs, 1
  %or = select i1 %cmp1, i1 true, i1 %cmp2
  call void @use(i1 %or)
  br i1 %or, label %exit, label %if.then
if.then:
  %cmp3 = icmp ult i16 %a, %b
  call void @use(i1 %cmp3)
  br i1 %cmp3, label %if.then2, label %exit
if.then2:
  %cmp4 = icmp ult i16 %abs, %b ; %abs will be replaced with %a
  call void @use(i1 %cmp4)
  br label %exit
exit:
  ret void
}
declare i16 @llvm.abs.i16(i16, i1)
declare void @use(i1)

After further investigation I have found solveBlockValueNonLocal failed to given the range of %a at %if.then2 due to the existence of an unrelated loop.

The debug output shows that we hit a cycle and exit early. But I think at least we can infer some information from the idom %if.then :(

// Loop over all of our predecessors, merging what we know from them into
// result. If we encounter an unexplored predecessor, we eagerly explore it
// in a depth first manner. In practice, this has the effect of discovering
// paths we can't analyze eagerly without spending compile times analyzing
// other paths. This heuristic benefits from the fact that predecessors are
// frequently arranged such that dominating ones come first and we quickly
// find a path to function entry. TODO: We should consider explicitly
// canonicalizing to make this true rather than relying on this happy
// accident.
for (BasicBlock *Pred : predecessors(BB)) {
std::optional<ValueLatticeElement> EdgeResult = getEdgeValue(Val, Pred, BB);
if (!EdgeResult)
// Explore that input, then return here
return std::nullopt;
Result.mergeIn(*EdgeResult);
// If we hit overdefined, exit early. The BlockVals entry is already set
// to overdefined.
if (Result.isOverdefined()) {
LLVM_DEBUG(dbgs() << " compute BB '" << BB->getName()
<< "' - overdefined because of pred '"
<< Pred->getName() << "' (non local).\n");
return Result;
}
}

DomTree info is unavailable for LVI because the analysis is also used by LowerSwitch and JumpThreading, which don't update DomTree eagerly.
Could you please give me some suggestions on how to address this problem?
If not, I will fix this issue in ConstraintElimination.

@nikic
Copy link
Contributor

nikic commented Jan 6, 2024

This issue is on my todo list for next week, as it's also needed for #73662 (comment). My tentative plan was to add a new value lattice state to represent cycles, which will mostly get treated as overdefined, but can be preserved for pass-through operations.

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
This patch improves the value solving of select at use if the condition
is an icmp and we know the result of comparison from
`LVI->getPredicateAt`.

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7e405eb722e40c79b7726201d0f76b5dab34ba0f&to=3c315b1ddcb0ad82554b33f08b9356679fae4bb7&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.01%|+0.01%|-0.00%|-0.00%|-0.08%|+0.02%|-0.01%|
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants