[Reassociate] Drop weight reduction to fix issue 91417 #91469

Merged: 3 commits into llvm:main on May 29, 2024

Conversation

@dtcxzyw (Member) commented May 8, 2024

See the following case: https://alive2.llvm.org/ce/z/A-fBki

define i3 @src(i3 %0) {
  %2 = mul i3 %0, %0
  %3 = mul i3 %2, %0
  %4 = mul i3 %3, %0
  %5 = mul nsw i3 %4, %0
  ret i3 %5
}

define i3 @tgt(i3 %0) {
  %2 = mul i3 %0, %0
  %5 = mul nsw i3 %2, %0
  ret i3 %5
}

d7aeefe introduced weight reduction when combining the weights of repeated occurrences of the same operand. Since the weight of %0 drops from 5 to 3, the nsw flag on %5 should be dropped.

However, the nsw flag isn't cleared by RewriteExprTree, since %5 = mul nsw i3 %0, %4 is not included in the range [ExpressionChangedStart, ExpressionChangedEnd).

Calculated Rank[] = 3
Combine negations for:   %2 = mul i3 %0, %0
Calculated Rank[] = 4
Combine negations for:   %3 = mul i3 %0, %2
Calculated Rank[] = 5
Combine negations for:   %4 = mul i3 %0, %3
Calculated Rank[] = 6
Combine negations for:   %5 = mul nsw i3 %0, %4
LINEARIZE:   %5 = mul nsw i3 %0, %4
OPERAND: i3 %0 (1)
ADD USES LEAF: i3 %0 (1)
OPERAND:   %4 = mul i3 %0, %3 (1)
DIRECT ADD:   %4 = mul i3 %0, %3 (1)
OPERAND: i3 %0 (1)
OPERAND:   %3 = mul i3 %0, %2 (1)
DIRECT ADD:   %3 = mul i3 %0, %2 (1)
OPERAND: i3 %0 (1)
OPERAND:   %2 = mul i3 %0, %0 (1)
DIRECT ADD:   %2 = mul i3 %0, %0 (1)
OPERAND: i3 %0 (1)
OPERAND: i3 %0 (1)
RAIn:   mul i3  [ %0, #3] [ %0, #3] [ %0, #3] 
RAOut:  mul i3  [ %0, #3] [ %0, #3] [ %0, #3] 
RAOut after CSE reorder:        mul i3  [ %0, #3] [ %0, #3] [ %0, #3] 
RA:   %5 = mul nsw i3 %0, %4
TO:   %5 = mul nsw i3 %4, %0
RA:   %4 = mul i3 %0, %3
TO:   %4 = mul i3 %0, %0
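To make the miscompile concrete, here is a small standalone C++ check (my own illustration, not part of this patch) that brute-forces all i3 values: the wrapping results of @src and @tgt always agree, but the final multiply in @tgt violates nsw for inputs where the one in @src does not.

```cpp
// Illustration only (not part of the patch): verify over all i3 values that
// x^5 == x^3 under wrapping arithmetic, and find inputs where the final
// 'mul nsw' of @tgt overflows while the final 'mul nsw' of @src does not.
#include <cstdio>

int main() {
  // Reduce a value mod 8 into the signed i3 range [-4, 3].
  auto wrap = [](int v) { return ((v % 8) + 12) % 8 - 4; };
  for (int x = -4; x <= 3; ++x) {
    int p2 = wrap(x * x);  // %2
    int p3 = wrap(p2 * x); // %3 (the value of %5 in @tgt)
    int p4 = wrap(p3 * x); // %4
    int p5 = wrap(p4 * x); // %5 in @src
    if (p5 != p3)
      std::printf("wrapping mismatch at x = %d\n", x); // never fires
    // nsw holds iff the product fits in [-4, 3] before wrapping.
    bool srcNsw = p4 * x >= -4 && p4 * x <= 3; // %5 = mul nsw i3 %4, %0
    bool tgtNsw = p2 * x >= -4 && p2 * x <= 3; // %5 = mul nsw i3 %2, %0
    if (srcNsw && !tgtNsw)
      std::printf("nsw violated only in @tgt at x = %d\n", x); // x = 2 and x = -2
  }
}
```

So the reassociation itself is fine for the plain muls; only the nsw on the rewritten %5 is the problem.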

The best fix would be to tell RewriteExprTree to clear the flags of the whole expression tree whenever weight reduction happens.

However, I found that weight reduction based on the Carmichael number never happens in practice.
See the coverage result https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/home/dtcxzyw/llvm-project/llvm/lib/Transforms/Scalar/Reassociate.cpp.html#L323

I think it would be better to drop IncorporateWeight.

Fixes #91417

@dtcxzyw dtcxzyw requested review from nikic and RKSimon May 8, 2024 12:47
@llvmbot (Member) commented May 8, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Full diff: https://github.com/llvm/llvm-project/pull/91469.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/Reassociate.cpp (+1-111)
  • (modified) llvm/test/Transforms/Reassociate/repeats.ll (+125-62)
diff --git a/llvm/lib/Transforms/Scalar/Reassociate.cpp b/llvm/lib/Transforms/Scalar/Reassociate.cpp
index d91320863e241..270182f0e5ac6 100644
--- a/llvm/lib/Transforms/Scalar/Reassociate.cpp
+++ b/llvm/lib/Transforms/Scalar/Reassociate.cpp
@@ -302,97 +302,6 @@ static BinaryOperator *LowerNegateToMultiply(Instruction *Neg) {
   return Res;
 }
 
-/// Returns k such that lambda(2^Bitwidth) = 2^k, where lambda is the Carmichael
-/// function. This means that x^(2^k) === 1 mod 2^Bitwidth for
-/// every odd x, i.e. x^(2^k) = 1 for every odd x in Bitwidth-bit arithmetic.
-/// Note that 0 <= k < Bitwidth, and if Bitwidth > 3 then x^(2^k) = 0 for every
-/// even x in Bitwidth-bit arithmetic.
-static unsigned CarmichaelShift(unsigned Bitwidth) {
-  if (Bitwidth < 3)
-    return Bitwidth - 1;
-  return Bitwidth - 2;
-}
-
-/// Add the extra weight 'RHS' to the existing weight 'LHS',
-/// reducing the combined weight using any special properties of the operation.
-/// The existing weight LHS represents the computation X op X op ... op X where
-/// X occurs LHS times.  The combined weight represents  X op X op ... op X with
-/// X occurring LHS + RHS times.  If op is "Xor" for example then the combined
-/// operation is equivalent to X if LHS + RHS is odd, or 0 if LHS + RHS is even;
-/// the routine returns 1 in LHS in the first case, and 0 in LHS in the second.
-static void IncorporateWeight(APInt &LHS, const APInt &RHS, unsigned Opcode) {
-  // If we were working with infinite precision arithmetic then the combined
-  // weight would be LHS + RHS.  But we are using finite precision arithmetic,
-  // and the APInt sum LHS + RHS may not be correct if it wraps (it is correct
-  // for nilpotent operations and addition, but not for idempotent operations
-  // and multiplication), so it is important to correctly reduce the combined
-  // weight back into range if wrapping would be wrong.
-
-  // If RHS is zero then the weight didn't change.
-  if (RHS.isMinValue())
-    return;
-  // If LHS is zero then the combined weight is RHS.
-  if (LHS.isMinValue()) {
-    LHS = RHS;
-    return;
-  }
-  // From this point on we know that neither LHS nor RHS is zero.
-
-  if (Instruction::isIdempotent(Opcode)) {
-    // Idempotent means X op X === X, so any non-zero weight is equivalent to a
-    // weight of 1.  Keeping weights at zero or one also means that wrapping is
-    // not a problem.
-    assert(LHS == 1 && RHS == 1 && "Weights not reduced!");
-    return; // Return a weight of 1.
-  }
-  if (Instruction::isNilpotent(Opcode)) {
-    // Nilpotent means X op X === 0, so reduce weights modulo 2.
-    assert(LHS == 1 && RHS == 1 && "Weights not reduced!");
-    LHS = 0; // 1 + 1 === 0 modulo 2.
-    return;
-  }
-  if (Opcode == Instruction::Add || Opcode == Instruction::FAdd) {
-    // TODO: Reduce the weight by exploiting nsw/nuw?
-    LHS += RHS;
-    return;
-  }
-
-  assert((Opcode == Instruction::Mul || Opcode == Instruction::FMul) &&
-         "Unknown associative operation!");
-  unsigned Bitwidth = LHS.getBitWidth();
-  // If CM is the Carmichael number then a weight W satisfying W >= CM+Bitwidth
-  // can be replaced with W-CM.  That's because x^W=x^(W-CM) for every Bitwidth
-  // bit number x, since either x is odd in which case x^CM = 1, or x is even in
-  // which case both x^W and x^(W - CM) are zero.  By subtracting off multiples
-  // of CM like this weights can always be reduced to the range [0, CM+Bitwidth)
-  // which by a happy accident means that they can always be represented using
-  // Bitwidth bits.
-  // TODO: Reduce the weight by exploiting nsw/nuw?  (Could do much better than
-  // the Carmichael number).
-  if (Bitwidth > 3) {
-    /// CM - The value of Carmichael's lambda function.
-    APInt CM = APInt::getOneBitSet(Bitwidth, CarmichaelShift(Bitwidth));
-    // Any weight W >= Threshold can be replaced with W - CM.
-    APInt Threshold = CM + Bitwidth;
-    assert(LHS.ult(Threshold) && RHS.ult(Threshold) && "Weights not reduced!");
-    // For Bitwidth 4 or more the following sum does not overflow.
-    LHS += RHS;
-    while (LHS.uge(Threshold))
-      LHS -= CM;
-  } else {
-    // To avoid problems with overflow do everything the same as above but using
-    // a larger type.
-    unsigned CM = 1U << CarmichaelShift(Bitwidth);
-    unsigned Threshold = CM + Bitwidth;
-    assert(LHS.getZExtValue() < Threshold && RHS.getZExtValue() < Threshold &&
-           "Weights not reduced!");
-    unsigned Total = LHS.getZExtValue() + RHS.getZExtValue();
-    while (Total >= Threshold)
-      Total -= CM;
-    LHS = Total;
-  }
-}
-
 using RepeatedValue = std::pair<Value*, APInt>;
 
 /// Given an associative binary expression, return the leaf
@@ -559,26 +468,7 @@ static bool LinearizeExprTree(Instruction *I,
                "In leaf map but not visited!");
 
         // Update the number of paths to the leaf.
-        IncorporateWeight(It->second, Weight, Opcode);
-
-#if 0   // TODO: Re-enable once PR13021 is fixed.
-        // The leaf already has one use from inside the expression.  As we want
-        // exactly one such use, drop this new use of the leaf.
-        assert(!Op->hasOneUse() && "Only one use, but we got here twice!");
-        I->setOperand(OpIdx, UndefValue::get(I->getType()));
-        Changed = true;
-
-        // If the leaf is a binary operation of the right kind and we now see
-        // that its multiple original uses were in fact all by nodes belonging
-        // to the expression, then no longer consider it to be a leaf and add
-        // its operands to the expression.
-        if (BinaryOperator *BO = isReassociableOp(Op, Opcode)) {
-          LLVM_DEBUG(dbgs() << "UNLEAF: " << *Op << " (" << It->second << ")\n");
-          Worklist.push_back(std::make_pair(BO, It->second));
-          Leaves.erase(It);
-          continue;
-        }
-#endif
+        It->second += Weight;
 
         // If we still have uses that are not accounted for by the expression
         // then it is not safe to modify the value.
diff --git a/llvm/test/Transforms/Reassociate/repeats.ll b/llvm/test/Transforms/Reassociate/repeats.ll
index c18db19fa73e3..28177f1c0ba5e 100644
--- a/llvm/test/Transforms/Reassociate/repeats.ll
+++ b/llvm/test/Transforms/Reassociate/repeats.ll
@@ -1,56 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
 ; RUN: opt < %s -passes=reassociate -S | FileCheck %s
 
 ; Tests involving repeated operations on the same value.
 
 define i8 @nilpotent(i8 %x) {
-; CHECK-LABEL: @nilpotent(
+; CHECK-LABEL: define i8 @nilpotent(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT:    ret i8 0
+;
   %tmp = xor i8 %x, %x
   ret i8 %tmp
-; CHECK: ret i8 0
 }
 
 define i2 @idempotent(i2 %x) {
-; CHECK-LABEL: @idempotent(
+; CHECK-LABEL: define i2 @idempotent(
+; CHECK-SAME: i2 [[X:%.*]]) {
+; CHECK-NEXT:    ret i2 -1
+;
   %tmp1 = and i2 %x, %x
   %tmp2 = and i2 %tmp1, %x
   %tmp3 = and i2 %tmp2, %x
   ret i2 %tmp3
-; CHECK: ret i2 %x
 }
 
 define i2 @add(i2 %x) {
-; CHECK-LABEL: @add(
+; CHECK-LABEL: define i2 @add(
+; CHECK-SAME: i2 [[X:%.*]]) {
+; CHECK-NEXT:    ret i2 0
+;
   %tmp1 = add i2 %x, %x
   %tmp2 = add i2 %tmp1, %x
   %tmp3 = add i2 %tmp2, %x
   ret i2 %tmp3
-; CHECK: ret i2 0
 }
 
 define i2 @cst_add() {
-; CHECK-LABEL: @cst_add(
+; CHECK-LABEL: define i2 @cst_add() {
+; CHECK-NEXT:    ret i2 -1
+;
   %tmp1 = add i2 1, 1
   %tmp2 = add i2 %tmp1, 1
   ret i2 %tmp2
-; CHECK: ret i2 -1
 }
 
 define i8 @cst_mul() {
-; CHECK-LABEL: @cst_mul(
+; CHECK-LABEL: define i8 @cst_mul() {
+; CHECK-NEXT:    ret i8 -13
+;
   %tmp1 = mul i8 3, 3
   %tmp2 = mul i8 %tmp1, 3
   %tmp3 = mul i8 %tmp2, 3
   %tmp4 = mul i8 %tmp3, 3
   ret i8 %tmp4
-; CHECK: ret i8 -13
 }
 
 define i3 @foo3x5(i3 %x) {
 ; Can be done with two multiplies.
-; CHECK-LABEL: @foo3x5(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i3 @foo3x5(
+; CHECK-SAME: i3 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i3 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i3 [[TMP3]], [[X]]
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i3 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    ret i3 [[TMP5]]
+;
   %tmp1 = mul i3 %x, %x
   %tmp2 = mul i3 %tmp1, %x
   %tmp3 = mul i3 %tmp2, %x
@@ -58,12 +70,31 @@ define i3 @foo3x5(i3 %x) {
   ret i3 %tmp4
 }
 
+define i3 @foo3x5_nsw(i3 %x) {
+; Can be done with two multiplies.
+; CHECK-LABEL: define i3 @foo3x5_nsw(
+; CHECK-SAME: i3 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i3 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i3 [[TMP3]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i3 [[TMP2]], [[TMP3]]
+; CHECK-NEXT:    ret i3 [[TMP4]]
+;
+  %tmp1 = mul i3 %x, %x
+  %tmp2 = mul i3 %tmp1, %x
+  %tmp3 = mul i3 %tmp2, %x
+  %tmp4 = mul nsw i3 %tmp3, %x
+  ret i3 %tmp4
+}
+
 define i3 @foo3x6(i3 %x) {
 ; Can be done with two multiplies.
-; CHECK-LABEL: @foo3x6(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i3 @foo3x6(
+; CHECK-SAME: i3 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i3 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i3 [[TMP1]], [[X]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i3 [[TMP3]], [[TMP3]]
+; CHECK-NEXT:    ret i3 [[TMP2]]
+;
   %tmp1 = mul i3 %x, %x
   %tmp2 = mul i3 %tmp1, %x
   %tmp3 = mul i3 %tmp2, %x
@@ -74,10 +105,14 @@ define i3 @foo3x6(i3 %x) {
 
 define i3 @foo3x7(i3 %x) {
 ; Can be done with two multiplies.
-; CHECK-LABEL: @foo3x7(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i3 @foo3x7(
+; CHECK-SAME: i3 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i3 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP7:%.*]] = mul i3 [[TMP5]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i3 [[TMP7]], [[X]]
+; CHECK-NEXT:    [[TMP6:%.*]] = mul i3 [[TMP3]], [[TMP7]]
+; CHECK-NEXT:    ret i3 [[TMP6]]
+;
   %tmp1 = mul i3 %x, %x
   %tmp2 = mul i3 %tmp1, %x
   %tmp3 = mul i3 %tmp2, %x
@@ -89,10 +124,13 @@ define i3 @foo3x7(i3 %x) {
 
 define i4 @foo4x8(i4 %x) {
 ; Can be done with two multiplies.
-; CHECK-LABEL: @foo4x8(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x8(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP3]], [[TMP3]]
+; CHECK-NEXT:    ret i4 [[TMP4]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -105,11 +143,14 @@ define i4 @foo4x8(i4 %x) {
 
 define i4 @foo4x9(i4 %x) {
 ; Can be done with three multiplies.
-; CHECK-LABEL: @foo4x9(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x9(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i4 [[TMP1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP2]], [[X]]
+; CHECK-NEXT:    [[TMP8:%.*]] = mul i4 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    ret i4 [[TMP8]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -123,11 +164,14 @@ define i4 @foo4x9(i4 %x) {
 
 define i4 @foo4x10(i4 %x) {
 ; Can be done with three multiplies.
-; CHECK-LABEL: @foo4x10(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x10(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i4 [[TMP4]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP2]], [[TMP2]]
+; CHECK-NEXT:    ret i4 [[TMP3]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -142,12 +186,15 @@ define i4 @foo4x10(i4 %x) {
 
 define i4 @foo4x11(i4 %x) {
 ; Can be done with four multiplies.
-; CHECK-LABEL: @foo4x11(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x11(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP1]], [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i4 [[TMP4]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP2]], [[X]]
+; CHECK-NEXT:    [[TMP10:%.*]] = mul i4 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    ret i4 [[TMP10]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -163,10 +210,14 @@ define i4 @foo4x11(i4 %x) {
 
 define i4 @foo4x12(i4 %x) {
 ; Can be done with two multiplies.
-; CHECK-LABEL: @foo4x12(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x12(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP1]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP4]], [[TMP4]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i4 [[TMP3]], [[TMP3]]
+; CHECK-NEXT:    ret i4 [[TMP2]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -183,11 +234,15 @@ define i4 @foo4x12(i4 %x) {
 
 define i4 @foo4x13(i4 %x) {
 ; Can be done with three multiplies.
-; CHECK-LABEL: @foo4x13(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x13(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i4 [[TMP1]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP2]], [[TMP2]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP3]], [[X]]
+; CHECK-NEXT:    [[TMP12:%.*]] = mul i4 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    ret i4 [[TMP12]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -205,11 +260,15 @@ define i4 @foo4x13(i4 %x) {
 
 define i4 @foo4x14(i4 %x) {
 ; Can be done with three multiplies.
-; CHECK-LABEL: @foo4x14(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x14(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP1]], [[X]]
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i4 [[TMP4]], [[TMP4]]
+; CHECK-NEXT:    [[TMP6:%.*]] = mul i4 [[TMP5]], [[X]]
+; CHECK-NEXT:    [[TMP7:%.*]] = mul i4 [[TMP6]], [[TMP6]]
+; CHECK-NEXT:    ret i4 [[TMP7]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x
@@ -228,12 +287,16 @@ define i4 @foo4x14(i4 %x) {
 
 define i4 @foo4x15(i4 %x) {
 ; Can be done with four multiplies.
-; CHECK-LABEL: @foo4x15(
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: mul
-; CHECK-NEXT: ret
+; CHECK-LABEL: define i4 @foo4x15(
+; CHECK-SAME: i4 [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i4 [[X]], [[X]]
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i4 [[TMP1]], [[X]]
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i4 [[TMP4]], [[TMP4]]
+; CHECK-NEXT:    [[TMP6:%.*]] = mul i4 [[TMP3]], [[X]]
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i4 [[TMP6]], [[X]]
+; CHECK-NEXT:    [[TMP14:%.*]] = mul i4 [[TMP5]], [[TMP6]]
+; CHECK-NEXT:    ret i4 [[TMP14]]
+;
   %tmp1 = mul i4 %x, %x
   %tmp2 = mul i4 %tmp1, %x
   %tmp3 = mul i4 %tmp2, %x

@AZero13 (Contributor) commented May 8, 2024

Should we even keep the test cases then if this unused transform is gonna be removed anyway?

@dtcxzyw (Member, Author) commented May 8, 2024

Should we even keep the test cases then if this unused transform is gonna be removed anyway?

Yes. IIRC these tests are useful to some mutation-based fuzzers.

@nikic (Contributor) commented May 9, 2024

The original patch says:

This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value.

Is "wrong" here in the sense that if overflow happens there may be a miscompile or "wrong" in the sense that the result may be sub-optimal?

@dtcxzyw (Member, Author) commented May 13, 2024

The original patch says:

This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value.

Is "wrong" here in the sense that if overflow happens there may be a miscompile or "wrong" in the sense that the result may be sub-optimal?

See https://bugs.llvm.org/show_bug.cgi?id=13021. The original patch fixes the infinite loop. Reducing weights is just a "side-effect".

@nikic (Contributor) commented May 16, 2024

The original patch says:

This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value.

Is "wrong" here in the sense that if overflow happens there may be a miscompile or "wrong" in the sense that the result may be sub-optimal?

See https://bugs.llvm.org/show_bug.cgi?id=13021. The original patch fixes the infinite loop. Reducing weights is just a "side-effect".

Will reverting this bring back the infinite loop?

@dtcxzyw (Member, Author) commented May 17, 2024

The original patch says:

This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value.

Is "wrong" here in the sense that if overflow happens there may be a miscompile or "wrong" in the sense that the result may be sub-optimal?

See https://bugs.llvm.org/show_bug.cgi?id=13021. The original patch fixes the infinite loop. Reducing weights is just a "side-effect".

Will reverting this bring back the infinite loop?

At least it works well with the tests in repeats.ll and the original cases:

; opt -passes=reassociate test.ll -S
define float @patatino(float %x) {
  %t0 = fmul fast float %x, %x
  %t1 = fmul fast float %t0, %t0
  %t2 = fmul fast float %t1, %t1
  %t3 = fmul fast float %t2, %t2
  %t4 = fmul fast float %t3, %t3
  %t5 = fmul fast float %t4, %t4
  %t6 = fmul fast float %t5, %t5
  %t7 = fmul fast float %t6, %t6
  %t8 = fmul fast float %t7, %t7
  %t9 = fmul fast float %t8, %t8
  %t10 = fmul fast float %t9, %t9
  %t11 = fmul fast float %t10, %t10
  %t12 = fmul fast float %t11, %t11
  %t13 = fmul fast float %t12, %t12
  %t14 = fmul fast float %t13, %t13
  %t15 = fmul fast float %t14, %t14
  %t16 = fmul fast float %t15, %t15
  %t17 = fmul fast float %t16, %t16
  %t18 = fmul fast float %t17, %t17
  %t19 = fmul fast float %t18, %t18
  %t20 = fmul fast float %t19, %t19
  %t21 = fmul fast float %t20, %t20
  %t22 = fmul fast float %t21, %t21
  %t23 = fmul fast float %t22, %t22
  %t24 = fmul fast float %t23, %t23
  %t25 = fmul fast float %t24, %t24
  %t26 = fmul fast float %t25, %t25
  %t27 = fmul fast float %t26, %t26
  %t28 = fmul fast float %t27, %t27
  ret float %t28
}

define i8 @f2(i8 %a) {
  %2 = mul i8 %a, %a
  %3 = mul i8 %2, %2
  %4 = mul i8 %3, %3
  %5 = mul i8 %4, %4
  %6 = mul i8 %5, %5
  %7 = mul i8 %6, %6
  %8 = mul i8 %7, %7
  %9 = mul i8 %8, %8
  %10 = mul i8 %9, %9
  %11 = mul i8 %10, %10
  %12 = mul i8 %11, %11
  %13 = mul i8 %12, %12
  %14 = mul i8 %13, %13
  %15 = mul i8 %14, %14
  %16 = mul i8 %15, %15
  %17 = mul i8 %16, %16
  %18 = mul i8 %17, %17
  %19 = mul i8 %18, %18
  %20 = mul i8 %19, %19
  %21 = mul i8 %20, %20
  %22 = mul i8 %21, %21
  %23 = mul i8 %22, %22
  %24 = mul i8 %23, %23
  %25 = mul i8 %24, %24
  %26 = mul i8 %25, %25
  %27 = mul i8 %26, %26
  %28 = mul i8 %27, %27
  %29 = mul i8 %28, %28
  %30 = mul i8 %29, %29
  %31 = mul i8 %30, %30
  ret i8 %31
}

@dtcxzyw (Member, Author) commented May 28, 2024

Ping @nikic

@antoniofrighetto (Contributor):
It is still not completely clear to me what the original benefits of d7aeefe were meant to be, but the fix looks good to me. I can also confirm that the infinite loop fixed by the original patch does not seem to come back.

@nikic (Contributor) left a comment:

LGTM

@dtcxzyw dtcxzyw merged commit 3bcccb6 into llvm:main May 29, 2024
6 checks passed
@dtcxzyw dtcxzyw deleted the perf/simplify-weight-merge branch May 29, 2024 10:09
vg0204 pushed a commit to vg0204/llvm-project that referenced this pull request May 29, 2024
@jplehr (Contributor) commented May 29, 2024

@dtcxzyw (Member, Author) commented May 29, 2024

@TomWeaver18 (Contributor):
Hello and good afternoon from the UK! It seems like this change has caused several more build bot failures here:
https://lab.llvm.org/buildbot/#/builders/216/builds/39799
https://lab.llvm.org/buildbot/#/builders/247/builds/19084
https://lab.llvm.org/buildbot/#/builders/139/builds/66429

These have been failing for well over 3 hours now; is there a plan to potentially revert this change to get the build bots clear?

@dtcxzyw (Member, Author) commented May 29, 2024

is there a plan to potentially revert this change to get the build bots clear?

No. I will fix it.

@TomWeaver18 (Contributor):
Many thanks! I look forward to the fix coming in 👍

@dtcxzyw (Member, Author) commented May 29, 2024

Should be fixed by 9a28272. Thank you! @nikic

@TomWeaver18 (Contributor):
@dtcxzyw thanks for the fix! :)

@jplehr (Contributor) commented May 29, 2024

Thanks, our bots are green again. :)

; CHECK-LABEL: @idempotent(
; CHECK-LABEL: define i2 @idempotent(
; CHECK-SAME: i2 [[X:%.*]]) {
; CHECK-NEXT: ret i2 -1
Member:

This change is not correct. It should be ret %x.

@dtcxzyw (Member, Author):

I will take a look. Thank you!

@dtcxzyw (Member, Author):

The repeat count overflows here :(
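As a rough illustration of the overflow (my reading of it, not code from the pass): the weight APInt has the bit width of the value being reassociated, so in the i2 @idempotent test the four occurrences of %x wrap back to zero, the leaf drops out, and the function presumably folds to the identity of and, i.e. i2 -1.

```cpp
// Illustration only: the repeat count is an APInt as wide as the value type,
// so in the i2 @idempotent test the count for %x wraps 1+1+1+1 -> 0 (mod 4).
#include <llvm/ADT/APInt.h>
#include <cassert>

int main() {
  llvm::APInt Weight(/*numBits=*/2, /*val=*/1); // first occurrence of %x
  for (int i = 0; i < 3; ++i)
    Weight += llvm::APInt(2, 1);                // the other three occurrences
  // The count has wrapped to zero, so %x looks like it appears zero times.
  assert(Weight.isZero());
}
```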

using RepeatedValue = std::pair<Value*, APInt>;

@dtcxzyw (Member, Author):

But we are using finite precision arithmetic, and the APInt sum LHS + RHS may not be correct if it wraps (it is correct for nilpotent operations and addition, but not for idempotent operations and multiplication)

We need to rework this patch.

@dtcxzyw (Member, Author):

@nikic Do you think this fix works? It will only break if there are more than 2^32 instructions in the IR. In any case, I don't think it makes sense to use i8 as the type of the repeat count just because an fmul operates on fp8.

diff --git a/llvm/lib/Transforms/Scalar/Reassociate.cpp b/llvm/lib/Transforms/Scalar/Reassociate.cpp
index 04c54ed69e93..6e547b558316 100644
--- a/llvm/lib/Transforms/Scalar/Reassociate.cpp
+++ b/llvm/lib/Transforms/Scalar/Reassociate.cpp
@@ -384,8 +384,8 @@ static bool LinearizeExprTree(Instruction *I,
   assert((isa<UnaryOperator>(I) || isa<BinaryOperator>(I)) &&
          "Expected a UnaryOperator or BinaryOperator!");
   LLVM_DEBUG(dbgs() << "LINEARIZE: " << *I << '\n');
-  unsigned Bitwidth = I->getType()->getScalarType()->getPrimitiveSizeInBits();
   unsigned Opcode = I->getOpcode();
+  unsigned Bitwidth = (Instruction::isIdempotent(Opcode) || Opcode == Instruction::Mul || Opcode == Instruction::FMul) ? 32 : I->getType()->getScalarSizeInBits();
   assert(I->isAssociative() && I->isCommutative() &&
          "Expected an associative and commutative operation!");
 
diff --git a/llvm/test/Transforms/Reassociate/reassoc_bool_vec.ll b/llvm/test/Transforms/Reassociate/reassoc_bool_vec.ll
index bd0060cc5abb..d4aa5c507ec8 100644
--- a/llvm/test/Transforms/Reassociate/reassoc_bool_vec.ll
+++ b/llvm/test/Transforms/Reassociate/reassoc_bool_vec.ll
@@ -56,19 +56,20 @@ define <8 x i1> @vector2(<8 x i1> %a, <8 x i1> %b0, <8 x i1> %b1, <8 x i1> %b2,
 ; CHECK-NEXT:    [[OR5:%.*]] = or <8 x i1> [[B5]], [[A]]
 ; CHECK-NEXT:    [[OR6:%.*]] = or <8 x i1> [[B6]], [[A]]
 ; CHECK-NEXT:    [[OR7:%.*]] = or <8 x i1> [[B7]], [[A]]
-; CHECK-NEXT:    [[XOR0:%.*]] = xor <8 x i1> [[OR1]], [[OR0]]
-; CHECK-NEXT:    [[XOR2:%.*]] = xor <8 x i1> [[XOR0]], [[OR2]]
-; CHECK-NEXT:    [[OR045:%.*]] = xor <8 x i1> [[XOR2]], [[OR3]]
-; CHECK-NEXT:    [[XOR3:%.*]] = xor <8 x i1> [[OR045]], [[OR4]]
-; CHECK-NEXT:    [[XOR4:%.*]] = xor <8 x i1> [[XOR3]], [[OR5]]
-; CHECK-NEXT:    [[XOR5:%.*]] = xor <8 x i1> [[XOR4]], [[OR6]]
-; CHECK-NEXT:    [[XOR6:%.*]] = xor <8 x i1> [[XOR5]], [[OR7]]
+; CHECK-NEXT:    [[XOR2:%.*]] = xor <8 x i1> [[OR1]], [[OR0]]
+; CHECK-NEXT:    [[OR045:%.*]] = xor <8 x i1> [[XOR2]], [[OR2]]
+; CHECK-NEXT:    [[XOR3:%.*]] = xor <8 x i1> [[OR045]], [[OR3]]
+; CHECK-NEXT:    [[XOR4:%.*]] = xor <8 x i1> [[XOR3]], [[OR4]]
+; CHECK-NEXT:    [[XOR5:%.*]] = xor <8 x i1> [[XOR4]], [[OR5]]
+; CHECK-NEXT:    [[XOR6:%.*]] = xor <8 x i1> [[XOR5]], [[OR6]]
+; CHECK-NEXT:    [[XOR7:%.*]] = xor <8 x i1> [[XOR6]], [[OR7]]
 ; CHECK-NEXT:    [[OR4560:%.*]] = or <8 x i1> [[OR045]], [[XOR2]]
 ; CHECK-NEXT:    [[OR023:%.*]] = or <8 x i1> [[OR4560]], [[XOR3]]
 ; CHECK-NEXT:    [[OR001:%.*]] = or <8 x i1> [[OR023]], [[XOR4]]
 ; CHECK-NEXT:    [[OR0123:%.*]] = or <8 x i1> [[OR001]], [[XOR5]]
 ; CHECK-NEXT:    [[OR01234567:%.*]] = or <8 x i1> [[OR0123]], [[XOR6]]
-; CHECK-NEXT:    ret <8 x i1> [[OR01234567]]
+; CHECK-NEXT:    [[OR1234567:%.*]] = or <8 x i1> [[OR01234567]], [[XOR7]]
+; CHECK-NEXT:    ret <8 x i1> [[OR1234567]]
 ;
   %or0 = or <8 x i1> %b0, %a
   %or1 = or <8 x i1> %b1, %a
diff --git a/llvm/test/Transforms/Reassociate/repeats.ll b/llvm/test/Transforms/Reassociate/repeats.ll
index 28177f1c0ba5..7aa2b8a43213 100644
--- a/llvm/test/Transforms/Reassociate/repeats.ll
+++ b/llvm/test/Transforms/Reassociate/repeats.ll
@@ -15,7 +15,7 @@ define i8 @nilpotent(i8 %x) {
 define i2 @idempotent(i2 %x) {
 ; CHECK-LABEL: define i2 @idempotent(
 ; CHECK-SAME: i2 [[X:%.*]]) {
-; CHECK-NEXT:    ret i2 -1
+; CHECK-NEXT:    ret i2 [[X]]
 ;
   %tmp1 = and i2 %x, %x
   %tmp2 = and i2 %tmp1, %x

Contributor:

@dtcxzyw Can you please revert this patch, pre-commit the regeneration of the check lines and then file a new PR?

Can we replace the APInt with something like uint64_t plus an explicit bailout on overflow?
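A minimal sketch of that direction (hypothetical; not necessarily how the reland implements it): accumulate the repeat count in a uint64_t and bail out of the reassociation instead of letting it wrap.

```cpp
// Hypothetical sketch of an overflow-checked repeat count; names are made up.
#include <cstdint>
#include <limits>

/// Add RHS to the running repeat count LHS. Returns false if the sum would
/// overflow, in which case the caller should give up on reassociating the
/// expression rather than silently wrapping and miscompiling.
static bool addRepeatCount(uint64_t &LHS, uint64_t RHS) {
  if (LHS > std::numeric_limits<uint64_t>::max() - RHS)
    return false;
  LHS += RHS;
  return true;
}
```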

dtcxzyw added a commit to dtcxzyw/llvm-project that referenced this pull request Jun 3, 2024
dtcxzyw added a commit that referenced this pull request Jun 3, 2024
dtcxzyw added a commit that referenced this pull request Jun 8, 2024
This patch relands #91469 and uses `uint64_t` for the repeat count to avoid a miscompilation caused by overflow (#91469 (comment)).
nekoshirro pushed a commit to nekoshirro/Alchemist-LLVM that referenced this pull request Jun 9, 2024