
[InstCombine] Canonicalize the fcmp range check idiom into fabs + fcmp #76367

Merged: 5 commits merged into llvm:main on Feb 6, 2024

Conversation

@dtcxzyw (Member) commented Dec 25, 2023

This patch canonicalizes the fcmp range check idiom into fabs + fcmp, since the canonical form is easier for the backends to lower than the original two-compare form.
Godbolt: https://godbolt.org/z/x3eqPb1fz

and (fcmp olt/ole/ult/ule x, C), (fcmp ogt/oge/ugt/uge x, -C) --> fabs(x) olt/ole/ult/ule C
or  (fcmp ogt/oge/ugt/uge x, C), (fcmp olt/ole/ult/ule x, -C) --> fabs(x) ogt/oge/ugt/uge C

Alive2: https://alive2.llvm.org/ce/z/MRtoYq
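
For context, a source-level range check like the hypothetical C++ function below (the name and the threshold are made up for illustration; 0x1.0p-63 corresponds to the constant 0x3C00000000000000 used in the tests) is typically lowered, after the usual simplifications, to an and/select of two fcmps, which this patch now folds into a single fcmp on fabs(x):

// Hypothetical example of the range-check idiom: |x| < 2^-63 written as two
// comparisons. After this patch, instcombine emits
//   %t = call float @llvm.fabs.f32(float %x)
//   %r = fcmp olt float %t, 0x3C00000000000000
bool in_range(float x) {
  return x < 0x1.0p-63f && x > -0x1.0p-63f;
}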

@llvmbot (Member) commented Dec 25, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)


Full diff: https://github.com/llvm/llvm-project/pull/76367.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+26)
  • (added) llvm/test/Transforms/InstCombine/fcmp-range-check-idiom.ll (+279)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 5e362f4117d051..689ee6eaabb4b9 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -1457,6 +1457,32 @@ Value *InstCombinerImpl::foldLogicOfFCmps(FCmpInst *LHS, FCmpInst *RHS,
     }
   }
 
+  // Canonicalize the range check idiom:
+  // and (fcmp olt/ole/ult/ule x, C), (fcmp ogt/oge/ugt/uge x, -C)
+  // --> fabs(x) olt/ole/ult/ule C
+  // or  (fcmp ogt/oge/ugt/uge x, C), (fcmp olt/ole/ult/ule x, -C)
+  // --> fabs(x) ogt/oge/ugt/uge C
+  // TODO: Generalize to handle a negated variable operand?
+  const APFloat *LHSC, *RHSC;
+  if (LHS->hasOneUse() && RHS->hasOneUse() && LHS0 == RHS0 &&
+      FCmpInst::getSwappedPredicate(PredL) == PredR &&
+      match(LHS1, m_APFloatAllowUndef(LHSC)) &&
+      match(RHS1, m_APFloatAllowUndef(RHSC)) &&
+      LHSC->bitwiseIsEqual(neg(*RHSC))) {
+    auto IsLessThanOrLessEqual = [](FCmpInst::Predicate Pred) {
+      return (getFCmpCode(Pred) & 0b0110) == 0b0100;
+    };
+    if (IsLessThanOrLessEqual(IsAnd ? PredR : PredL)) {
+      std::swap(LHSC, RHSC);
+      std::swap(PredL, PredR);
+    }
+    if (IsLessThanOrLessEqual(IsAnd ? PredL : PredR)) {
+      Value *FAbs = Builder.CreateUnaryIntrinsic(Intrinsic::fabs, LHS0);
+      return Builder.CreateFCmp(PredL, FAbs,
+                                ConstantFP::get(LHS0->getType(), *LHSC));
+    }
+  }
+
   return nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/fcmp-range-check-idiom.ll b/llvm/test/Transforms/InstCombine/fcmp-range-check-idiom.ll
new file mode 100644
index 00000000000000..06e5ca05ff1501
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/fcmp-range-check-idiom.ll
@@ -0,0 +1,279 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=instcombine %s | FileCheck %s
+
+declare void @use(i1)
+
+define i1 @test_and_olt(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp olt float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ogt float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_ole(float %x) {
+; CHECK-LABEL: define i1 @test_and_ole(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp ole float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp ole float %x, 0x3C00000000000000
+  %cmp2 = fcmp oge float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_or_ogt(float %x) {
+; CHECK-LABEL: define i1 @test_or_ogt(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp ogt float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp ogt float %x, 0x3C00000000000000
+  %cmp2 = fcmp olt float %x, 0xBC00000000000000
+  %cond = or i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_or_oge(float %x) {
+; CHECK-LABEL: define i1 @test_or_oge(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp oge float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp oge float %x, 0x3C00000000000000
+  %cmp2 = fcmp ole float %x, 0xBC00000000000000
+  %cond = or i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_ult(float %x) {
+; CHECK-LABEL: define i1 @test_and_ult(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp ult float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp ult float %x, 0x3C00000000000000
+  %cmp2 = fcmp ugt float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_ule(float %x) {
+; CHECK-LABEL: define i1 @test_and_ule(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp ule float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp ule float %x, 0x3C00000000000000
+  %cmp2 = fcmp uge float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_or_ugt(float %x) {
+; CHECK-LABEL: define i1 @test_or_ugt(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp ugt float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp ugt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ult float %x, 0xBC00000000000000
+  %cond = or i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_or_uge(float %x) {
+; CHECK-LABEL: define i1 @test_or_uge(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp uge float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp uge float %x, 0x3C00000000000000
+  %cmp2 = fcmp ule float %x, 0xBC00000000000000
+  %cond = or i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_commuted(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_commuted(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp olt float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ogt float %x, 0xBC00000000000000
+  %cond = and i1 %cmp2, %cmp1
+  ret i1 %cond
+}
+define i1 @test_and_olt_subnormal(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_subnormal(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp olt float [[TMP1]], 0x36A0000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x36A0000000000000
+  %cmp2 = fcmp ogt float %x, 0xB6A0000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_infinity(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_infinity(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp one float [[TMP1]], 0x7FF0000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x7FF0000000000000
+  %cmp2 = fcmp ogt float %x, 0xFFF0000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_zero(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_zero(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    ret i1 false
+;
+  %cmp1 = fcmp olt float %x, 0x0000000000000000
+  %cmp2 = fcmp ogt float %x, 0x8000000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_ole_zero(float %x) {
+; CHECK-LABEL: define i1 @test_and_ole_zero(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[COND:%.*]] = fcmp oeq float [[X]], 0.000000e+00
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp ole float %x, 0x0000000000000000
+  %cmp2 = fcmp oge float %x, 0x8000000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_logical(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_logical(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp olt float [[TMP1]], 0x3C00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ogt float %x, 0xBC00000000000000
+  %cond = select i1 %cmp1, i1 %cmp2, i1 false
+  ret i1 %cond
+}
+define <2 x i1> @test_and_olt_undef(<2 x float> %x) {
+; CHECK-LABEL: define <2 x i1> @test_and_olt_undef(
+; CHECK-SAME: <2 x float> [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x float> @llvm.fabs.v2f32(<2 x float> [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp olt <2 x float> [[TMP1]], <float 0x3C00000000000000, float 0x3C00000000000000>
+; CHECK-NEXT:    ret <2 x i1> [[COND]]
+;
+  %cmp1 = fcmp olt <2 x float> %x, <float 0x3C00000000000000, float undef>
+  %cmp2 = fcmp ogt <2 x float> %x, <float 0xBC00000000000000, float undef>
+  %cond = and <2 x i1> %cmp1, %cmp2
+  ret <2 x i1> %cond
+}
+define i1 @test_and_olt_nan(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_nan(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    ret i1 false
+;
+  %cmp1 = fcmp olt float %x, 0x7FF8000000000000
+  %cmp2 = fcmp ogt float %x, 0xFFF8000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_ogt(float %x) {
+; CHECK-LABEL: define i1 @test_and_ogt(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    ret i1 false
+;
+  %cmp1 = fcmp ogt float %x, 0x3C00000000000000
+  %cmp2 = fcmp olt float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_or_olt(float %x) {
+; CHECK-LABEL: define i1 @test_or_olt(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[X]])
+; CHECK-NEXT:    [[COND:%.*]] = fcmp ogt float [[TMP1]], 0xBC00000000000000
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ogt float %x, 0xBC00000000000000
+  %cond = or i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+; Negative tests
+define i1 @test_and_olt_multiuse(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_multiuse(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[CMP1:%.*]] = fcmp olt float [[X]], 0x3C00000000000000
+; CHECK-NEXT:    call void @use(i1 [[CMP1]])
+; CHECK-NEXT:    [[CMP2:%.*]] = fcmp ogt float [[X]], 0xBC00000000000000
+; CHECK-NEXT:    [[COND:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  call void @use(i1 %cmp1)
+  %cmp2 = fcmp ogt float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_mismatched_lhs(float %x, float %y) {
+; CHECK-LABEL: define i1 @test_and_olt_mismatched_lhs(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) {
+; CHECK-NEXT:    [[CMP1:%.*]] = fcmp olt float [[X]], 0x3C00000000000000
+; CHECK-NEXT:    [[CMP2:%.*]] = fcmp ogt float [[Y]], 0xBC00000000000000
+; CHECK-NEXT:    [[COND:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ogt float %y, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_same_sign(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_same_sign(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    ret i1 false
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp ogt float %x, 0x3C00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_mismatched_mag(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_mismatched_mag(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[CMP1:%.*]] = fcmp olt float [[X]], 0x3C80000000000000
+; CHECK-NEXT:    [[CMP2:%.*]] = fcmp ogt float [[X]], 0xBC00000000000000
+; CHECK-NEXT:    [[COND:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C80000000000000
+  %cmp2 = fcmp ogt float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
+define i1 @test_and_olt_wrong_pred2(float %x) {
+; CHECK-LABEL: define i1 @test_and_olt_wrong_pred2(
+; CHECK-SAME: float [[X:%.*]]) {
+; CHECK-NEXT:    [[CMP1:%.*]] = fcmp olt float [[X]], 0x3C00000000000000
+; CHECK-NEXT:    [[CMP2:%.*]] = fcmp oge float [[X]], 0xBC00000000000000
+; CHECK-NEXT:    [[COND:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[COND]]
+;
+  %cmp1 = fcmp olt float %x, 0x3C00000000000000
+  %cmp2 = fcmp oge float %x, 0xBC00000000000000
+  %cond = and i1 %cmp1, %cmp2
+  ret i1 %cond
+}
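
For reference, the predicate test (getFCmpCode(Pred) & 0b0110) == 0b0100 in the IsLessThanOrLessEqual lambda added to InstCombineAndOrXor.cpp above works because FCmp predicate codes are laid out as U/L/G/E bits. A minimal standalone sketch of the same check (the helper name and includes are assumptions for illustration, not part of the patch):

#include "llvm/Analysis/CmpInstAnalysis.h"
#include "llvm/IR/Instructions.h"

// getFCmpCode returns the 4-bit predicate code with bit layout U L G E
// (bits 3..0): FCMP_OLT = 0b0100, FCMP_OLE = 0b0101, FCMP_ULT = 0b1100,
// FCMP_ULE = 0b1101. Masking with 0b0110 keeps only the L and G bits, so the
// test holds exactly when the "less" bit is set and the "greater" bit is
// clear, i.e. for olt/ole/ult/ule.
static bool isLessThanOrLessEqual(llvm::FCmpInst::Predicate Pred) {
  return (llvm::getFCmpCode(Pred) & 0b0110) == 0b0100;
}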

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Dec 25, 2023
@tschuett commented:

May I assume that you do the same combine for integer range checks, i.e. and of icmps? Could you please point me to that combine?

@dtcxzyw (Member, Author) commented Dec 25, 2023

May I assume that you do the same combine for integer range checks, i.e. and of icmps? Could you please point me to that combine?

I don't think they are the same combine. For integers, we always convert and/or of icmps into (add +) icmp:

/// Fold (icmp Pred1 V1, C1) & (icmp Pred2 V2, C2)
/// or   (icmp Pred1 V1, C1) | (icmp Pred2 V2, C2)
/// into a single comparison using range-based reasoning.
/// NOTE: This is also used for logical and/or, must be poison-safe!
Value *InstCombinerImpl::foldAndOrOfICmpsUsingRanges(ICmpInst *ICmp1,
                                                     ICmpInst *ICmp2,
                                                     bool IsAnd) {
  ICmpInst::Predicate Pred1, Pred2;
  Value *V1, *V2;
  const APInt *C1, *C2;
  if (!match(ICmp1, m_ICmp(Pred1, m_Value(V1), m_APInt(C1))) ||
      !match(ICmp2, m_ICmp(Pred2, m_Value(V2), m_APInt(C2))))
    return nullptr;

  // Look through add of a constant offset on V1, V2, or both operands. This
  // allows us to interpret the V + C' < C'' range idiom into a proper range.
  const APInt *Offset1 = nullptr, *Offset2 = nullptr;
  if (V1 != V2) {
    Value *X;
    if (match(V1, m_Add(m_Value(X), m_APInt(Offset1))))
      V1 = X;
    if (match(V2, m_Add(m_Value(X), m_APInt(Offset2))))
      V2 = X;
  }

  if (V1 != V2)
    return nullptr;

  ConstantRange CR1 = ConstantRange::makeExactICmpRegion(
      IsAnd ? ICmpInst::getInversePredicate(Pred1) : Pred1, *C1);
  if (Offset1)
    CR1 = CR1.subtract(*Offset1);

  ConstantRange CR2 = ConstantRange::makeExactICmpRegion(
      IsAnd ? ICmpInst::getInversePredicate(Pred2) : Pred2, *C2);
  if (Offset2)
    CR2 = CR2.subtract(*Offset2);

  Type *Ty = V1->getType();
  Value *NewV = V1;
  std::optional<ConstantRange> CR = CR1.exactUnionWith(CR2);
  if (!CR) {
    if (!(ICmp1->hasOneUse() && ICmp2->hasOneUse()) || CR1.isWrappedSet() ||
        CR2.isWrappedSet())
      return nullptr;

    // Check whether we have equal-size ranges that only differ by one bit.
    // In that case we can apply a mask to map one range onto the other.
    APInt LowerDiff = CR1.getLower() ^ CR2.getLower();
    APInt UpperDiff = (CR1.getUpper() - 1) ^ (CR2.getUpper() - 1);
    APInt CR1Size = CR1.getUpper() - CR1.getLower();
    if (!LowerDiff.isPowerOf2() || LowerDiff != UpperDiff ||
        CR1Size != CR2.getUpper() - CR2.getLower())
      return nullptr;

    CR = CR1.getLower().ult(CR2.getLower()) ? CR1 : CR2;
    NewV = Builder.CreateAnd(NewV, ConstantInt::get(Ty, ~LowerDiff));
  }

  if (IsAnd)
    CR = CR->inverse();

  CmpInst::Predicate NewPred;
  APInt NewC, Offset;
  CR->getEquivalentICmp(NewPred, NewC, Offset);

  if (Offset != 0)
    NewV = Builder.CreateAdd(NewV, ConstantInt::get(Ty, Offset));

  return Builder.CreateICmp(NewPred, NewV, ConstantInt::get(Ty, NewC));
}
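
To make the range-based reasoning concrete, here is a small self-contained sketch (an illustration with made-up values, not code from LLVM) showing how ConstantRange folds "x s< 10 && x s> -10" on i32 into the single comparison (x + 9) u< 19:

#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/InstrTypes.h"
#include <cassert>
using namespace llvm;

void rangeFoldExample() {
  // Exact regions of i32 values satisfying each compare.
  ConstantRange LT = ConstantRange::makeExactICmpRegion(
      CmpInst::ICMP_SLT, APInt(32, 10));                     // x s< 10
  ConstantRange GT = ConstantRange::makeExactICmpRegion(
      CmpInst::ICMP_SGT, APInt(32, -10, /*isSigned=*/true)); // x s> -10
  // Values satisfying both compares: the signed interval [-9, 10).
  ConstantRange Both = LT.intersectWith(GT);

  // Re-express the region as a single icmp plus an additive offset.
  CmpInst::Predicate NewPred;
  APInt NewC, Offset;
  Both.getEquivalentICmp(NewPred, NewC, Offset);
  // (x + 9) u< 19 covers exactly [-9, 10).
  assert(NewPred == CmpInst::ICMP_ULT && Offset == 9 && NewC == 19);
}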

@tschuett commented:

Thanks!

@dtcxzyw added the floating-point (Floating-point math) label Dec 25, 2023
@tschuett commented:

Do you need to propagate the fast-math flags (FMF) from the original floating-point operations to the new ones?

@nikic removed their request for review January 2, 2024
@dtcxzyw force-pushed the fcmp-range-check-idiom branch from d5850ff to 5a41406 on January 7, 2024
@dtcxzyw (Member, Author) commented Jan 7, 2024

Do you need to propagate the fast-math flags (FMF) from the original floating-point operations to the new ones?

Done.
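
For reference, FMF propagation here usually means intersecting the flags of the two original compares and installing them on the builder before creating the replacement. A sketch under that assumption (the helper name emitFAbsRangeCheck is made up, and the committed patch may keep a different set of flags):

#include "llvm/ADT/APFloat.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Conservative FMF propagation: the new fabs/fcmp only carries fast-math
// flags present on BOTH original compares; the guard restores the builder's
// previous flag state afterwards.
static Value *emitFAbsRangeCheck(IRBuilderBase &Builder, FCmpInst *LHS,
                                 FCmpInst *RHS, FCmpInst::Predicate Pred,
                                 Value *X, const APFloat &C) {
  FastMathFlags FMF = LHS->getFastMathFlags() & RHS->getFastMathFlags();
  IRBuilderBase::FastMathFlagGuard Guard(Builder);
  Builder.setFastMathFlags(FMF);
  Value *FAbs = Builder.CreateUnaryIntrinsic(Intrinsic::fabs, X);
  return Builder.CreateFCmp(Pred, FAbs, ConstantFP::get(X->getType(), C));
}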

@dtcxzyw requested a review from tschuett January 7, 2024
@arsenm (Contributor) left a comment:

mostly lgtm with some nits

Two inline review comments on llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (outdated, resolved).

      Value *FAbs = Builder.CreateUnaryIntrinsic(Intrinsic::fabs, LHS0);
      return Builder.CreateFCmp(PredL, FAbs,
                                ConstantFP::get(LHS0->getType(), *LHSC));

@arsenm (Contributor) commented on the lines above:

seems like you could just pass the original value through, but I guess that doesn't work if the swap happened above

@dtcxzyw (Member, Author) replied:

PredL/R and LHSC/RHSC are also swapped.

@dtcxzyw force-pushed the fcmp-range-check-idiom branch from 5a41406 to df2bfe7 on January 22, 2024
@dtcxzyw (Member, Author) commented Feb 4, 2024

Ping.

@arsenm (Contributor) left a comment:

still lgtm with some nits

@dtcxzyw force-pushed the fcmp-range-check-idiom branch from df2bfe7 to 11881d5 on February 5, 2024
@dtcxzyw merged commit 4858e9c into llvm:main on Feb 6, 2024 (3 of 4 checks passed)
@dtcxzyw deleted the fcmp-range-check-idiom branch on February 6, 2024
steven-johnson added a commit to halide/Halide that referenced this pull request Feb 7, 2024
The old definitions of bool_1, bool_2, bool_3 in simd_op_check_x86 (etc.) all referred to the same entry in in_f32; as of llvm/llvm-project#76367, the LLVM optimizer is smart enough to realize that (e.g.) bool1 != bool2 by construction, and it optimizes away the code that tests their conditions, such as the ones for andps and orps. Initializing them from different locations is enough to outsmart the compiler.

(The bug was only noticed in the x86 test, but I updated the other tests to guard against future improvements there too.)
steven-johnson added a commit to halide/Halide that referenced this pull request Feb 7, 2024
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024