Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Missed optimization : fold X > C2 ? X + C1 : C2 + C1 to max(X, C2) + C1 #82414

Closed
XChy opened this issue Feb 20, 2024 · 8 comments · Fixed by #116888
Closed

[InstCombine] Missed optimization : fold X > C2 ? X + C1 : C2 + C1 to max(X, C2) + C1 #82414

XChy opened this issue Feb 20, 2024 · 8 comments · Fixed by #116888

Comments

@XChy
Copy link
Member

XChy commented Feb 20, 2024

Alive2 proof: https://alive2.llvm.org/ce/z/ERjNs4

Motivating example

define i32 @src(i32 %x) {
  %add = add nsw i32 %x, 16
  %cmp = icmp sgt i32 %x, 1008
  %s = select i1 %cmp, i32 %add, i32 1024
  ret i32 %s
}

can be folded to:

define i32 @tgt(i32 %x) {
  %smax = call i32 @llvm.smax.i32(i32 %x, i32 1008)
  %add = add nuw nsw i32 %smax, 16
  ret i32 %add
}

LLVM does well when C1 or C2 is not constant, but when both are constants, LLVM missed it. Though this example doesn't show better codegen, I think it's a better canonicalization.

Real-world motivation

This snippet of IR is derived from protobuf/generated_message_tctable_lite.cc after O3 pipeline (original IR is from llvm-opt-benchmark).
Original IR is too big to attach here, email me to get it please.

Let me know if you can confirm that it's an optimization opportunity, thanks.

@nikic
Copy link
Contributor

nikic commented Feb 20, 2024

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

@XChy
Copy link
Member Author

XChy commented Feb 20, 2024

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

No, just as this example shows. I don't insist on it since it doesn't bring better codegen..

@XChy
Copy link
Member Author

XChy commented Feb 20, 2024

Detected a similar but more complex pattern just now: https://alive2.llvm.org/ce/z/pTzsqM
This one produces better optimizations in the surrounding context, see also: https://godbolt.org/z/9d5n7o1er

@pinskia
Copy link

pinskia commented Feb 20, 2024

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

On RISCV, it will allow to use the smax instruction even for the original simplified example instead of the more complex code.

@dtcxzyw
Copy link
Member

dtcxzyw commented Feb 21, 2024

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

On RISCV, it will allow to use the smax instruction even for the original simplified example instead of the more complex code.

It depends on the materialization cost for the immediates.

@XChy
Copy link
Member Author

XChy commented Mar 19, 2024

Detect better optimization in the surrounding context, mainly when the select is used multiple times.
An example: https://alive2.llvm.org/ce/z/7Hb43S , which eliminates an add instruction.

@nikic
Copy link
Contributor

nikic commented Apr 13, 2024

I believe rust-lang/rust#123845 is another motivating case for this. That one is smin with multi-use condition and one-use add.

@veera-sivarajan
Copy link
Contributor

I'd like to work on this.

@nikic nikic closed this as completed in 979a035 Dec 2, 2024
TIFitis pushed a commit to TIFitis/llvm-project that referenced this issue Dec 18, 2024
…C2) BOp C1` (llvm#116888)

Fixes llvm#82414.

General Proof: https://alive2.llvm.org/ce/z/ERjNs4 
Proof for Tests: https://alive2.llvm.org/ce/z/K-934G

This PR transforms `select` instructions of the form `select (Cmp X C1)
(BOp X C2) C3` to `BOp (min/max X C1) C2` iff `C3 == BOp C1 C2`.

This helps in eliminating a noop loop in
rust-lang/rust#123845 but does not improve
optimizations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants