Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrapper generates more instructions for simple integer operations #119520

Closed
DaniPopes opened this issue Jan 2, 2024 · 8 comments · Fixed by #120268
Closed

Wrapper generates more instructions for simple integer operations #119520

DaniPopes opened this issue Jan 2, 2024 · 8 comments · Fixed by #120268
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another.

Comments

@DaniPopes
Copy link
Contributor

DaniPopes commented Jan 2, 2024

Code

I tried this code (godbolt):

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Int(u32);

const A: Int = Int(201);
const B: Int = Int(270);
const C: Int = Int(153);

#[inline(never)]
#[rustfmt::skip]
pub fn wrapped(x: Int) -> bool {
    (x >= A && x <= B)
        || x == C
}

#[inline(never)]
#[rustfmt::skip]
pub fn primitive(x: Int) -> bool {
    (x.0 >= A.0 && x.0 <= B.0)
        || x.0 == C.0
}

I expected to see this happen: both functions compile to identical assembly

Instead, this happened: wrapped contains way more instructions:

example::wrapped:
        cmp     edi, 201
        jb      .LBB0_3
        xor     eax, eax
        cmp     edi, 270
        mov     ecx, 255
        setne   al
        cmovae  ecx, eax
        mov     al, 1
        test    cl, cl
        je      .LBB0_4
        movzx   ecx, cl
        cmp     ecx, 255
        jne     .LBB0_3
.LBB0_4:
        ret
.LBB0_3:
        cmp     edi, 153
        sete    al
        ret

example::primitive:
        lea     eax, [rdi - 201]
        cmp     eax, 70
        setb    cl
        cmp     edi, 153
        sete    al
        or      al, cl
        ret

Version it worked on

It most recently worked on: 1.64

Version with regression

rustc --version --verbose:

rustc 1.77.0-nightly (e51e98dde 2023-12-31)
binary: rustc
commit-hash: e51e98dde6a60637b6a71b8105245b629ac3fe77
commit-date: 2023-12-31
host: x86_64-unknown-linux-gnu
release: 1.77.0-nightly
LLVM version: 17.0.6

I have bisected it to between 1.64 and 1.65 on Godbolt, and then further with cargo bisect-rustc which points to nightly-2022-08-17 (4033686...86c6ebe):

Regression in nightly-2022-08-17
...
looking for regression commit between 2022-08-16 and 2022-08-17
...
found 8 bors merge commits in the specified range
  commit[0] 2022-08-15: Auto merge of #100595 - matthiaskrgr:rollup-f1zur58, r=matthiaskrgr
  commit[1] 2022-08-15: Auto merge of #100007 - ChrisDenton:dtor-inline-never, r=michaelwoerister
  commit[2] 2022-08-16: Auto merge of #100237 - cjgillot:no-special-hash-hir, r=nagisa
  commit[3] 2022-08-16: Auto merge of #100611 - matthiaskrgr:rollup-rxj10ur, r=matthiaskrgr
  commit[4] 2022-08-16: Auto merge of #100441 - nnethercote:shrink-ast-Attribute, r=petrochenkov
  commit[5] 2022-08-16: Auto merge of #99612 - yanchen4791:issue-95079-fix, r=compiler-errors
  commit[6] 2022-08-16: Auto merge of #100626 - Dylan-DPC:rollup-mwbm7kj, r=Dylan-DPC
  commit[7] 2022-08-16: Auto merge of #100644 - TaKO8Ki:rollup-n0o6a1t, r=TaKO8Ki

Backtrace

Backtrace

<backtrace>

@rustbot modify labels: +I-slow +regression-from-stable-to-stable -regression-untriaged

@DaniPopes DaniPopes added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jan 2, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. I-slow Issue: Problems and improvements with respect to performance of generated code. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-untriaged Untriaged performance or correctness regression. labels Jan 2, 2024
@DaniPopes
Copy link
Contributor Author

@rustbot modify labels: -needs-triage +A-LLVM

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 2, 2024
@DaniPopes
Copy link
Contributor Author

Possibly #100460 which modified PartialOrd::le?

@apiraino
Copy link
Contributor

apiraino commented Jan 4, 2024

WG-prioritization assigning priority (Zulip discussion).

(I wonder if it's in any way related to issue #119014 also about code generation for Partial)

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jan 4, 2024
@DianQK
Copy link
Member

DianQK commented Jan 5, 2024

Alive2: https://alive2.llvm.org/ce/z/LDK_Rf.
(I haven't checked where the missing optimizations are.)

@clubby789
Copy link
Contributor

clubby789 commented Jan 5, 2024

The regression goes away with -Zmir-opt-level=0 - comparing the LLVM opt pipeline between 1.64 and 1.65, it looks like (in 1.64) LLVM is able to optimise PartialOrd::le down to a single instruction, then inline that into f()

In 1.65, PartialOrd::le has been inlined in the MIR, so LLVM inlines several slightly less minimal partial_cmp calls into f() which it then has trouble optimising as well.

I'm not sure this lines up with the bisection, but it seems to be somewhat related and 1.65 did enable MIR inlining by default

@DianQK
Copy link
Member

DianQK commented Jan 15, 2024

@rustbot claim

@DianQK
Copy link
Member

DianQK commented Jan 16, 2024

Upstream issue: llvm/llvm-project#78281

Even though I created this issue, I think there may be other solutions.
I might be able to keep the range information in the SROA, and of course for this example, I could use the LazyValueInfo calculation. When I'm sure that this is also a possible way, I'll create a new issue. (Of course, the above issue needs to be resolved as well.)

But personally, I would like this issue to be solved directly in rust. I write similar code all the time (and I'm sure other projects are similar?). If we wait for an upstream solution, we may be delayed until LLVM 19. Of course, I would also like to see if I can use this to learn the MIR optimization. :)

@DianQK
Copy link
Member

DianQK commented Jan 23, 2024

Example of trait removal:

#![crate_type = "lib"]
use std::cmp::Ordering::*;

const A: i32 = 201;
const B: i32 = 270;
const C: i32 = 153;

#[inline(never)]
#[no_mangle]
pub fn foo(x: i32) -> bool {
    matches!(x.partial_cmp(&A), Some(Greater | Equal)) && 
    matches!(x.partial_cmp(&B), Some(Less | Equal)) ||
    x == C
}

Godbolt: https://rust.godbolt.org/z/EP9ohq5j3

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 23, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 26, 2024
…tchs, r=oli-obk

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
Nadrieril added a commit to Nadrieril/rust that referenced this issue Jan 27, 2024
…witchs, r=oli-obk

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 27, 2024
…tchs, r=oli-obk

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 14, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 28, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520. Fixes rust-lang#110097.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 29, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520. Fixes rust-lang#110097.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 29, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520. Fixes rust-lang#110097.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 8, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520. Fixes rust-lang#110097.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
@bors bors closed this as completed in 14fbc3c Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants