Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing MIR optimization: Replace matches with loads if possible #88793

Open
pcwalton opened this issue Sep 9, 2021 · 5 comments
Open

Missing MIR optimization: Replace matches with loads if possible #88793

pcwalton opened this issue Sep 9, 2021 · 5 comments
Labels
A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such

Comments

@pcwalton
Copy link
Contributor

pcwalton commented Sep 9, 2021

Many times, people write matches on enums that are really loads from a table. It'd be nice if we could codegen them as such.

Here's an example, at https://godbolt.org/z/4P9v61an7:

pub enum Char {
    A,
    B,
    C,
    D,
    E,
    F,
    G,
    H,
    I,
    J,
    K,
    L,
}

pub fn to_str(val: Char) -> &'static str {
    match val {
        Char::A => "A",
        Char::B => "B",
        Char::C => "C",
        Char::D => "D",
        Char::E => "E",
        Char::F => "F",
        Char::G => "G",
        Char::H => "H",
        Char::I => "I",
        Char::J => "J",
        Char::K => "K",
        Char::L => "L",
    }
}

The codegen here has a lot to be desired:

example::to_str:
        lea     rax, [rip + .L__unnamed_1]
        movzx   ecx, dil
        lea     rdx, [rip + .LJTI0_0]
        movsxd  rcx, dword ptr [rdx + 4*rcx]
        add     rcx, rdx
        jmp     rcx
.LBB0_1:
        lea     rax, [rip + .L__unnamed_2]
        mov     edx, 1
        ret
.LBB0_2:
        lea     rax, [rip + .L__unnamed_3]
        mov     edx, 1
        ret
.LBB0_3:
        lea     rax, [rip + .L__unnamed_4]
        mov     edx, 1
        ret
.LBB0_4:
        lea     rax, [rip + .L__unnamed_5]
        mov     edx, 1
        ret
.LBB0_5:
        lea     rax, [rip + .L__unnamed_6]
        mov     edx, 1
        ret
.LBB0_6:
        lea     rax, [rip + .L__unnamed_7]
        mov     edx, 1
        ret
.LBB0_7:
        lea     rax, [rip + .L__unnamed_8]
        mov     edx, 1
        ret
.LBB0_8:
        lea     rax, [rip + .L__unnamed_9]
        mov     edx, 1
        ret
.LBB0_9:
        lea     rax, [rip + .L__unnamed_10]
        mov     edx, 1
        ret
.LBB0_10:
        lea     rax, [rip + .L__unnamed_11]
        mov     edx, 1
        ret
.LBB0_11:
        lea     rax, [rip + .L__unnamed_12]
.LBB0_12:
        mov     edx, 1
        ret
.LJTI0_0:
        .long   .LBB0_12-.LJTI0_0
        .long   .LBB0_1-.LJTI0_0
        .long   .LBB0_2-.LJTI0_0
        .long   .LBB0_3-.LJTI0_0
        .long   .LBB0_4-.LJTI0_0
        .long   .LBB0_5-.LJTI0_0
        .long   .LBB0_6-.LJTI0_0
        .long   .LBB0_7-.LJTI0_0
        .long   .LBB0_8-.LJTI0_0
        .long   .LBB0_9-.LJTI0_0
        .long   .LBB0_10-.LJTI0_0
        .long   .LBB0_11-.LJTI0_0

.L__unnamed_12:
        .byte   76

.L__unnamed_11:
        .byte   75

.L__unnamed_10:
        .byte   74

.L__unnamed_9:
        .byte   73

.L__unnamed_8:
        .byte   72

.L__unnamed_7:
        .byte   71

.L__unnamed_6:
        .byte   70

.L__unnamed_5:
        .byte   69

.L__unnamed_4:
        .byte   68

.L__unnamed_3:
        .byte   67

.L__unnamed_2:
        .byte   66

.L__unnamed_1:
        .byte   65

Deriving Debug can cause poor codegen too: https://godbolt.org/z/xnexGxo8e

There's some discussion on Twitter from LLVM folks that suggests this would be best as an MIR optzn: https://twitter.com/pcwalton/status/1436036809603960835

@pcwalton pcwalton added A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Sep 9, 2021
@glandium
Copy link
Contributor

glandium commented Sep 9, 2021

It's worth noting that clang is doing better at equivalent C++ code. https://godbolt.org/z/zqT1z36es

@pcwalton
Copy link
Contributor Author

pcwalton commented Sep 9, 2021

Wonder if this is a pass ordering issue then.

@pcwalton
Copy link
Contributor Author

pcwalton commented Sep 9, 2021

There's LLVM code that already tries to do this. I guess it's not running on Rust for some reason? https://llvm.org/doxygen/SimplifyCFG_8cpp_source.html#l05786

@pcwalton
Copy link
Contributor Author

pcwalton commented Sep 9, 2021

I think I have an LLVM fix. Will post a patch upstream.

pcwalton added a commit to pcwalton/rust that referenced this issue Sep 10, 2021
…deriving

Debug for unit-like enum variants.

The intent here is to allow LLVM to remove the switch entirely in favor of an
indexed load from a table of constant strings, which is likely what the
programmer would write in C. Unfortunately, LLVM currently doesn't perform this
optimization due to a bug, but there is [a
patch](https://reviews.llvm.org/D109565) that fixes this issue. I've verified
that, with that patch applied on top of this commit, Debug for unit-like tuple
variants becomes a load, reducing the O(n) code bloat to O(1).

Note that inlining `DebugTuple::finish()` wasn't enough to allow LLVM to
optimize the code properly; I had to avoid the abstraction entirely. Not using
the abstraction is likely better for compile time anyway.

Part of rust-lang#88793.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 17, 2021
…, r=oli-obk

Introduce a fast path that avoids the `debug_tuple` abstraction when deriving Debug for unit-like enum variants.

The intent here is to allow LLVM to remove the switch entirely in favor of an
indexed load from a table of constant strings, which is likely what the
programmer would write in C. Unfortunately, LLVM currently doesn't perform this
optimization due to a bug, but there is [a
patch](https://reviews.llvm.org/D109565) that fixes this issue. I've verified
that, with that patch applied on top of this commit, Debug for unit-like tuple
variants becomes a load, reducing the O(n) code bloat to O(1).

Note that inlining `DebugTuple::finish()` wasn't enough to allow LLVM to
optimize the code properly; I had to avoid the abstraction entirely. Not using
the abstraction is likely better for compile time anyway.

Part of rust-lang#88793.

r? `@oli-obk`
@mati865
Copy link
Contributor

mati865 commented Jan 14, 2022

Should this issues be relabelled as A-LLVM?

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such
Projects
None yet
Development

No branches or pull requests

4 participants