
Comparison of Option<NonZeroU*> is not fully optimized #49892

Open
kennytm opened this issue Apr 11, 2018 · 23 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation

Comments

@kennytm
Member

kennytm commented Apr 11, 2018

Compare:

#![feature(nonzero)]
use std::num::NonZeroU32;

pub fn f(a: Option<NonZeroU32>, b: Option<NonZeroU32>) -> bool {
    a < b
}
pub fn g(a: u32, b: u32) -> bool {
    a < b
}

The functions f and g are semantically equivalent and should compile to the same code. However, the compiled f is much more complex, involving a needless comparison of the discriminants.

LLVM IR and ASM (in release mode)
; playground::f
; Function Attrs: norecurse nounwind readnone uwtable
define zeroext i1 @_ZN10playground1f17hd628eff49d2ff60dE(i32, i32) unnamed_addr #0 {
start:
  %2 = icmp ne i32 %0, 0
  %3 = icmp ne i32 %1, 0
  %4 = xor i1 %2, %3
  br i1 %4, label %bb7.i, label %bb6.i

bb6.i:                                            ; preds = %start
  %5 = icmp ult i32 %0, %1
  %spec.select = and i1 %2, %5
  ret i1 %spec.select

bb7.i:                                            ; preds = %start
  %6 = xor i1 %2, true
  %7 = and i1 %3, %6
  ret i1 %7
}

; playground::g
; Function Attrs: norecurse nounwind readnone uwtable
define zeroext i1 @_ZN10playground1g17hea1c394facd055aeE(i32 %a, i32 %b) unnamed_addr #0 {
start:
  %0 = icmp ult i32 %a, %b
  ret i1 %0
}
playground::f:
	test	edi, edi
	setne	al
	test	esi, esi
	setne	cl
	xor	cl, al
	je	.LBB0_1
	test	esi, esi
	setne	cl
	test	edi, edi
	sete	al
	and	al, cl
	ret

.LBB0_1:
	test	edi, edi
	setne	cl
	cmp	edi, esi
	setb	al
	and	al, cl
	ret

playground::g:
	cmp	edi, esi
	setb	al
	ret

Note that specializing PartialOrd etc. for Option<NonZero*> is not a valid solution, since custom wrapper types around NonZero* would still have the same problem:

#[derive(PartialOrd, PartialEq)]
pub struct S(NonZeroU32);
pub fn f(a: Option<S>, b: Option<S>) -> bool {
    a < b
}

cc #49137.

@kennytm kennytm added I-slow Issue: Problems and improvements with respect to performance of generated code. C-enhancement Category: An issue proposing an enhancement or a PR with one. WG-llvm Working group: LLVM backend code generation labels Apr 11, 2018
@kennytm kennytm changed the title Comparison of Option<NonZero*> is not fully optimized Comparison of Option<NonZeroU*> is not fully optimized Apr 11, 2018
@scottmcm
Member

Possibly-related: #49572

@oli-obk
Contributor

oli-obk commented Apr 12, 2018

I'm assuming we need to change the layout here

layout: self.cx.layout_of(discr_ty)

in order to pass a layout that knows the discriminant is not just an integer, but a subset of an integer.

Also maybe the discriminant intrinsic must be adjusted:

"discriminant_value" => {

Maybe we should add compiler-internal support for subsetting integers? So TypeVariants::Int also has a range field that we can set?

cc @eddyb
cc @rust-lang/wg-codegen

@hanna-kruppe
Contributor

I don't see how layout factors in; Option<NonZeroU32> is already represented as a single 32-bit integer with 0 as None. The generic PartialOrd is just not fully optimized for this monomorphization. That's an optimization LLVM could do; it just currently doesn't.
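To make the layout point concrete, here is a small check (my own sketch, not from the thread; the niche layout of `Option<NonZero*>` is documented, but the transmute is still illustrative rather than idiomatic):

```rust
use std::mem::{size_of, transmute};
use std::num::NonZeroU32;

fn main() {
    // Option<NonZeroU32> uses the niche at 0, so it is the same size
    // as u32 and None is the all-zero bit pattern.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    let some: Option<NonZeroU32> = NonZeroU32::new(5);
    let none: Option<NonZeroU32> = None;
    // Safety: relies on the documented niche layout above.
    let some_repr: u32 = unsafe { transmute(some) };
    let none_repr: u32 = unsafe { transmute(none) };
    assert_eq!(some_repr, 5);
    assert_eq!(none_repr, 0);
    println!("ok");
}
```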

@hanna-kruppe
Contributor

To clarify, the Alive link is just to confirm that the overall transformation is correct within LLVM IR. Obviously this very specific pattern should not be added verbatim to instcombine (it wouldn't even help, since I replaced the branching with select to make Alive accept it). But since this optimization applies only to particular monomorphizations and we can't reasonably special-case this in rustc, we should report this as an LLVM bug so that they can look into which combination of passes could be improved to catch this.

@kennytm kennytm added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Apr 12, 2018
@oli-obk
Contributor

oli-obk commented Apr 12, 2018

Oh... I assumed that we're missing some assume annotations that would allow this optimization to happen by itself.

@eddyb
Member

eddyb commented Apr 12, 2018

What's happening in <Option as PartialOrd>::le, even? Is this the output of auto-derive? What's the actual code that optimizes to the shown LLVM IR? I'm wondering if this is a derive inefficiency.

@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 3, 2020
@kennytm
Member Author

kennytm commented Sep 17, 2021

As of 1.57-nightly, the ASM generated for f has become even worse.

; playground::f
; Function Attrs: norecurse nounwind nonlazybind readnone uwtable willreturn
define zeroext i1 @_ZN10playground1f17h98eba4f4ca80f23cE(i32 %0, i32 %1) unnamed_addr #0 {
start:
  %2 = icmp ne i32 %0, 0
  %3 = icmp ne i32 %1, 0
  %4 = xor i1 %2, %3
  br i1 %4, label %bb2.i.i, label %bb1.i.i

bb2.i.i:                                          ; preds = %start
  %5 = xor i1 %2, true
  %_3.i.i.i.i = and i1 %3, %5
  %.0.i.i.i.i = select i1 %_3.i.i.i.i, i8 -1, i8 1
  br label %_ZN4core3cmp10PartialOrd2lt17h869f0d8a39125a29E.exit

bb1.i.i:                                          ; preds = %start
  %.not.i.i = icmp eq i32 %0, 0
  %.not5.i.i = icmp eq i32 %1, 0
  %or.cond.i.i = or i1 %.not.i.i, %.not5.i.i
  br i1 %or.cond.i.i, label %_ZN4core3cmp10PartialOrd2lt17h869f0d8a39125a29E.exit, label %bb5.i.i

bb5.i.i:                                          ; preds = %bb1.i.i
  %_3.i.i.i.i.i = icmp ult i32 %0, %1
  %_6.i.i.i.i.i = icmp ne i32 %0, %1
  %spec.select.i.i.i.i.i = zext i1 %_6.i.i.i.i.i to i8
  %.0.i.i.i.i.i = select i1 %_3.i.i.i.i.i, i8 -1, i8 %spec.select.i.i.i.i.i
  br label %_ZN4core3cmp10PartialOrd2lt17h869f0d8a39125a29E.exit

_ZN4core3cmp10PartialOrd2lt17h869f0d8a39125a29E.exit: ; preds = %bb2.i.i, %bb1.i.i, %bb5.i.i
  %.2.i.i = phi i8 [ %.0.i.i.i.i, %bb2.i.i ], [ 0, %bb1.i.i ], [ %.0.i.i.i.i.i, %bb5.i.i ]
  %cond.i = icmp eq i8 %.2.i.i, -1
  ret i1 %cond.i
}
example::f:
        test    edi, edi
        setne   al
        test    esi, esi
        setne   cl
        cmp     al, cl
        je      .LBB0_2
        test    edi, edi
        setne   al
        add     al, al
        add     al, -1
        test    esi, esi
        movzx   ecx, al
        mov     eax, 1
        cmovne  eax, ecx
.LBB0_5:
        cmp     al, -1
        sete    al
        ret
.LBB0_2:
        xor     eax, eax
        test    edi, edi
        je      .LBB0_5
        test    esi, esi
        je      .LBB0_5
        xor     ecx, ecx
        cmp     edi, esi
        setne   cl
        mov     eax, 255
        cmovae  eax, ecx
        cmp     al, -1
        sete    al
        ret

The regression appeared in 1.52.0, which upgraded to LLVM 12.


The GCC backend produces fewer instructions, but is still far from ideal:

example::f:
        xor     eax, eax
        test    esi, esi
        je      .L6
        test    edi, edi
        jne     .L12
        mov     eax, 1
        ret
.L12:
        cmp     edi, esi
        setb    al
.L6:
        ret

example::g:
        cmp     edi, esi
        setb    al
        ret

@saethlin
Member

saethlin commented Dec 10, 2021

Most of the above comments are about PartialOrd, but this also afflicts PartialEq, including on the latest nightly with the new pass manager (NewPM).

pub fn h(a: Option<NonZeroU32>, b: Option<NonZeroU32>) -> bool {
    a == b
}
example::h:
        test    edi, edi
        setne   al
        test    esi, esi
        setne   cl
        xor     cl, al
        je      .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        test    esi, esi
        sete    al
        test    edi, edi
        sete    cl
        or      cl, al
        cmp     edi, esi
        sete    al
        or      al, cl
        ret

IMO the fact that LLVM can't do the expected optimization of simply comparing the two values as u32s is the more egregious part.
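The "compare as two u32s" claim can be spot-checked in safe Rust (my own sketch; the `repr` helper is mine, mapping `None` to 0 per the niche layout):

```rust
use std::num::NonZeroU32;

// For Option<NonZeroU32>, `==` agrees with a plain u32 comparison
// of the underlying representation (None <-> 0).
fn repr(x: Option<NonZeroU32>) -> u32 {
    x.map_or(0, |nz| nz.get())
}

fn main() {
    let samples = [
        None,
        NonZeroU32::new(1),
        NonZeroU32::new(7),
        NonZeroU32::new(u32::MAX),
    ];
    for &a in &samples {
        for &b in &samples {
            assert_eq!(a == b, repr(a) == repr(b));
        }
    }
    println!("ok");
}
```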

@JakobDegen
Contributor

JakobDegen commented Dec 10, 2021

Also not great is this one, but then there's

#[derive(Eq, PartialEq)]
pub enum A {
    A(u8),
    B(u8),
    C(u8),
    D(u8),
    E(u8)
}

pub fn cmp(a: A, b: A) -> bool {
    a == b
}

which currently compiles to what I'm fairly sure is a jump table whose entries all go to the same place:

example::cmp:
        cmp     dil, dl
        jne     .LBB0_1
        movzx   eax, dil
        lea     rdx, [rip + .LJTI0_0]
        movsxd  rax, dword ptr [rdx + 4*rax]
        add     rax, rdx
        jmp     rax
.LBB0_3:
        cmp     sil, cl
        sete    al
        ret
.LBB0_1:
        xor     eax, eax
        ret
.LJTI0_0:
        .long   .LBB0_3-.LJTI0_0
        .long   .LBB0_3-.LJTI0_0
        .long   .LBB0_3-.LJTI0_0
        .long   .LBB0_3-.LJTI0_0
        .long   .LBB0_3-.LJTI0_0

Edit: Turns out these are unrelated
Edit edit: A local fix of the issue on the rustc side shows that LLVM misses the opt even when it's allowed, so they are once more related.

@JakobDegen
Contributor

I investigated a little and filed an LLVM bug: llvm/llvm-project#52622

@curiouscat2018

I don't think we have sufficient information to solve the problem on the LLVM side. Compare the two functions:

pub fn less_than_unsigned(a: Option<NonZeroU32>, b: Option<NonZeroU32>) -> bool {
    a < b
}

pub fn less_than_signed(a: Option<NonZeroI32>, b: Option<NonZeroI32>) -> bool {
    a < b
}

The generated LLVM IR is almost identical, differing only in a single instruction (ugt vs. sgt). However, only the first function can be reduced to a u32 comparison. The second cannot, because None is smaller than every Some(NonZeroI32) value, including the negative ones. Given this fact, how can we know which function the LLVM IR came from?
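The signed/unsigned asymmetry can be demonstrated directly (my own sketch illustrating the point above, not code from the thread):

```rust
use std::num::{NonZeroI32, NonZeroU32};

fn main() {
    // Unsigned: Option ordering agrees with ordering of the raw representation.
    let a: Option<NonZeroU32> = None; // repr 0
    let b = NonZeroU32::new(3);       // repr 3
    assert_eq!(a < b, 0u32 < 3u32);

    // Signed: it does not. None (repr 0) sorts below every Some, but a
    // signed compare of the representations puts Some(-5) (repr -5) first.
    let c: Option<NonZeroI32> = None; // repr 0
    let d = NonZeroI32::new(-5);      // repr -5
    assert!(c < d);                   // Option ordering: None < Some(_)
    assert!(!(0i32 < -5i32));         // raw signed compare disagrees
    println!("ok");
}
```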

This problem might be solved in std, if we could have a specific PartialOrd impl for Option<NonZeroU32>:

impl PartialOrd for Option<NonZeroU32> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        unsafe {
            let a = std::mem::transmute::<&Option<NonZeroU32>, &u32>(self);
            let b = std::mem::transmute::<&Option<NonZeroU32>, &u32>(other);
            a.partial_cmp(b)
        }
    }
}

In order to do this, we would need the specialization feature and would have to mark the derived partial_cmp for Option as default. Yet the derive macro does not accept parameters like defaultness = default, which is an entirely separate topic.
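The same idea can be sketched safely as a free function (my own sketch; `fast_cmp` is a hypothetical name, and an impl on `Option<NonZeroU32>` outside std would be rejected by coherence anyway):

```rust
use std::cmp::Ordering;
use std::num::NonZeroU32;

// Map None to 0 (the niche value) and compare as plain u32s.
fn fast_cmp(a: Option<NonZeroU32>, b: Option<NonZeroU32>) -> Ordering {
    a.map_or(0, NonZeroU32::get).cmp(&b.map_or(0, NonZeroU32::get))
}

fn main() {
    assert_eq!(fast_cmp(None, NonZeroU32::new(1)), Ordering::Less);
    assert_eq!(fast_cmp(NonZeroU32::new(2), NonZeroU32::new(2)), Ordering::Equal);
    assert_eq!(fast_cmp(NonZeroU32::new(9), None), Ordering::Greater);
    println!("ok");
}
```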

cc @eddyb @oli-obk

@JakobDegen
Contributor

JakobDegen commented May 9, 2022

@curiouscat2018 I'm not sure what you mean. alive2 says that the transformation would be correct for the unsigned case, so LLVM has all the information it needs to continue optimizing.

@curiouscat2018

@JakobDegen Yes, the unsigned case can be correctly transformed, but the signed case cannot, if you try it on alive2. Given only the LLVM IR, how can we tell whether it came from the signed or the unsigned case? alive2 verifies whether a transformation is correct, not how the transformation can be found. My point is, we might not have enough information to do the optimization on the LLVM side. Does this clarification help?

@JakobDegen
Contributor

I don't understand what you mean. If alive2 says that the transformation is legal, then by definition LLVM has all the information it needs to optimize. This is what an optimizer should do after all: We give it a slow version of the LLVM IR, and it turns it into the fast version that is equivalent. There's nothing else we should need to do to help it.

Specifically, with respect to "how can we tell whether it's from [the] signed or unsigned case?": There's no need to tell where the code came from. Optimizers don't work backwards to try and understand the origin of the code, they work forwards from what they have to try and simplify it further. Of course, sometimes that's really hard, but this is not one of those times. In the LLVM IR from before, it's sound to replace %.not5.i.i with false for example.

@saethlin
Member

I found a way around this: https://godbolt.org/z/bh9dfWdc1

For anyone on mobile, we have an

#[derive(PartialEq, Clone, Copy)]
pub enum SbTag {
    Tagged(NonZeroU64),
    Untagged,
}

For the purposes of codegen this is an Option<NonZeroU64>. Replacing the derived PartialEq with the below code produces the desired codegen for comparison of SbTag.

impl SbTag {
    fn as_u64(&self) -> u64 {
        match self {
            SbTag::Tagged(tag) => tag.get(),
            SbTag::Untagged => 0,
        }
    }
}

impl PartialEq for SbTag {
    fn eq(&self, other: &Self) -> bool {
        self.as_u64() == other.as_u64()
    }
}

I think the culprit here is how LLVM understands the exact arrangement of the logic in the derived PartialEq, which is this:

    fn eq(&self, other: &SbTag) -> bool {
        {
            let __self_vi = ::core::intrinsics::discriminant_value(&*self);
            let __arg_1_vi = ::core::intrinsics::discriminant_value(&*other);
            if true && __self_vi == __arg_1_vi {
                match (&*self, &*other) {
                    (&SbTag::Tagged(ref __self_0), &SbTag::Tagged(ref __arg_1_0)) => {
                        (*__self_0) == (*__arg_1_0)
                    }
                    _ => true,
                }
            } else {
                false
            }
        }
    }

Adjusting the behavior of the derived PartialEq to match what I did by hand might be a bit tricky. But at least it's clear that there is a way to fix this purely in Rust.
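That the hand-written eq really agrees with the derived one can be spot-checked (my own sketch; `eq_fast` is a hypothetical name standing in for the manual impl above):

```rust
use std::num::NonZeroU64;

#[derive(PartialEq, Clone, Copy)]
enum SbTag {
    Tagged(NonZeroU64),
    Untagged,
}

impl SbTag {
    fn as_u64(&self) -> u64 {
        match self {
            SbTag::Tagged(tag) => tag.get(),
            SbTag::Untagged => 0, // the niche value; no Tagged variant uses it
        }
    }
}

// The manual comparison: compare the raw representations directly.
fn eq_fast(a: &SbTag, b: &SbTag) -> bool {
    a.as_u64() == b.as_u64()
}

fn main() {
    let vals = [
        SbTag::Untagged,
        SbTag::Tagged(NonZeroU64::new(1).unwrap()),
        SbTag::Tagged(NonZeroU64::new(42).unwrap()),
    ];
    for a in &vals {
        for b in &vals {
            // Agrees with the derived PartialEq in every case.
            assert_eq!(eq_fast(a, b), a == b);
        }
    }
    println!("ok");
}
```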

@curiouscat2018

@saethlin I think we can make a similar change for Option. A manual implementation of the PartialEq trait for Option<NonZeroU*> should work. @JakobDegen any idea on this?

@saethlin
Member

@nnethercote is changing how a lot of the built-in derives work, aiming at improving compile time. I think it would be best to re-evaluate this after his PRs land, just to make sure this is still a problem with them.

@nikic
Contributor

nikic commented Jun 24, 2022

For reference, alive link for the simplest case (PartialEq of Option): https://alive2.llvm.org/ce/z/DMtwsK

I've been staring at this, but don't really see any obvious sequence of simple transforms that would lead to the desired result. It would be viable to fold select i1 %cmp1, i1 %cmp2, i1 false to either %cmp1 or %cmp2, but that doesn't make things substantially simpler. Having a xor as dominating condition is tough.

@nnethercote
Contributor

My changes don't affect eq for enums, other than getting rid of the redundant true && in the discriminant check. (My changes affect eq for structs a lot more.)

@scottmcm
Member

scottmcm commented Oct 26, 2022

@nikic Any ideas if we rewrote Option::eq to give you a different dominating condition? Like we could try putting the payload comparison behind an &, maybe https://rust.godbolt.org/z/vKMa4zzsx. Alive again says the transformation is legal https://alive2.llvm.org/ce/z/PuZ-SZ, though I also don't see an obvious path to get there.

EDIT: Actually, I found one thing that opt doesn't do today (llvm/llvm-project#58624), at least.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 5, 2024
…try>

Remove SpecOptionPartialEq

With the recent LLVM bump, the specialization for Option::partial_eq on types with niches is no longer necessary. I kept the manual implementation as it still gives us better codegen than the derive (will look at this separately).

Also implemented PartialOrd/Ord by hand as it _somewhat_ improves codegen for rust-lang#49892: https://godbolt.org/z/vx5Y6oW4Y
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 8, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 18, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 22, 2024
@kennytm
Member Author

kennytm commented Jul 30, 2024

As of the current latest Rust (1.82-nightly), thanks to #122024:

use std::num::NonZeroU32;
#[no_mangle]
pub fn f(a: Option<NonZeroU32>, b: Option<NonZeroU32>) -> bool {
    a < b
}
#[no_mangle]
pub fn g(a: u32, b: u32) -> bool {
    a < b
}
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(none) uwtable
define noundef zeroext i1 @f(i32 noundef %0, i32 noundef %1) unnamed_addr #0 {
start:
  %2 = icmp eq i32 %0, 0
  %3 = icmp ne i32 %1, 0
  %4 = icmp ult i32 %0, %1
  %_3.sroa.0.0 = select i1 %2, i1 %3, i1 %4
  ret i1 %_3.sroa.0.0
}
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(none) uwtable
define noundef zeroext i1 @g(i32 noundef %a, i32 noundef %b) unnamed_addr #0 {
start:
  %_0 = icmp ult i32 %a, %b
  ret i1 %_0
}
f:                                      # @f
# %bb.0:
	xor	eax, eax
	test	esi, esi
	setne	al
	xor	ecx, ecx
	cmp	edi, esi
	setb	cl
	test	edi, edi
	cmovne	eax, ecx
                                        # kill: def $al killed $al killed $eax
	ret
                                        # -- End function

g:                                      # @g
# %bb.0:
	cmp	edi, esi
	setb	al
	ret
                                        # -- End function

That is, there still remains a redundant a == None comparison:

if a == 0 {
    b != 0
} else {
    a < b
}
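The leftover branch is indeed redundant for unsigned values, which can be spot-checked (my own sketch; `with_branch` is a hypothetical name modeling the codegen above):

```rust
// For unsigned integers, `if a == 0 { b != 0 } else { a < b }`
// is equivalent to plain `a < b`, since 0 < b <=> b != 0.
fn with_branch(a: u32, b: u32) -> bool {
    if a == 0 { b != 0 } else { a < b }
}

fn main() {
    let samples = [0u32, 1, 2, 7, u32::MAX];
    for &a in &samples {
        for &b in &samples {
            assert_eq!(with_branch(a, b), a < b);
        }
    }
    println!("ok");
}
```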

@danilaml

@kennytm looks like missed optimization in upstream llvm: https://alive2.llvm.org/ce/z/gFYb4q
