Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression with get_many_mut() #128214

Open
ChayimFriedman2 opened this issue Jul 25, 2024 · 2 comments
Open

Performance regression with get_many_mut() #128214

ChayimFriedman2 opened this issue Jul 25, 2024 · 2 comments
Labels
I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority perf-regression Performance regression. regression-from-stable-to-stable Performance or correctness regression from one stable version to another.
Milestone

Comments

@ChayimFriedman2
Copy link
Contributor

ChayimFriedman2 commented Jul 25, 2024

In the process of trying to stabilize get_many_mut(), and while trying to check what will be the costs of a more detailed error type, I came across the following perf regression from stable to beta.

Code

I tried this code:

#![feature(get_many_mut)]

#[inline]
fn get_many_check_valid(indices: &[usize; 2], len: usize) -> Result<(), GetManyMutError> {
    for (i, &idx) in indices.iter().enumerate() {
        if idx >= len {
            return Err(GetManyMutError::OutOfBounds);
        }
        for &idx2 in &indices[..i] {
            if idx == idx2 {
                return Err(GetManyMutError::Overlapping);
            }
        }
    }
    Ok(())
}

#[inline]
pub fn after_all<'a>(
    slice: &'a mut [i32],
    indices: [usize; 2],
) -> Result<[&mut i32; 2], GetManyMutError> {
    get_many_check_valid(&indices, slice.len())?;
    unsafe { Ok(slice.get_many_unchecked_mut(indices)) }
}

#[derive(Debug)]
pub enum GetManyMutError {
    OutOfBounds,
    Overlapping,
}

#[no_mangle]
pub fn after_all_unwrap(data: &mut [i32], indices: [usize; 2]) -> [&mut i32; 2] {
    after_all(data, indices).unwrap()
}

On stable (with RUSTC_BOOTSTRAP=1) this optimizes nicely to the following code (https://godbolt.org/z/zq8EG7df7):

after_all_unwrap:
        mov     r8, qword ptr [rcx]
        xor     eax, eax
        cmp     r8, rdx
        jae     .LBB0_4
        mov     rcx, qword ptr [rcx + 8]
        cmp     rcx, rdx
        jae     .LBB0_4
        cmp     rcx, r8
        je      .LBB0_3
        lea     rax, [rsi + 4*r8]
        lea     rcx, [rsi + 4*rcx]
        mov     qword ptr [rdi], rax
        mov     qword ptr [rdi + 8], rcx
        mov     rax, rdi
        ret
.LBB0_3:
        mov     al, 1
.LBB0_4:
        push    rax
        mov     byte ptr [rsp + 7], al
        lea     rdi, [rip + .L__unnamed_1]
        lea     rcx, [rip + .L__unnamed_2]
        lea     r8, [rip + .L__unnamed_3]
        lea     rdx, [rsp + 7]
        mov     esi, 43
        call    qword ptr [rip + core::result::unwrap_failed::haccb9aaa604e1e21@GOTPCREL]

However, on beta (and nightly) this adds a whole bunch of unnecessary instructions:

after_all_unwrap:
        mov     r8, qword ptr [rcx]
        xor     eax, eax
        cmp     r8, rdx
        jae     .LBB0_4
        mov     rcx, qword ptr [rcx + 8]
        cmp     rcx, rdx
        jae     .LBB0_4
        cmp     rcx, r8
        je      .LBB0_3
        lea     rax, [rsi + 4*r8]
        lea     rcx, [rsi + 4*rcx]
        mov     rdx, rcx
        shr     rdx, 8
        mov     qword ptr [rdi], rax
        mov     byte ptr [rdi + 8], cl
        mov     rax, rcx
        shr     rax, 56
        mov     byte ptr [rdi + 15], al
        shr     rcx, 40
        mov     word ptr [rdi + 13], cx
        mov     dword ptr [rdi + 9], edx
        mov     rax, rdi
        ret
.LBB0_3:
        mov     al, 1
.LBB0_4:
        push    rax
        mov     byte ptr [rsp + 7], al
        lea     rdi, [rip + .L__unnamed_1]
        lea     rcx, [rip + .L__unnamed_2]
        lea     r8, [rip + .L__unnamed_3]
        lea     rdx, [rsp + 7]
        mov     esi, 43
        call    qword ptr [rip + core::result::unwrap_failed::h803c538fb296a320@GOTPCREL]

With cargo-bisect-rustc I bisected it to #126578. CC @scottmcm.

@rustbot label +regression-from-stable-to-beta -regression-untriaged -C-bug +perf-regression +I-heavy +I-slow

@ChayimFriedman2 ChayimFriedman2 added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jul 25, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. perf-regression Performance regression. regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-untriaged Untriaged performance or correctness regression. C-bug Category: This is a bug. labels Jul 25, 2024
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

In #126578 the perf. triaging showed some regression in binary size (though not entirely due to that patchset). Probably this is another side effect.

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 26, 2024
@ChayimFriedman2
Copy link
Contributor Author

The root cause is a bug in LLVM: llvm/llvm-project#101899.

@Noratrieb Noratrieb removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 5, 2024
@Mark-Simulacrum Mark-Simulacrum added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Oct 11, 2024
@Mark-Simulacrum Mark-Simulacrum added this to the 1.81.0 milestone Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority perf-regression Performance regression. regression-from-stable-to-stable Performance or correctness regression from one stable version to another.
Projects
None yet
Development

No branches or pull requests

5 participants