Missed optimization/perf oddity with allocations #128854

Open
peterwmwong opened this issue Aug 8, 2024 · 5 comments
Labels
A-rust-for-linux Relevant for the Rust-for-Linux project C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@peterwmwong

peterwmwong commented Aug 8, 2024

Consider the following minimized example:

pub fn test() {
    for _ in 0..128 {
        let _ = vec![0; 32];
    }
}

Expected generated output (Rust 1.81.0):

example::test::h15f9c44deb8efbb9:
        ret

Actual output (Rust Nightly):

example::test::h15f9c44deb8efbb9:
        mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   ecx, byte ptr [rax]
        movzx   eax, byte ptr [rax]
        ret

Godbolt: https://www.godbolt.org/z/x458Pv8P5

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 8, 2024
@workingjubilee
Member

I know this is a "minimal reduction", but is there an example where this impacts actual programs?

@Artikae

Artikae commented Aug 10, 2024

For reference, the actual cause, as far as I can tell, is this line in the alloc function.

// Make sure we don't accidentally allow omitting the allocator shim in
// stable code until it is actually stabilized.
core::ptr::read_volatile(&__rust_no_alloc_shim_is_unstable);

It, of course, can't be optimized out because it's volatile.

That line doesn't appear in alloc_zeroed, for some reason. (Maybe a bug?)
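
To illustrate why the volatile read survives optimization, here is a minimal sketch. `MARKER`, `plain_reads`, and `volatile_reads` are hypothetical names made up for this example; `MARKER` only stands in for the real `__rust_no_alloc_shim_is_unstable` symbol. Both functions return the same value, but in an optimized build the compiler is free to drop the plain reads, while each `read_volatile` has to be emitted, which matches the repeated `movzx` loads in the nightly output above.

```rust
use std::ptr;

// Hypothetical stand-in for the real `__rust_no_alloc_shim_is_unstable` symbol.
static MARKER: u8 = 0;

/// Plain reads whose results are unused: the optimizer may remove them
/// entirely, leaving only the counter arithmetic.
pub fn plain_reads(n: usize) -> usize {
    let mut performed = 0;
    for _ in 0..n {
        let _ = MARKER;
        performed += 1;
    }
    performed
}

/// Volatile reads: the result is still unused, but the compiler must not
/// elide or merge the loads, so one load per iteration remains in the output.
pub fn volatile_reads(n: usize) -> usize {
    let mut performed = 0;
    for _ in 0..n {
        // SAFETY: `MARKER` is a valid, initialized, properly aligned static.
        let _ = unsafe { ptr::read_volatile(&MARKER) };
        performed += 1;
    }
    performed
}
```

Comparing the optimized assembly of `plain_reads(128)` and `volatile_reads(128)` (for example on Godbolt) should show the difference; the exact instructions depend on the compiler version and target.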

@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Aug 13, 2024
@peterwmwong
Author

Just noticed a helpful rundown of the cause, the rationale behind it, and the perf potential if fixed (@Kobzol's perf run) in this Zulip conversation.

@saethlin
Member

Would you count this as fixed if we make the codegen match by regressing the good case? #130497

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 18, 2024
…=<try>

read_volatile __rust_no_alloc_shim_is_unstable in alloc_zeroed

rust-lang#128854 (comment)

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 18, 2024
…=bjorn3

read_volatile __rust_no_alloc_shim_is_unstable in alloc_zeroed

It was pointed out in rust-lang#128854 (comment) that the magic volatile read was probably missing from `alloc_zeroed`. I can't find any mention of `alloc_zeroed` on rust-lang#86844, so it looks like this was just missed initially.
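
For readers unfamiliar with the two entry points, the following is a rough sketch using only the public `std::alloc::alloc` and `std::alloc::alloc_zeroed` functions. It does not reproduce the internal shim check; it only shows that zeroed and non-zeroed allocation are separate functions, which is how a check added to one could be missing from the other. The helper names `sum_via_alloc` and `sum_via_alloc_zeroed` are hypothetical.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};

/// Allocates `len` bytes with `alloc`, fills them with `fill`,
/// and returns the sum of the bytes. `len` must be non-zero.
pub fn sum_via_alloc(len: usize, fill: u8) -> u32 {
    assert!(len > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    // SAFETY: the layout has non-zero size, the pointer is checked for null,
    // the memory is initialized before it is read, and it is freed with the
    // same layout it was allocated with.
    unsafe {
        let p = alloc(layout);
        assert!(!p.is_null(), "allocation failed");
        p.write_bytes(fill, len);
        let sum: u32 = std::slice::from_raw_parts(p, len)
            .iter()
            .map(|&b| u32::from(b))
            .sum();
        dealloc(p, layout);
        sum
    }
}

/// Allocates `len` bytes with `alloc_zeroed` and returns the sum of the
/// bytes, which is 0 because the memory arrives already zeroed.
pub fn sum_via_alloc_zeroed(len: usize) -> u32 {
    assert!(len > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    // SAFETY: same reasoning as above; `alloc_zeroed` initializes the memory.
    unsafe {
        let p = alloc_zeroed(layout);
        assert!(!p.is_null(), "allocation failed");
        let sum: u32 = std::slice::from_raw_parts(p, len)
            .iter()
            .map(|&b| u32::from(b))
            .sum();
        dealloc(p, layout);
        sum
    }
}
```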
@peterwmwong
Author

Bold move @saethlin 😆. I've updated the bug description/repro.

Y'all do whatever you want with this; it's just an observation I made a while back while looking into Mojo's claims of being faster than Rust.

@saethlin saethlin added A-rust-for-linux Relevant for the Rust-for-Linux project T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 21, 2024