
Performance issue in write_all (Vec::extend_from_slice) #33518

Closed
frankmcsherry opened this issue May 9, 2016 · 7 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@frankmcsherry
Contributor

frankmcsherry commented May 9, 2016

Hi folks. I chatted a bit on IRC, and people there seemed to think this wasn't obviously a dup, so I'm reporting it here.

I'm using write_all to push some bytes into a buffer. If I do this inline, all goes well performance-wise (memcpy speeds; about 50GB/s on my machine). If I put it in a method, even with an #[inline(always)] attribute, it drops down to about 1GB/s (and the assembly looks like a loop doing something).

The problem goes away if I don't push the leading 24 bytes using write_all. Meaning, if I skip them entirely, great! If I call push(0u8) 24 times instead, also great! Something about the preceding write_all seems to tank the perf of the second write_all (the big one). If I push 32 bytes (i.e. use a &[0u8; 32]) the problem goes away as well (quadword alignment?).

But there never seems to be a problem with the manually inlined code; it always goes nice and fast.

extern crate time;

use std::io::Write;

fn main() {

    let dataz = vec![0u8; 1 << 20];
    let mut bytes = Vec::new();

    let rounds = 1_000;
    let start = time::precise_time_ns();

    for _ in 0..rounds {
        bytes.clear();

        // these two: "average time: 81135"
        // bytes.write_all(&[0u8; 24]).unwrap();
        // bytes.write_all(&dataz[..]).unwrap();

        // this one: "average time: 530736"
        test(&dataz, &mut bytes)
    }

    println!("average time: {:?}", (time::precise_time_ns() - start) / rounds);

}

#[inline(always)]
fn test(typed: &Vec<u8>, bytes: &mut Vec<u8>) {
    // comment first line out to go fast!
    // weirdly, to me: if you replace the first line with 24x `bytes.push(0u8)` you get good performance.
    bytes.write_all(&[0u8; 24]).unwrap();
    bytes.write_all(&typed[..]).unwrap();
}

Edit: stable, beta, and nightly.

@eefriedman
Contributor

eefriedman commented May 9, 2016

It looks like this is running into serious aliasing problems in the optimizer. This becomes a lot more obvious if you change the inline(always) to inline(never) in your testcase. rustc could be a bit smarter about inserting aliasing hints into IR; #31681 in particular would be helpful here. LLVM could also be smarter about handling the IR for a testcase like this; alias analysis is doing a terrible job, and loop unrolling is making things worse rather than better.

I think with specialization and rust-lang/rfcs#1521, extend_from_slice could be fixed to explicitly call memcpy, which would make testcases like this much less sensitive to the optimizer.
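
A rough sketch of that idea, for illustration only (this is not the actual standard-library code, and the function name is made up): for Copy element types, the extend can reserve once and issue a single bulk copy, which lowers to a memcpy call regardless of how well alias analysis does.

use std::ptr;

// Hypothetical specialized extend for Copy types: one bulk copy instead of
// an element-by-element loop.
fn extend_from_slice_copy<T: Copy>(dst: &mut Vec<T>, src: &[T]) {
    dst.reserve(src.len());
    unsafe {
        let len = dst.len();
        // A single copy_nonoverlapping compiles down to memcpy.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(len), src.len());
        dst.set_len(len + src.len());
    }
}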

@frankmcsherry
Contributor Author

Ah, I thought write_all did just call memcpy. I chose it a while back over extend because it was compiling down to that. At least, when I first did the benchmarking with it, it worked quite well. I guess when #31545, which #31681 points at, says that

Benchmarks suggest that the performance loss by this change is very small.

they didn't test ... copying lots of memory around. >.<

Is there a memcpy-positive form of write, or ... do I allocate enough memory and do copy_nonoverlapping ... ? :D Or maybe just chill out and wait for LLVM to fix their noalias bug?

@bluss
Member

bluss commented May 9, 2016

It reminds me of #32155, but I have not confirmed the loop optimization failure is of the exact same kind here as there. If it is, that loop optimization regression is worrying in general, and it's "not enough" to work around it with specialization.

@eefriedman
Contributor

It looks like the same thing to me: probably affects all tight loops of ptr::write() followed by Vec::set_len() where the optimizer can't figure out the aliasing.
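
For reference, the pattern being described has roughly this shape (illustrative only, not the exact libstd source): set_len is called on every iteration, and without noalias information LLVM cannot prove that the store to the length field doesn't alias the bytes being written, so it can neither hoist the store nor recognize the loop as a memcpy.

// Illustrative only: per-element write with a length update inside the loop.
unsafe fn extend_one_by_one(dst: &mut Vec<u8>, src: &[u8]) {
    dst.reserve(src.len());
    for &byte in src {
        let len = dst.len();
        std::ptr::write(dst.as_mut_ptr().add(len), byte);
        dst.set_len(len + 1); // store to the Vec's len on every iteration
    }
}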

@frankmcsherry
Contributor Author

I've updated my code to use copy_nonoverlapping and recovered the performance. I'm happy to either close this as a dup, or keep it open if the specific example is helpful for eventually testing perf recovery when a #31681 fix lands.
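
One possible shape of such a workaround, sketched here under the assumption of a 24-byte header followed by the payload (the thread doesn't show the actual updated code): reserve once, copy both pieces with copy_nonoverlapping, and write the length back a single time.

use std::ptr;

// Hypothetical rewrite of the `test` function from the issue description.
fn test_manual_copy(typed: &[u8], bytes: &mut Vec<u8>) {
    let header = [0u8; 24];
    bytes.reserve(header.len() + typed.len());
    unsafe {
        let len = bytes.len();
        let dst = bytes.as_mut_ptr().add(len);
        // Both copies lower to memcpy; the length is updated once at the end.
        ptr::copy_nonoverlapping(header.as_ptr(), dst, header.len());
        ptr::copy_nonoverlapping(typed.as_ptr(), dst.add(header.len()), typed.len());
        bytes.set_len(len + header.len() + typed.len());
    }
}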

@bluss bluss changed the title Performance issue in write_all Performance issue in write_all (Vec::extend_from_slice) May 27, 2016
@arielb1 arielb1 added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 8, 2016
@arielb1
Contributor

arielb1 commented Sep 8, 2016

This can be fixed by smarter handling of len: https://play.rust-lang.org/?gist=69bc48e8afc3750ef2e73fec178ccdb5&version=stable&backtrace=0

@bluss
Member

bluss commented Sep 8, 2016

Yep, that's the gist I posted to show my WIP workaround for #32155

bors added a commit that referenced this issue Sep 12, 2016
…d, r=alexcrichton

Work around pointer aliasing issue in Vec::extend_from_slice, extend_with_element

Due to missing noalias annotations for &mut T in general (issue #31681),
in larger programs extend_from_slice and extend_with_element may both
compile very poorly. What is observed is that the .set_len() calls are
not lifted out of the loop, even for `Vec<u8>`.

Use a local length variable for the Vec length instead, and use a scope
guard to write this value back to self.len when the scope ends or on
panic. Then the alias analysis is easy.

This affects extend_from_slice, extend_with_element, the vec![x; n]
macro, Write impls for Vec<u8>, BufWriter, etc. (but it may or may not
have triggered in practice, since inlining can be enough for the compiler to get it right).

Fixes #32155
Fixes #33518
Closes #17844
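
A minimal sketch of the scope-guard pattern the commit message describes (the names SetLenOnDrop and extend_with_byte are illustrative here, not libstd's internals): the hot loop only touches a local counter and a raw pointer, and the Vec's length field is written back exactly once, on normal exit or on panic, which keeps the alias analysis easy.

use std::ptr;

struct SetLenOnDrop<'a> {
    vec: &'a mut Vec<u8>,
    local_len: usize,
}

impl<'a> Drop for SetLenOnDrop<'a> {
    fn drop(&mut self) {
        // Single write-back of the length, so the loop body never stores to it.
        unsafe { self.vec.set_len(self.local_len); }
    }
}

fn extend_with_byte(v: &mut Vec<u8>, value: u8, n: usize) {
    v.reserve(n);
    let start = v.len();
    let mut guard = SetLenOnDrop { local_len: start, vec: v };
    let base = guard.vec.as_mut_ptr();
    unsafe {
        for i in 0..n {
            ptr::write(base.add(start + i), value);
            guard.local_len += 1; // only the local counter changes in the loop
        }
    }
    // `guard` drops here and performs the one set_len.
}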