
Performance issue in write_all (Vec::extend_from_slice) #33518

Closed
frankmcsherry opened this issue May 9, 2016 · 7 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@frankmcsherry
Contributor

frankmcsherry commented May 9, 2016

Hi folks. I chatted a bit on IRC, and people there seemed to think this wasn't obviously a dup, so I'm reporting it here.

I'm using write_all to push some bytes into a buffer. If I do this inline, all goes well performance-wise (memcpy speeds; about 50GB/s on my machine). If I put it in a method, even with an #[inline(always)] attribute, it drops down to about 1GB/s (and the assembly looks like a loop doing something).

The problem goes away if I don't push the leading 24 bytes using write_all. Meaning, if I skip them entirely, great! If I call push(0u8) 24 times instead, also great! Something about the preceding write_all seems to tank the perf of the second write_all (the big one). If I push 32 bytes (i.e. use a &[0u8; 32]) the problem goes away as well (quadword alignment?).

But there never seems to be a problem with the manually inlined code; it always goes nice and fast.

extern crate time;

use std::io::Write;

fn main() {

    let dataz = vec![0u8; 1 << 20];
    let mut bytes = Vec::new();

    let rounds = 1_000;
    let start = time::precise_time_ns();

    for _ in 0..rounds {
        bytes.clear();

        // these two: "average time: 81135"
        // bytes.write_all(&[0u8; 24]).unwrap();
        // bytes.write_all(&dataz[..]).unwrap();

        // this one: "average time: 530736"
        test(&dataz, &mut bytes)
    }

    println!("average time: {:?}", (time::precise_time_ns() - start) / rounds);

}

#[inline(always)]
fn test(typed: &Vec<u8>, bytes: &mut Vec<u8>) {
    // comment first line out to go fast!
    // weirdly, to me: if you replace the first line with 24x `bytes.push(0u8)` you get good performance.
    bytes.write_all(&[0u8; 24]).unwrap();
    bytes.write_all(&typed[..]).unwrap();
}

Edit: stable, beta, and nightly.

@eefriedman
Contributor

eefriedman commented May 9, 2016

It looks like this is running into serious aliasing problems in the optimizer. This becomes a lot more obvious if you change the inline(always) to inline(never) in your testcase. rustc could be a bit smarter about inserting aliasing hints into IR; #31681 in particular would be helpful here. LLVM could also be smarter about handling the IR for a testcase like this; alias analysis is doing a terrible job, and loop unrolling is making things worse rather than better.

I think with specialization and rust-lang/rfcs#1521, extend_from_slice could be fixed to explicitly call memcpy, which would make testcases like this much less sensitive to the optimizer.
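
A rough sketch of that idea, for illustration only (this is not the actual standard-library code, and the function name is made up): for Copy element types, the extend can reserve once and issue a single bulk copy, which lowers to a memcpy call regardless of how well alias analysis does.

use std::ptr;

// Hypothetical specialized extend for Copy types: one bulk copy instead of
// an element-by-element loop.
fn extend_from_slice_copy<T: Copy>(dst: &mut Vec<T>, src: &[T]) {
    dst.reserve(src.len());
    unsafe {
        let len = dst.len();
        // A single copy_nonoverlapping compiles down to memcpy.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(len), src.len());
        dst.set_len(len + src.len());
    }
}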

@frankmcsherry
Contributor Author

Ah, I thought write_all did just call memcpy. I chose it a while back over extend because it was compiling down to that. At least, when I first did the benchmarking with it, it worked quite well. I guess when #31545, which #31681 points at, says that

Benchmarks suggest that the performance loss by this change is very small.

they didn't test ... copying lots of memory around. >.<

Is there a memcpy-positive form of write, or ... do I allocate enough memory and do copy_nonoverlapping ... ? :D Or maybe just chill out and wait for LLVM to fix their noalias bug?

@bluss
Member

bluss commented May 9, 2016

It reminds me of #32155, but I have not confirmed the loop optimization failure is of the exact same kind here as there. If it is, that loop optimization regression is worrying in general, and it's "not enough" to work around it with specialization.

@eefriedman
Contributor

It looks like the same thing to me: probably affects all tight loops of ptr::write() followed by Vec::set_len() where the optimizer can't figure out the aliasing.
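
For reference, the pattern being described has roughly this shape (illustrative only, not the exact libstd source): set_len is called on every iteration, and without noalias information LLVM cannot prove that the store to the length field doesn't alias the bytes being written, so it can neither hoist the store nor recognize the loop as a memcpy.

// Illustrative only: per-element write with a length update inside the loop.
unsafe fn extend_one_by_one(dst: &mut Vec<u8>, src: &[u8]) {
    dst.reserve(src.len());
    for &byte in src {
        let len = dst.len();
        std::ptr::write(dst.as_mut_ptr().add(len), byte);
        dst.set_len(len + 1); // store to the Vec's len on every iteration
    }
}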

@frankmcsherry
Contributor Author

I've updated my code to use copy_nonoverlapping and recovered the performance. I'm happy to either close this as a dup, or keep it open if the specific example is helpful for eventually testing perf recovery when a #31681 fix lands.
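
One possible shape of such a workaround, sketched here under the assumption of a 24-byte header followed by the payload (the thread doesn't show the actual updated code): reserve once, copy both pieces with copy_nonoverlapping, and write the length back a single time.

use std::ptr;

// Hypothetical rewrite of the `test` function from the issue description.
fn test_manual_copy(typed: &[u8], bytes: &mut Vec<u8>) {
    let header = [0u8; 24];
    bytes.reserve(header.len() + typed.len());
    unsafe {
        let len = bytes.len();
        let dst = bytes.as_mut_ptr().add(len);
        // Both copies lower to memcpy; the length is updated once at the end.
        ptr::copy_nonoverlapping(header.as_ptr(), dst, header.len());
        ptr::copy_nonoverlapping(typed.as_ptr(), dst.add(header.len()), typed.len());
        bytes.set_len(len + header.len() + typed.len());
    }
}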

@bluss bluss changed the title Performance issue in write_all Performance issue in write_all (Vec::extend_from_slice) May 27, 2016
@arielb1 arielb1 added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 8, 2016
@arielb1
Contributor

arielb1 commented Sep 8, 2016

This can be fixed by smarter handling of len: https://play.rust-lang.org/?gist=69bc48e8afc3750ef2e73fec178ccdb5&version=stable&backtrace=0

@bluss
Member

bluss commented Sep 8, 2016

Yep, that's the gist I posted to show my WIP workaround for #32155

bors added a commit that referenced this issue Sep 12, 2016
…d, r=alexcrichton

Work around pointer aliasing issue in Vec::extend_from_slice, extend_with_element

Due to missing noalias annotations for &mut T in general (issue #31681),
in larger programs extend_from_slice and extend_with_element may both
compile very poorly. What is observed is that the .set_len() calls are
not lifted out of the loop, even for `Vec<u8>`.

Use a local length variable for the Vec length instead, and use a scope
guard to write this value back to self.len when the scope ends or on
panic. Then the alias analysis is easy.

This affects extend_from_slice, extend_with_element, the vec![x; n]
macro, Write impls for Vec<u8>, BufWriter, etc. (but it may or may not
have triggered in practice, since inlining can be enough for the compiler to get it right).

Fixes #32155
Fixes #33518
Closes #17844
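
A minimal sketch of the scope-guard pattern the commit message describes (the names SetLenOnDrop and extend_with_byte are illustrative here, not libstd's internals): the hot loop only touches a local counter and a raw pointer, and the Vec's length field is written back exactly once, on normal exit or on panic, which keeps the alias analysis easy.

use std::ptr;

struct SetLenOnDrop<'a> {
    vec: &'a mut Vec<u8>,
    local_len: usize,
}

impl<'a> Drop for SetLenOnDrop<'a> {
    fn drop(&mut self) {
        // Single write-back of the length, so the loop body never stores to it.
        unsafe { self.vec.set_len(self.local_len); }
    }
}

fn extend_with_byte(v: &mut Vec<u8>, value: u8, n: usize) {
    v.reserve(n);
    let start = v.len();
    let mut guard = SetLenOnDrop { local_len: start, vec: v };
    let base = guard.vec.as_mut_ptr();
    unsafe {
        for i in 0..n {
            ptr::write(base.add(start + i), value);
            guard.local_len += 1; // only the local counter changes in the loop
        }
    }
    // `guard` drops here and performs the one set_len.
}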