Performance issue in write_all (Vec::extend_from_slice) #33518
Comments
It looks like this is running into serious aliasing problems in the optimizer. This becomes a lot more obvious if you change the […]

I think with specialization and rust-lang/rfcs#1521, extend_from_slice could be fixed to explicitly call memcpy, which would make test cases like this much less sensitive to the optimizer.
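The idea in the comment above, appending with one explicit bulk copy rather than an element-by-element loop, can be sketched roughly as follows. This is only an illustration for `Vec<u8>`: `extend_by_memcpy` is a hypothetical helper name, not the standard library's implementation, and it relies on `ptr::copy_nonoverlapping`, which typically lowers to a `memcpy` call.

```rust
use std::ptr;

/// Hypothetical helper (an assumption for illustration, not std's code):
/// append `src` to `dst` using a single bulk copy.
fn extend_by_memcpy(dst: &mut Vec<u8>, src: &[u8]) {
    // Make sure there is room for `src.len()` additional bytes.
    dst.reserve(src.len());
    let old_len = dst.len();
    unsafe {
        // SAFETY: `reserve` guarantees capacity for `src.len()` more bytes
        // past `old_len`, and `src` cannot overlap `dst`'s buffer because
        // `dst` is borrowed mutably while `src` is borrowed immutably.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(old_len), src.len());
        // SAFETY: the bytes in `old_len..old_len + src.len()` are now initialized.
        dst.set_len(old_len + src.len());
    }
}
```

Because the length is updated once, after the copy, there is no per-element store to the length field for the optimizer to reason about.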
Ah, I thought […] they didn't test ... copying lots of memory around. >.< Is there a memcpy-positive form of […]
It reminds me of #32155, but I have not confirmed the loop optimization failure is of the exact same kind here as there. If it is, that loop optimization regression is worrying in general, and it's "not enough" to work around it with specialization.

It looks like the same thing to me: probably affects all tight loops of […]

I've updated my code to use […]
This can be fixed by smarter handling of […]

Yep, that's the gist I posted to show my WIP workaround for #32155
…d, r=alexcrichton

Work around pointer aliasing issue in Vec::extend_from_slice, extend_with_element

Due to missing noalias annotations for &mut T in general (issue #31681), in larger programs extend_from_slice and extend_with_element may both compile very poorly. What is observed is that the .set_len() calls are not lifted out of the loop, even for `Vec<u8>`.

Use a local length variable for the Vec length instead, and use a scope guard to write this value back to self.len when the scope ends or on panic. Then the alias analysis is easy.

This affects extend_from_slice, extend_with_element, the vec![x; n] macro, Write impls for Vec<u8>, BufWriter, etc (but may / may not have triggered since inlining can be enough for the compiler to get it right).

Fixes #32155
Fixes #33518
Closes #17844
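The workaround described in this commit message (a local length variable plus a scope guard that writes it back) can be illustrated outside the standard library with a rough sketch. The names `LenGuard` and `extend_with_clones` are hypothetical, and the guard here goes through the public `Vec::set_len` rather than touching a private length field, so it shows the shape of the technique, not the actual patch.

```rust
use std::ptr;

/// Hypothetical scope guard: tracks the length in a plain local field and
/// writes it back to the vector when dropped, including during unwinding.
struct LenGuard<'a, T> {
    vec: &'a mut Vec<T>,
    local_len: usize,
}

impl<'a, T> Drop for LenGuard<'a, T> {
    fn drop(&mut self) {
        // SAFETY: the code using the guard only increments `local_len`
        // after the corresponding slot has been initialized.
        unsafe { self.vec.set_len(self.local_len) }
    }
}

/// Hypothetical stand-in for an "extend with n clones of a value" routine.
fn extend_with_clones<T: Clone>(v: &mut Vec<T>, n: usize, value: T) {
    v.reserve(n);
    let start = v.len();
    let base = v.as_mut_ptr();
    let mut guard = LenGuard { vec: v, local_len: start };
    for _ in 0..n {
        // SAFETY: `reserve` made room for `n` more elements, and the slot at
        // `local_len` is uninitialized, so `write` does not drop anything.
        unsafe { ptr::write(base.add(guard.local_len), value.clone()) };
        // Only a local counter changes inside the loop; the vector's own
        // length is written once, when `guard` is dropped.
        guard.local_len += 1;
    }
}
```

If `value.clone()` panics midway, the guard's `Drop` still runs, so the vector's length covers exactly the elements that were written.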
Hi folks. Chatted a bit on IRC, seemed to think this wasn't obviously a dup, so reporting here.
I'm using `write_all` to push some bytes into a buffer. If I do this in-line, all goes well performance-wise (memcpy speeds; about 50GB/s on my machine). If I put it in a method, even with a `#[inline(always)]` attribute, it drops down to about 1GB/s (and assembly looks like a loop doing something).

The problem goes away if I don't push the leading 24 bytes on using `write_all`. Meaning, if I don't push them on, great! If I call `push(0u8);` 24 times, also great! Something about the existence of the preceding `write_all` seems to tank the perf of the second `write_all` (the big one). If I push 32 bytes (i.e. use a `&[0u8; 32]`) the problem goes away as well (quadword alignment?).

But there never seems to be a problem with the manually inlined code; it always goes nice and fast.
Edit: stable, beta, and nightly.
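For readers without the original test case, the shape of the code being described is roughly the following. This is a sketch reconstructed from the description only: `push_with_header` is a hypothetical name, the payload is whatever large slice the caller passes, and the snippet makes no claim to reproduce the slowdown on any particular compiler version.

```rust
use std::io::Write;

/// Hypothetical reconstruction of the reported pattern: a small 24-byte
/// `write_all` followed by a large one, wrapped in a method-like function.
#[inline(always)]
fn push_with_header(buf: &mut Vec<u8>, payload: &[u8]) {
    // The leading 24 bytes whose presence reportedly triggers the slowdown.
    buf.write_all(&[0u8; 24]).unwrap();
    // The big write that was observed to fall off memcpy speeds.
    buf.write_all(payload).unwrap();
}
```

Timing this function against the same two `write_all` calls written directly at the call site is the comparison the report describes.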