-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nightly performance issue with vec::extend_with_element #32155
Comments
If I'm reading callgrind/kcachegrind output correctly, this is being called by io::buffered::BufReader::...::with_capacity, but the extend_with_element function is the thing that's actually taking all the time. |
Thanks for the report! This may be related to some minor tweak in the standard library or maybe an LLVM update (not sure), but is there is something that we can run locally to help diagnose as well? The main profilers I've used at least are |
I can reproduce with for example the code version rustc 1.9.0-nightly (998a672 2016-03-07) |
I'll have to check if #31999 can have caused this |
@bluss are you sure? The IR for this function looks exactly the same on stable/beta/nightly for me: pub fn foo(n: usize) -> Vec<u8> {
vec![0; n]
} |
I'm trying to reduce a testcase, I've only gotten it down to this: https://gist.github.com/bluss/c31b308feb347067ab19 The case that is slow in nightly is fillvec_vec_macro_u8. It's slow in |
the .as_ptr() change is unrelated (I tried with it reverted). |
@bluss From what I could see, this was happening on Beta as well. So it might be easier to find the problem commit on that branch. |
I've done some further investigation, and found the compiler version where this bug was first introduced. On rust-nightly 2016-02-11 the regression did not occur, on rust-nightly 2016-02-12 (and all following tested versions) the regression did occur. Interestingly the testcase made by @bluss was almost identical for both of these compiler versions. Though strangely, there has been a ten-times speedup for resize, vec_macro_u32 and vec_macro_u8 between 2016-02-11/12 nightlies and a recent nightly. For testing, I'm currently using perf report to identify a high count of vec::Vec$LT$T$GT$::extend_with_element when running the "psq" command from this repo. There is also a high count of "isolate_freepages_block", "je_arena_malloc_large", and "arena_chunk_alloc" on the problem compilers, so I'll try further investigating this in that direction. |
jemalloc was updated to 4.1.0 a week ago, we also now use it without the je_ prefix (affects some llvm malloc special casing). |
Interestingly this doesn't seem to occur when I use opt-level=2 with a recent nightly compiler. |
Alright, so after waiting though many hours of compilation, I've worked out it's definitely the noalias change introduced in a17fb64. (Or at least, the regression occurs on this commit, but does not occur for the previous commit). I'll also mention again that this does impact the beta channel, so if there is some kind of fix for this, it would need to be back ported. If you want me to do any further diagnostics, I'll be happy to run them. But for now I don't think there is much else I can do with this. |
What I see is that in the program psq, in the original report, with current nightly, extend_with_element compiles into a loop where it repeatedly sets the length of the vector in the loop. This is intended to be something that the optimizer lifts out of the loop. │ e0: mov BYTE PTR [rdx+rcx*1],0x0 ▒
28,57 │ lea rsi,[rdi+rcx*1] ▒
│ mov QWORD PTR [rbx+0x10],rsi ▒
│ mov BYTE PTR [rdx+rcx*1+0x1],0x0 ▒
│ lea rsi,[rdi+rcx*1+0x1] ▒
│ mov QWORD PTR [rbx+0x10],rsi ▒
│ mov BYTE PTR [rdx+rcx*1+0x2],0x0 ▒
14,29 │ lea rsi,[rdi+rcx*1+0x2] ▒
│ mov QWORD PTR [rbx+0x10],rsi ▒
│ mov BYTE PTR [rdx+rcx*1+0x3],0x0 ▒
28,57 │ lea rsi,[rdi+rcx*1+0x3] ▒
│ mov QWORD PTR [rbx+0x10],rsi ▒
14,29 │ add rcx,0x4 ▒
│ cmp r8,rcx ▒
│ ↑ jne e0 |
There was actually just a problem with the benchmark. gen_vec was not being optimized, but only for bench_tarpc. Potentially related to rust-lang/rust#32155?
Triage: Still exists in rustc 1.13.0-nightly (923bac4 2016-09-06) |
…d, r=alexcrichton Work around pointer aliasing issue in Vec::extend_from_slice, extend_with_element Due to missing noalias annotations for &mut T in general (issue #31681), in larger programs extend_from_slice and extend_with_element may both compile very poorly. What is observed is that the .set_len() calls are not lifted out of the loop, even for `Vec<u8>`. Use a local length variable for the Vec length instead, and use a scope guard to write this value back to self.len when the scope ends or on panic. Then the alias analysis is easy. This affects extend_from_slice, extend_with_element, the vec![x; n] macro, Write impls for Vec<u8>, BufWriter, etc (but may / may not have triggered since inlining can be enough for the compiler to get it right). Fixes #32155 Fixes #33518 Closes #17844
When compiling this program with nightly and beta, the runtime increases by 1/3, and perf reports that "vec::Vec$LT$T$GT$::extend_with_element::..." is taking 20% of the runtime. This doesn't occur with the stable compiler.
Are there any tools, or perf options, that I can use to further debug this?
The text was updated successfully, but these errors were encountered: