Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightly performance issue with vec::extend_with_element #32155

Closed
keeperofdakeys opened this issue Mar 9, 2016 · 15 comments
Closed

Nightly performance issue with vec::extend_with_element #32155

keeperofdakeys opened this issue Mar 9, 2016 · 15 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@keeperofdakeys
Copy link
Contributor

When compiling this program with nightly and beta, the runtime increases by 1/3, and perf reports that "vec::Vec$LT$T$GT$::extend_with_element::..." is taking 20% of the runtime. This doesn't occur with the stable compiler.

Are there any tools, or perf options, that I can use to further debug this?

@keeperofdakeys
Copy link
Contributor Author

If I'm reading callgrind/kcachegrind output correctly, this is being called by io::buffered::BufReader::...::with_capacity, but the extend_with_element function is the thing that's actually taking all the time.

@alexcrichton
Copy link
Member

Thanks for the report! This may be related to some minor tweak in the standard library or maybe an LLVM update (not sure), but is there is something that we can run locally to help diagnose as well?

The main profilers I've used at least are Instruments.app and perf, but they basically just tell you a hot function which needs to be looked at. If there's only one hot function then you can look at the LLVM IR and try to see what the differences are as well.

@bluss
Copy link
Member

bluss commented Mar 9, 2016

I can reproduce with for example the code vec![0u8; n] where n is not compile time constant.

version rustc 1.9.0-nightly (998a672 2016-03-07)

@bluss
Copy link
Member

bluss commented Mar 9, 2016

I'll have to check if #31999 can have caused this

@alexcrichton
Copy link
Member

@bluss are you sure? The IR for this function looks exactly the same on stable/beta/nightly for me:

pub fn foo(n: usize) -> Vec<u8> {
    vec![0; n]
}

@bluss
Copy link
Member

bluss commented Mar 9, 2016

I'm trying to reduce a testcase, I've only gotten it down to this: https://gist.github.com/bluss/c31b308feb347067ab19

The case that is slow in nightly is fillvec_vec_macro_u8. It's slow in -C opt--level=3 but not in -C opt-level=2

@bluss
Copy link
Member

bluss commented Mar 9, 2016

the .as_ptr() change is unrelated (I tried with it reverted).

@keeperofdakeys
Copy link
Contributor Author

@bluss From what I could see, this was happening on Beta as well. So it might be easier to find the problem commit on that branch.

@steveklabnik steveklabnik added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Mar 11, 2016
@keeperofdakeys
Copy link
Contributor Author

I've done some further investigation, and found the compiler version where this bug was first introduced. On rust-nightly 2016-02-11 the regression did not occur, on rust-nightly 2016-02-12 (and all following tested versions) the regression did occur.

Interestingly the testcase made by @bluss was almost identical for both of these compiler versions. Though strangely, there has been a ten-times speedup for resize, vec_macro_u32 and vec_macro_u8 between 2016-02-11/12 nightlies and a recent nightly.

For testing, I'm currently using perf report to identify a high count of vec::Vec$LT$T$GT$::extend_with_element when running the "psq" command from this repo. There is also a high count of "isolate_freepages_block", "je_arena_malloc_large", and "arena_chunk_alloc" on the problem compilers, so I'll try further investigating this in that direction.

@bluss
Copy link
Member

bluss commented Mar 12, 2016

Very valuable results! We can look widely at the relevant commits: git log --oneline --graph 3f4227af139f1da30710a9f07dc28e5a3ccc6fe5..ce4b75f25662cb9facafc4bef368410a2979b936

This commit range includes @dotdash's noalias removal in #31545

(We can scrap my testcase if it's not relevant.)

@bluss
Copy link
Member

bluss commented Mar 12, 2016

jemalloc was updated to 4.1.0 a week ago, we also now use it without the je_ prefix (affects some llvm malloc special casing).

@keeperofdakeys
Copy link
Contributor Author

Interestingly this doesn't seem to occur when I use opt-level=2 with a recent nightly compiler.

@keeperofdakeys
Copy link
Contributor Author

Alright, so after waiting though many hours of compilation, I've worked out it's definitely the noalias change introduced in a17fb64. (Or at least, the regression occurs on this commit, but does not occur for the previous commit).

I'll also mention again that this does impact the beta channel, so if there is some kind of fix for this, it would need to be back ported. If you want me to do any further diagnostics, I'll be happy to run them. But for now I don't think there is much else I can do with this.

@bluss
Copy link
Member

bluss commented Mar 13, 2016

What I see is that in the program psq, in the original report, with current nightly, extend_with_element compiles into a loop where it repeatedly sets the length of the vector in the loop. This is intended to be something that the optimizer lifts out of the loop.

       │ e0:   mov    BYTE PTR [rdx+rcx*1],0x0
 28,57lea    rsi,[rdi+rcx*1]
mov    QWORD PTR [rbx+0x10],rsi
mov    BYTE PTR [rdx+rcx*1+0x1],0x0
lea    rsi,[rdi+rcx*1+0x1]
mov    QWORD PTR [rbx+0x10],rsi
mov    BYTE PTR [rdx+rcx*1+0x2],0x0
 14,29lea    rsi,[rdi+rcx*1+0x2]
mov    QWORD PTR [rbx+0x10],rsi
mov    BYTE PTR [rdx+rcx*1+0x3],0x0
 28,57lea    rsi,[rdi+rcx*1+0x3]
mov    QWORD PTR [rbx+0x10],rsi
 14,29add    rcx,0x4
cmp    r8,rcx
       │     ↑ jne    e0      

tikue added a commit to tikue/tarpc that referenced this issue May 28, 2016
There was actually just a problem with the benchmark. gen_vec
was not being optimized, but only for bench_tarpc.
Potentially related to rust-lang/rust#32155?
@bluss
Copy link
Member

bluss commented Sep 7, 2016

Triage: Still exists in rustc 1.13.0-nightly (923bac4 2016-09-06)

bors added a commit that referenced this issue Sep 12, 2016
…d, r=alexcrichton

Work around pointer aliasing issue in Vec::extend_from_slice, extend_with_element

Due to missing noalias annotations for &mut T in general (issue #31681),
in larger programs extend_from_slice and extend_with_element may both
compile very poorly. What is observed is that the .set_len() calls are
not lifted out of the loop, even for `Vec<u8>`.

Use a local length variable for the Vec length instead, and use a scope
guard to write this value back to self.len when the scope ends or on
panic. Then the alias analysis is easy.

This affects extend_from_slice, extend_with_element, the vec![x; n]
macro, Write impls for Vec<u8>, BufWriter, etc (but may / may not
have triggered since inlining can be enough for the compiler to get it right).

Fixes #32155
Fixes #33518
Closes #17844
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

4 participants