vec::from_elem with primitives should be as fast as calloc #7136
It turns out `vec::from_elem` is substantially slower than I expected. With this test benchmark:
I'm getting these results:
@huonw mentioned on IRC that with
Which is much better, but still not great.
`memset` "cheats" with SSE. Without optimizations, `from_elem` spends a lot of time in `move_val_init`, so there are a bunch of unnecessary function calls. With optimizations, what's killing it is the copies.
LLVM knows how to optimize loops into the same code as `memcpy`/`memmove`/`memset`, as long as you generate good IR: http://llvm.org/docs/doxygen/html/LoopIdiomRecognize_8cpp_source.html
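A minimal sketch in modern Rust (an illustration, not code from this thread): `fill_bytes` is a hypothetical helper that stores one byte value into every element with a simple loop. A store-only loop like this is the shape LLVM's loop idiom recognizer looks for, so an optimized build may compile it down to a single `memset` call. That is not guaranteed, so inspect the emitted IR or assembly for your compiler version before relying on it.

```rust
/// Hypothetical helper: writes `value` into every byte of `buf`.
/// With optimizations on, LLVM may rewrite this loop into a `memset`
/// call; without optimizations it is typically left as a plain loop.
fn fill_bytes(buf: &mut [u8], value: u8) {
    for b in buf.iter_mut() {
        *b = value;
    }
}

fn main() {
    let mut buf = vec![1u8; 1024];
    fill_bytes(&mut buf, 0);
    assert!(buf.iter().all(|&b| b == 0));
    println!("filled {} bytes", buf.len());
}
```

In current Rust, `slice::fill` (for example `buf.fill(0)`) expresses the same intent directly and leaves the lowering to the standard library and LLVM.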
I added a bench for the literal vec repeat syntax as well:

~~~rust
#[bench]
fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}
~~~

Without optimizations:
With optimizations (-O):
Looking at the optimized IR, both `set_memory` and `vec_repeat` become a `memset`, but `set_memory` seems to have a lot more overhead.
@erickt: this gets a lot better with the optimization passes from #7466, going from roughly 3x as slow to roughly 2x as slow.

Before (opt-level=2):

After (opt-level=2):
@luqmana: the problem is with the code bloat from the surrounding code, rather than
#7682 greatly speeds up

Before:
After:
The performance of all 3 has improved, but
It looks like the remaining issue is our pointer arithmetic being slow. The
Closes #8118, #7136

~~~rust
extern mod extra;

use std::vec;
use std::ptr;

fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}
~~~

Before:

~~~
test bench_from_elem ... bench: 415 ns/iter (+/- 17)
test bench_set_memory ... bench: 85 ns/iter (+/- 4)
test bench_vec_repeat ... bench: 83 ns/iter (+/- 3)
~~~

After:

~~~
test bench_from_elem ... bench: 84 ns/iter (+/- 2)
test bench_set_memory ... bench: 84 ns/iter (+/- 5)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)
~~~
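For readers on current Rust, the three benchmark bodies above map roughly onto the functions below. This is a hedged translation, not part of the original commit: `~[u8]` became `Vec<u8>`, `vec::from_elem(n, x)` is now spelled `vec![x; n]`, `ptr::set_memory` corresponds to `ptr::write_bytes`, `vec::raw::set_len` corresponds to `Vec::set_len`, and the `~[0u8, ..1024]` literal is approximated by an array repeat expression copied to the heap. The function names are hypothetical, and the timing harness is omitted because `#[bench]` now requires nightly Rust.

```rust
use std::ptr;

const N: usize = 1024;

/// Counterpart of `bench_from_elem`: the safe repeat constructor.
fn from_elem_like() -> Vec<u8> {
    vec![0u8; N]
}

/// Counterpart of `bench_set_memory`: allocate, memset, then set the length.
fn set_memory_like() -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(N);
    unsafe {
        // Fill the first N bytes of the uninitialized buffer with 0.
        ptr::write_bytes(v.as_mut_ptr(), 0, N);
        // SAFETY: N <= capacity and all N elements are now initialized.
        v.set_len(N);
    }
    v
}

/// Rough counterpart of `bench_vec_repeat` (`~[0u8, ..1024]`):
/// build a fixed-size array, then copy it into a heap vector.
fn vec_repeat_like() -> Vec<u8> {
    [0u8; N].to_vec()
}

fn main() {
    let a = from_elem_like();
    let b = set_memory_like();
    let c = vec_repeat_like();
    assert_eq!(a, b);
    assert_eq!(b, c);
    println!("all three produce {} zero bytes", a.len());
}
```

The unsafe variant only earns its place if it is measurably faster; the point of this issue was to make the safe constructor fast enough that nobody reaches for it.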
Fixed by #8121.
This has regressed since that fix; @cmr is attempting a bisect at the moment. Update: the bisect finished, and #8780 is to blame. This loop isn't optimized. Maybe the `Finallyalizer` `drop` method needs to be inlined to allow further optimizations.
While @cmr landed a nice optimization of `vec::from_elem` in #6876, he said that it is still not as fast as doing a `malloc` and a `ptr::set_memory`. We should figure out why it is not performing as well as it should, and fix it in order to remove the temptation to use the faster unsafe functions.
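The issue title compares `from_elem` to `calloc`. As a hedged sketch in modern Rust, the calloc-style approach asks the global allocator for memory that is already zeroed, rather than allocating and then writing zeros. `zeroed_bytes` is a hypothetical helper; `std::alloc::alloc_zeroed`, `Layout::array`, and `Vec::from_raw_parts` are real std APIs. As far as I know, current std already specializes `vec![0u8; n]` to request zeroed memory in this way, so the safe form should be at least as fast; check the std source for your toolchain rather than taking that on faith.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical helper: builds a zeroed Vec<u8> the way calloc would,
/// by requesting pre-zeroed memory from the global allocator.
fn zeroed_bytes(n: usize) -> Vec<u8> {
    if n == 0 {
        // Zero-sized allocations are not allowed; an empty Vec needs none.
        return Vec::new();
    }
    let layout = Layout::array::<u8>(n).expect("allocation size overflow");
    unsafe {
        let ptr = alloc_zeroed(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: `ptr` came from the global allocator with this layout,
        // all n bytes are initialized (zeroed), and length == capacity == n.
        Vec::from_raw_parts(ptr, n, n)
    }
}

fn main() {
    let a = zeroed_bytes(1024);
    let b = vec![0u8; 1024];
    assert_eq!(a, b);
    println!("{} zeroed bytes", a.len());
}
```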