-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std: Avoid ptr::copy
if unnecessary in vec::Drain
#50575
Conversation
This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
r? @aidanhs (rust_highfive has picked a reviewer for you, use r? to override) |
@bors try @Mark-Simulacrum could we do a perf check? |
std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in #50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in #50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes #50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
Also beta-nominating this for 1.27. |
Perf has been queued. |
This seems like a reasonable thing to do even if it doesn't fix the perf regression. r=me |
☀️ Test successful - status-travis |
FWIW locally on my computer (glibc 2.23)
|
While the link I pasted above doesn't work yet this is the results comparing to the previous commit on master so either this PR or the rollup before it caused the speed boost, but I'm gonna optimistically say it was this PR :) @bors: r=sfackler |
📌 Commit 254b601 has been approved by |
@bors: rollup |
…fackler std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
…fackler std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
Rollup of 18 pull requests Successful merges: - #49423 (Extend tests for RFC1598 (GAT)) - #50010 (Give SliceIndex impls a test suite of girth befitting the implementation (and fix a UTF8 boundary check)) - #50447 (Fix update-references for tests within subdirectories.) - #50514 (Pull in a wasm fix from LLVM upstream) - #50524 (Make DepGraph::previous_work_products immutable) - #50532 (Don't use Lock for heavily accessed CrateMetadata::cnum_map.) - #50538 ( Make CrateNum allocation more thread-safe. ) - #50564 (Inline `Span` methods.) - #50565 (Use SmallVec for DepNodeIndex within dep_graph.) - #50569 (Allow for specifying a linker plugin for cross-language LTO) - #50572 (Clarify in the docs that `mul_add` is not always faster.) - #50574 (add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (#49022)) - #50575 (std: Avoid `ptr::copy` if unnecessary in `vec::Drain`) - #50588 (Move "See also" disambiguation links for primitive types to top) - #50590 (Fix tuple struct field spans) - #50591 (Restore RawVec::reserve* documentation) - #50598 (Remove unnecessary mutable borrow and resizing in DepGraph::serialize) - #50606 (Retry when downloading the Docker cache.) Failed merges: - #50161 (added missing implementation hint) - #50558 (Remove all reference to DepGraph::work_products)
[beta] Process backports * #50575: std: Avoid `ptr::copy` if unnecessary in `vec::Drain` r? @alexcrichton
This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the
expand_statements
function in the compiler was taking quite a long time. Further investigation
showed two key properties:
Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.
The hottest part of
expand_statements
wasDrop for Drain
in the call tosplice
where we insert new statements into the original vector. This shouldbe a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.
It turns out that at least one of the optimizations in glibc 2.24 was that
memmove
where the src/dst are equal becomes much faster. This programexecutes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing
memmove
if the src/dst are equal.And all that brings us to what this commit itself is doing. The change here is
purely to
Drop for Drain
to avoid the call toptr::copy
if the region beingcopied doesn't actually need to be copied. For normal usage of just
Drain
itself this check isn't really necessary, but because
Splice
internallycontains
Drain
this provides a nice speed boost on glibc 2.23. Overall thisshould fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where
memmove
looks to not have this optimization.Note that the way
splice
was called inexpand_statements
would cause aquadratic number of elements to be copied via
memmove
which is likely why thetuple-stress benchmark showed such a severe regression.
Closes #50496