std: Avoid `ptr::copy` if unnecessary in `vec::Drain` #50575

alexcrichton · 2018-05-09T16:11:20Z

This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was `memmove` from glibc

Combined together it looked like glibc gained an optimization to the `memmove`
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood `memmove` was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster.
[This program](https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3)
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.
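
For readers who want to poke at this locally, here is a minimal sketch of that kind of measurement. It is not the linked program: the function name `time_self_copy`, the buffer size and the iteration count are all made up for illustration. It repeatedly calls `ptr::copy` (which lowers to `memmove`) with identical source and destination pointers, and uses `black_box` on the assumption that this keeps the optimizer from removing the call.

```rust
use std::hint::black_box;
use std::ptr;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `ptr::copy` with src == dst `iters` times
/// and returns the elapsed wall-clock time. The buffer is left unchanged.
fn time_self_copy(buf: &mut [u8], iters: usize) -> Duration {
    let len = buf.len();
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` is used so the compiler cannot see that src == dst.
        let p = black_box(buf.as_mut_ptr());
        // SAFETY: `p` is valid for reads and writes of `len` bytes, and
        // `ptr::copy` explicitly allows overlapping regions.
        unsafe { ptr::copy(p, p, len) };
    }
    start.elapsed()
}
```

Something like `time_self_copy(&mut vec![0u8; 1 << 20], 10_000)` could then be compared across libc versions; the absolute numbers will depend on the machine.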

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.
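
The idea of the check can be sketched on a plain slice. This is a simplified model and not the code in `std`: the helper `move_tail_back` and its parameters are hypothetical, and it is restricted to `Copy` types to keep the example safe to reason about. It moves a tail region back to close a gap, and skips `ptr::copy` entirely when the tail is already where it needs to be.

```rust
use std::ptr;

/// Hypothetical helper modelling the tail fix-up: moves `tail_len` elements
/// from index `tail_start` down to index `start`. Returns `true` if a copy
/// was actually performed, `false` if the tail was already in place.
fn move_tail_back<T: Copy>(
    buf: &mut [T],
    start: usize,
    tail_start: usize,
    tail_len: usize,
) -> bool {
    assert!(start <= tail_start && tail_start + tail_len <= buf.len());
    if tail_start == start {
        // Source and destination coincide, so there is nothing to move.
        return false;
    }
    let base = buf.as_mut_ptr();
    // SAFETY: both ranges lie within `buf` (checked above), and `ptr::copy`
    // has memmove semantics, so the ranges may overlap.
    unsafe { ptr::copy(base.add(tail_start), base.add(start), tail_len) };
    true
}
```

When a drained range is refilled with exactly as many elements as were removed, `tail_start == start` holds, which is the case this sketch short-circuits.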

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.
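
To illustrate the access pattern (this is not the compiler's `expand_statements` code, and `replace_each` is a made-up name), the sketch below replaces each element of a vector through `Vec::splice`, draining one element and inserting exactly one. Without the check, every such call would hand the whole remaining tail to `memmove` with identical source and destination, so the number of elements passed over grows roughly with the square of the vector length.

```rust
use std::iter;

/// Hypothetical example: rewrites every element in place by splicing a
/// one-element range with a one-element replacement.
fn replace_each(v: &mut Vec<u32>) {
    for i in 0..v.len() {
        let new = v[i] * 2;
        // Same-length splice: the tail after `i` does not need to move.
        // Dropping the returned iterator is what performs the replacement.
        drop(v.splice(i..i + 1, iter::once(new)));
    }
}
```

The result is the same with or without the check; only the amount of redundant copying differs.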

Closes #50496


rust-highfive · 2018-05-09T16:11:23Z

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

kennytm · 2018-05-09T16:17:58Z

@bors try

@Mark-Simulacrum could we do a perf check?

bors · 2018-05-09T16:18:10Z

⌛ Trying commit 254b601 with merge 1aac55d...


kennytm · 2018-05-09T16:22:57Z

Also beta-nominating this for 1.27.

Mark-Simulacrum · 2018-05-09T17:56:41Z

Perf has been queued.

sfackler · 2018-05-09T18:15:16Z

This seems like a reasonable thing to do even if it doesn't fix the perf regression. r=me

bors · 2018-05-09T18:29:47Z

☀️ Test successful - status-travis
State: approved= try=True

alexcrichton · 2018-05-09T18:39:24Z

FWIW locally on my computer (glibc 2.23)

$ time rustc +stable main.rs
rustc +stable main.rs  4.94s user 0.18s system 100% cpu 5.080 total
$ time rustc +beta main.rs
rustc +beta main.rs  5.00s user 0.09s system 98% cpu 5.185 total
$ time rustc +nightly main.rs
^C
rustc +nightly main.rs  206.66s user 0.14s system 99% cpu 3:27.45 total
$ time rustc +1aac55d18084910bbfa1d25733a5393860616b8b main.rs
rustc +1aac55d18084910bbfa1d25733a5393860616b8b main.rs  4.06s user 0.05s system 100% cpu 4.092 total

alexcrichton · 2018-05-09T18:43:12Z

future perf results link

alexcrichton · 2018-05-09T23:55:58Z

While the link I pasted above doesn't work yet, these are the results comparing against the previous commit on master, so either this PR or the rollup before it caused the speed boost, but I'm gonna optimistically say it was this PR :)

@bors: r=sfackler

bors · 2018-05-09T23:55:59Z

📌 Commit 254b601 has been approved by sfackler

alexcrichton · 2018-05-10T14:15:33Z

@bors: rollup


Rollup of 18 pull requests

Successful merges:

- #49423 (Extend tests for RFC1598 (GAT))
- #50010 (Give SliceIndex impls a test suite of girth befitting the implementation (and fix a UTF8 boundary check))
- #50447 (Fix update-references for tests within subdirectories.)
- #50514 (Pull in a wasm fix from LLVM upstream)
- #50524 (Make DepGraph::previous_work_products immutable)
- #50532 (Don't use Lock for heavily accessed CrateMetadata::cnum_map.)
- #50538 (Make CrateNum allocation more thread-safe.)
- #50564 (Inline `Span` methods.)
- #50565 (Use SmallVec for DepNodeIndex within dep_graph.)
- #50569 (Allow for specifying a linker plugin for cross-language LTO)
- #50572 (Clarify in the docs that `mul_add` is not always faster.)
- #50574 (add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (#49022))
- #50575 (std: Avoid `ptr::copy` if unnecessary in `vec::Drain`)
- #50588 (Move "See also" disambiguation links for primitive types to top)
- #50590 (Fix tuple struct field spans)
- #50591 (Restore RawVec::reserve* documentation)
- #50598 (Remove unnecessary mutable borrow and resizing in DepGraph::serialize)
- #50606 (Retry when downloading the Docker cache.)

Failed merges:

- #50161 (added missing implementation hint)
- #50558 (Remove all reference to DepGraph::work_products)


rust-highfive assigned aidanhs May 9, 2018

rust-highfive added the S-waiting-on-review label May 9, 2018

alexcrichton mentioned this pull request May 9, 2018

Severe regression in html5ever build time #50496

Closed

kennytm added the S-waiting-on-perf and beta-nominated labels May 9, 2018

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label May 9, 2018

kennytm removed the S-waiting-on-perf label May 10, 2018

alexcrichton added the beta-accepted label May 10, 2018

alexcrichton mentioned this pull request May 10, 2018

Rollup of 19 pull requests #50608

Closed

alexcrichton mentioned this pull request May 10, 2018

Rollup of 18 pull requests #50611

Merged

pietroalbini mentioned this pull request May 10, 2018

[beta] Process backports #50631

Merged

bors merged commit 254b601 into rust-lang:master May 11, 2018

bors added a commit that referenced this pull request May 11, 2018

Auto merge of #50631 - pietroalbini:beta-backports, r=alexcrichton

1e057a2

[beta] Process backports * #50575: std: Avoid `ptr::copy` if unnecessary in `vec::Drain` r? @alexcrichton

pietroalbini removed the beta-nominated label May 11, 2018

alexcrichton deleted the faster-drain-drop branch June 29, 2018 19:45
