-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Severe regression in html5ever build time #50496
Comments
The "tuple stress" benchmark is affected even more while most of the other things have improved. Maybe that gives a hint? |
My first guess would be quadratic or exponential blowup. Maybe someone can do a |
Yes, good idea! |
That may also be the source of the html5ever regression. |
Thanks, @alexcrichton! cc also @oli-obk & @eddyb |
@alexcrichton Could you post instructions for replicating the slow down? Neither @nnethercote or I could reproduce it -- despite running perf on the appropriate commit range locally. |
So should we revert that PR's commits? |
@Mark-Simulacrum sure yeah, using rustup-toolchain-install-master I did the equivalent of:
|
We should probably only run the disaggregation pass if the opt level is > 1. That's what's done for the copy prop pass already |
@fitzgen assuming @Mark-Simulacrum and @nnethercote can reproduce I think yeah we'll probably want to start off by reverting and possibly doing what @oli-obk suggests in a follow-up |
I cannot reproduce, which is odd, since I can't tell what the difference could be that would make this much of a difference...
|
No repro on macOS, although the c116 version is consistently slower than a272.
|
Ok I swear I'm not going crazy! I think though given this data it may only be reproducible on Ubuntu 16.04? I can reproduce with this FROM ubuntu:16.04
RUN apt-get update -y && apt-get install -y curl gcc libssl-dev file make pkg-config time
RUN curl https://sh.rustup.rs | sh -s -- -y
ENV PATH=$PATH:/root/.cargo/bin
RUN cargo install rustup-toolchain-install-master
RUN rustup-toolchain-install-master \
c1168be5360f17516b233be85ebb193bb4e612bf \
a2726846f6d6280b752019472b6bd791c0752006
RUN curl -Lo main.rs \
https://github.com/rust-lang-nursery/rustc-perf/raw/254d28f6181cd5d20cedcd3fa9ae36df847da958/collector/benchmarks/tuple-stress/src/main.rs
RUN time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs
RUN time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs Can y'all reproduce the slowdown for html5ever? |
Sure enough starting the docker image from 18.04 means I cannot reproduce the regression. Maybe some glibc bug was fixed which I was accidentally running into? The html5ever regression though is still worrisome and seems like it may not be connected to a glibc bug |
👍 I can repro the tuple-stress slow down on 16.04.4 LTS.
GDB backtrace sample
Code surrounding the call of that memmove
|
Presuming the same two commits would be responsible for the html5ever regression, I also cannot reproduce it on 18.04 Ubuntu. |
Using this dockerfile: FROM ubuntu:16.04
RUN apt-get update -y && apt-get install -y curl gcc libssl-dev file make pkg-config time git
RUN curl https://sh.rustup.rs | sh -s -- -y
ENV PATH=$PATH:/root/.cargo/bin
RUN cargo install rustup-toolchain-install-master
RUN rustup-toolchain-install-master \
c1168be5360f17516b233be85ebb193bb4e612bf \
a2726846f6d6280b752019472b6bd791c0752006
RUN git clone https://github.com/rust-lang-nursery/rustc-perf
RUN cargo +a2726846f6d6280b752019472b6bd791c0752006 \
fetch --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml
RUN cargo +a2726846f6d6280b752019472b6bd791c0752006 \
build --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml
RUN cargo +c1168be5360f17516b233be85ebb193bb4e612bf \
build --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml I can reproduce the regression from 10s to 20s on html5ever. Starting from 18.04 it does indeed not show a regression. The perf server is indeed 16.04. Ok so it looks like this is a regression specific to something that changed between glibc 2.23 and 2.27. With the profiles it looks like it's something related to the efficiency of memcpy/memmove. It also seems like this isn't reproducible on OSX (thanks @kennytm). I can, however, reproduce it locally on Windows 64-bit MSVC. Since @kennytm could reproduce a slightly slowdown on OSX, I wonder if these code paths do something like exercise a ton of moves? It may be the case that glibc optimized one of its routines in the 2.23 -> 2.27 transition (and OSX already had it). In that case it may appear like there's no regression on Linux/OSX but on glibc 2.23 and Windows they may have unoptimized versions of the routines causing slowdowns. All in all I think that this is a regression that is still worth fixing, despite it not immediately being applicable on glibc 2.27 or OSX. The fact that it affects somewhat-recent versions of Ubuntu and Windows is enough for me. |
Isn't glibc 2.27 the version released after the spectre/meltdown/whatever fixes that slow (primarily Intel but to a lesser extent AMD as well) systems down? |
No sorry glibc 2.27 is just what I tested. I've now narrowed it down (conveniently) to glibc 2.23 and 2.24. Ubuntu 16.04 ships with 2.23 and 16.10 ships with 2.24. On 2.24 I'm seeing a faster compilation Perusing the glibc release announcement it looks like this change may be why we're not seeing the regression in newer Linux |
Ok I've found the source of the regression, will post a PR soon. |
It's a pleasure watching you folks work! |
This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
I've posted what I believe is a fix to #50575 |
std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in #50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in #50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes #50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
triage: P-high Has pending fix. |
…fackler std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
…fackler std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
This commit is spawned out of a performance regression investigation in #50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in #50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes #50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
The build time of html5ever has gone up by more than 70% for debug builds recently:
http://perf.rust-lang.org/compare.html?start=4745092d608e65ec869c0ebdb27c535f27606ea4&end=6a87289fa4f49b6bdd62f33f69a580026223421f&stat=instructions:u
There are quite a few candidates for introducing this regression in the changeset:
4745092...6a87289
Would anybody from @rust-lang/wg-compiler-performance or @rust-lang/infra care to bisect further?
The text was updated successfully, but these errors were encountered: