Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Severe regression in html5ever build time #50496

Closed
michaelwoerister opened this issue May 7, 2018 · 24 comments · Fixed by #50575
Closed

Severe regression in html5ever build time #50496

michaelwoerister opened this issue May 7, 2018 · 24 comments · Fixed by #50575
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. O-linux Operating system: Linux P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance
Milestone

Comments

@michaelwoerister
Copy link
Member

The build time of html5ever has gone up by more than 70% for debug builds recently:
http://perf.rust-lang.org/compare.html?start=4745092d608e65ec869c0ebdb27c535f27606ea4&end=6a87289fa4f49b6bdd62f33f69a580026223421f&stat=instructions:u

There are quite a few candidates for introducing this regression in the changeset:
4745092...6a87289

Would anybody from @rust-lang/wg-compiler-performance or @rust-lang/infra care to bisect further?

@michaelwoerister michaelwoerister added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance labels May 7, 2018
@michaelwoerister
Copy link
Member Author

The "tuple stress" benchmark is affected even more while most of the other things have improved. Maybe that gives a hint?

@ishitatsuyuki
Copy link
Contributor

My first guess would be quadratic or exponential blowup. Maybe someone can do a perf focus?

@michaelwoerister
Copy link
Member Author

Yes, good idea!

@michaelwoerister michaelwoerister added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label May 7, 2018
@alexcrichton
Copy link
Member

Bisection for tuple-stress points to #50282 (cc @fitzgen)

Before that PR rustc main.rs took ~4s and afterwards I killed it after 2m as I didn't want to wait for it to finish

@alexcrichton alexcrichton added this to the 1.27 milestone May 7, 2018
@alexcrichton
Copy link
Member

That may also be the source of the html5ever regression. cargo rustc --lib -- -Z time-passes takes 10s before that commit (the entire cargo rustc time) and 21s afterwards.

@michaelwoerister
Copy link
Member Author

Thanks, @alexcrichton! cc also @oli-obk & @eddyb

@Mark-Simulacrum
Copy link
Member

@alexcrichton Could you post instructions for replicating the slow down? Neither @nnethercote or I could reproduce it -- despite running perf on the appropriate commit range locally.

@fitzgen
Copy link
Member

fitzgen commented May 7, 2018

So should we revert that PR's commits?

@alexcrichton
Copy link
Member

@Mark-Simulacrum sure yeah, using rustup-toolchain-install-master I did the equivalent of:

$ rustup-toolchain-install-master c1168be5360f17516b233be85ebb193bb4e612bf a2726846f6d6280b752019472b6bd791c0752006
$ pwd
/home/alex/code/rustc-perf
$ git rev-parse HEAD
254d28f6181cd5d20cedcd3fa9ae36df847da958
$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 collector/benchmarks/tuple-stress/src/main.rs
rustc +a2726846f6d6280b752019472b6bd791c0752006   4.58s user 0.72s system 100% cpu 5.264 total
$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf collector/benchmarks/tuple-stress/src/main.rs
^C
rustc +c1168be5360f17516b233be85ebb193bb4e612bf   109.70s user 0.60s system 99% cpu 1:50.31 total

@oli-obk
Copy link
Contributor

oli-obk commented May 7, 2018

We should probably only run the disaggregation pass if the opt level is > 1. That's what's done for the copy prop pass already

@alexcrichton
Copy link
Member

@fitzgen assuming @Mark-Simulacrum and @nnethercote can reproduce I think yeah we'll probably want to start off by reverting and possibly doing what @oli-obk suggests in a follow-up

@Mark-Simulacrum
Copy link
Member

I cannot reproduce, which is odd, since I can't tell what the difference could be that would make this much of a difference...

perf.rlo$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 collector/benchmarks/tuple-stress/src/main.rs

real    0m4.688s
user    0m4.559s
sys     0m0.160s
perf.rlo$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf collector/benchmarks/tuple-stress/src/main.rs

real    0m4.852s
user    0m4.659s
sys     0m0.219s

@kennytm
Copy link
Member

kennytm commented May 8, 2018

No repro on macOS, although the c116 version is consistently slower than a272.

~/Downloads:92$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs

real	0m6.634s
user	0m6.154s
sys	0m0.474s

~/Downloads:93$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs

real	0m6.171s
user	0m5.848s
sys	0m0.353s

~/Downloads:94$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs

real	0m6.492s
user	0m6.083s
sys	0m0.434s

~/Downloads:95$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs

real	0m6.178s
user	0m5.848s
sys	0m0.360s

@alexcrichton
Copy link
Member

alexcrichton commented May 8, 2018

Ok I swear I'm not going crazy! I think though given this data it may only be reproducible on Ubuntu 16.04? I can reproduce with this Dockerfile:

FROM ubuntu:16.04

RUN apt-get update -y && apt-get install -y curl gcc libssl-dev file make pkg-config time

RUN curl https://sh.rustup.rs | sh -s -- -y

ENV PATH=$PATH:/root/.cargo/bin
RUN cargo install rustup-toolchain-install-master
RUN rustup-toolchain-install-master \
  c1168be5360f17516b233be85ebb193bb4e612bf  \
  a2726846f6d6280b752019472b6bd791c0752006

RUN curl -Lo main.rs \
  https://github.com/rust-lang-nursery/rustc-perf/raw/254d28f6181cd5d20cedcd3fa9ae36df847da958/collector/benchmarks/tuple-stress/src/main.rs

RUN time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs
RUN time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs

Can y'all reproduce the slowdown for html5ever?

@alexcrichton
Copy link
Member

Sure enough starting the docker image from 18.04 means I cannot reproduce the regression.

Maybe some glibc bug was fixed which I was accidentally running into? The html5ever regression though is still worrisome and seems like it may not be connected to a glibc bug

@kennytm kennytm added the O-linux Operating system: Linux label May 8, 2018
@kennytm
Copy link
Member

kennytm commented May 8, 2018

👍 I can repro the tuple-stress slow down on 16.04.4 LTS.

-Ztime-passes indicates it is stuck at the translation item collection step.

GDB backtrace sample
#0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:325
#1  0x00007ffff49a23b3 in rustc::mir::BasicBlockData::expand_statements::hf1d4ebe84ecd6e03 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#2  0x00007ffff48b9bb5 in _$LT$rustc_mir..transform..deaggregator..Deaggregator$u20$as$u20$rustc_mir..transform..MirPass$GT$::run_pass::hbb873b9298ecbcc7 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#3  0x00007ffff4900f48 in rustc_mir::transform::optimized_mir::_$u7b$$u7b$closure$u7d$$u7d$::h31fd21092d0543e7 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#4  0x00007ffff48ea8d1 in rustc_mir::transform::optimized_mir::h1131a13b2ae69a43 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#5  0x00007ffff3cdb059 in rustc::ty::maps::_$LT$impl$u20$rustc..ty..maps..config..QueryConfig$LT$$u27$tcx$GT$$u20$for$u20$rustc..ty..maps..queries..optimized_mir$LT$$u27$tcx$GT$$GT$::compute::hc1ffed4360780ff2 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#6  0x00007ffff37e2a31 in rustc::dep_graph::graph::DepGraph::with_task_impl::h3ce3074a9b38daa4 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#7  0x00007ffff39a83d7 in rustc::ty::context::tls::with_related_context::h2449d2461fb7e66f () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#8  0x00007ffff3a301a2 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::force_query_with_job::h6c5f6a6ee29b4c8c ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#9  0x00007ffff3a9fa43 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::get_query::h6b180b58d3e77599 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#10 0x00007ffff3b0d17b in rustc::ty::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::maybe_optimized_mir::hbb43c33ed1c2e50f ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#11 0x00007ffff492c71b in _$LT$rustc_mir..interpret..eval_context..EvalContext$LT$$u27$a$C$$u20$$u27$mir$C$$u20$$u27$tcx$C$$u20$M$GT$$GT$::load_mir::hdadbb8aaab5a0add ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#12 0x00007ffff4775200 in rustc_mir::interpret::const_eval::eval_body_and_ecx::hb4b25a904afc078e () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#13 0x00007ffff4779068 in rustc_mir::interpret::const_eval::const_eval_provider::he924d56230e7faca () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#14 0x00007ffff3cdbb57 in rustc::ty::maps::_$LT$impl$u20$rustc..ty..maps..config..QueryConfig$LT$$u27$tcx$GT$$u20$for$u20$rustc..ty..maps..queries..const_eval$LT$$u27$tcx$GT$$GT$::compute::h8881ebdb8a6b3fc9 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#15 0x00007ffff37f9789 in rustc::dep_graph::graph::DepGraph::with_task_impl::h970e6f3457ee9779 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#16 0x00007ffff39fd5cf in rustc::ty::context::tls::with_related_context::hf6aa86ba669caacf () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#17 0x00007ffff3a315d8 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::force_query_with_job::h767d969b981cd0e1 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#18 0x00007ffff3a898a6 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::get_query::h402310c1d566e53b ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#19 0x00007ffff3b0eacd in rustc::ty::maps::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$tcx$C$$u20$$u27$lcx$GT$$GT$::const_eval::h0dc1da25520f77c1 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#20 0x00007ffff4868556 in rustc_mir::monomorphize::collector::collect_items_rec::h62a919462e2120d0 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#21 0x00007ffff4867e82 in rustc_mir::monomorphize::collector::collect_crate_mono_items::h95367eff4f34f085 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#22 0x00007fffed1dd4ba in rustc::util::common::time::h2e056e8595b0c79b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#23 0x00007fffed2a02c9 in rustc_trans::base::collect_and_partition_translation_items::h675aeb9c6f31a5ac ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#24 0x00007ffff37da225 in rustc::dep_graph::graph::DepGraph::with_task_impl::h2495eb20a211973b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#25 0x00007ffff39e38fc in rustc::ty::context::tls::with_related_context::hb6f352ea70136898 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#26 0x00007ffff3a231f0 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::force_query_with_job::h20d94168c3140725 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#27 0x00007ffff3a8f472 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::get_query::h47577a0ffde54cdb ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#28 0x00007fffed29e541 in rustc_trans::base::trans_crate::hee88ee89ffeda7af () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#29 0x00007fffed201440 in _$LT$rustc_trans..LlvmTransCrate$u20$as$u20$rustc_trans_utils..trans_crate..TransCrate$GT$::trans_crate::h800dcda488afb790 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#30 0x00007ffff7a2ed36 in rustc::util::common::time::hcf488eceeda0a9cc () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#31 0x00007ffff7a1ea4d in rustc_driver::driver::phase_4_translate_to_llvm::h13463ad7ae997e4b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#32 0x00007ffff7ac3016 in rustc_driver::driver::compile_input::_$u7b$$u7b$closure$u7d$$u7d$::h31e9ff1f64156b74 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#33 0x00007ffff7ab6af5 in rustc::ty::context::tls::enter_context::hb86cbd2c401dfb18 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#34 0x00007ffff7a818be in _$LT$std..thread..local..LocalKey$LT$T$GT$$GT$::with::h19074a28b3607093 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#35 0x00007ffff79ceb5c in rustc::ty::context::TyCtxt::create_and_enter::h3499c83bc74e5346 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#36 0x00007ffff7a17a0b in rustc_driver::driver::compile_input::ha789cddfb4d7cf8b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#37 0x00007ffff7ad5735 in rustc_driver::run_compiler_impl::he0bf21247a724f12 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#38 0x00007ffff7a36cf5 in _$LT$scoped_tls..ScopedKey$LT$T$GT$$GT$::set::h0a4a9cc70840e625 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#39 0x00007ffff79eae95 in syntax::with_globals::h426e57a0d566387f () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#40 0x00007ffff79dde2e in _$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h31900b1f5e62de36 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#41 0x00007ffff766ff7a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:105
#42 0x00007ffff7ad2814 in rustc_driver::run::hbcc80b82a59ea630 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#43 0x00007ffff7ae02bb in rustc_driver::main::hd5d5ac3338638be9 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#44 0x0000555555554c13 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hbdc018091d795d35 ()
#45 0x00007ffff7630073 in std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h0e173cc915a5091f () at libstd/rt.rs:59
#46 std::panicking::try::do_call::h40752f40c51cf13d () at libstd/panicking.rs:310
#47 0x00007ffff766ff7a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:105
#48 0x00007ffff7645cf6 in std::panicking::try::h431a7a5b8edb0415 () at libstd/panicking.rs:289
#49 std::panic::catch_unwind::h1eb313d318d299fa () at libstd/panic.rs:374
#50 std::rt::lang_start_internal::h83a5cbcfc3241381 () at libstd/rt.rs:58
#51 0x0000555555554c74 in main ()
Code surrounding the call of that memmove
   0x00007ffff49a2303 <+3411>:	je     0x7ffff49a237b <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3531>
   0x00007ffff49a2305 <+3413>:	mov    0x100(%rsp),%rax
   0x00007ffff49a230d <+3421>:	mov    %rax,0xa0(%rsp)
   0x00007ffff49a2315 <+3429>:	movaps 0xf0(%rsp),%xmm0
   0x00007ffff49a231d <+3437>:	movaps %xmm0,0x90(%rsp)
   0x00007ffff49a2325 <+3445>:	movaps 0xb0(%rsp),%xmm0
   0x00007ffff49a232d <+3453>:	movaps 0xc0(%rsp),%xmm1
   0x00007ffff49a2335 <+3461>:	movdqa 0xd0(%rsp),%xmm2
   0x00007ffff49a233e <+3470>:	movaps 0xe0(%rsp),%xmm3
   0x00007ffff49a2346 <+3478>:	movaps %xmm3,0x80(%rsp)
   0x00007ffff49a234e <+3486>:	movdqa %xmm2,0x70(%rsp)
   0x00007ffff49a2354 <+3492>:	movaps %xmm1,0x60(%rsp)
   0x00007ffff49a2359 <+3497>:	movaps %xmm0,0x50(%rsp)
   0x00007ffff49a235e <+3502>:	mov    %r12,%rdi
   0x00007ffff49a2361 <+3505>:	callq  0x7ffff46c8880 <_ZN4core3ptr13drop_in_place17h3188964708fd179cE.llvm.12071625485570946806>
   0x00007ffff49a2366 <+3510>:	jmpq   0x7ffff49a2290 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3296>
   0x00007ffff49a236b <+3515>:	nopl   0x0(%rax,%rax,1)
   0x00007ffff49a2370 <+3520>:	movl   $0x9,0xb0(%rsp)
   0x00007ffff49a237b <+3531>:	mov    0x368(%rsp),%rax
   0x00007ffff49a2383 <+3539>:	test   %rax,%rax
   0x00007ffff49a2386 <+3542>:	je     0x7ffff49a23c4 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3604>
   0x00007ffff49a2388 <+3544>:	mov    0x380(%rsp),%r15
   0x00007ffff49a2390 <+3552>:	mov    (%r15),%rcx
   0x00007ffff49a2393 <+3555>:	mov    0x10(%r15),%rbx
   0x00007ffff49a2397 <+3559>:	imul   $0x58,0x360(%rsp),%rsi
   0x00007ffff49a23a0 <+3568>:	add    %rcx,%rsi
   0x00007ffff49a23a3 <+3571>:	imul   $0x58,%rbx,%rdi
   0x00007ffff49a23a7 <+3575>:	add    %rcx,%rdi
   0x00007ffff49a23aa <+3578>:	imul   $0x58,%rax,%rdx
   0x00007ffff49a23ae <+3582>:	callq  0x7ffff46be350 <memmove@plt>
=> 0x00007ffff49a23b3 <+3587>:	add    0x368(%rsp),%rbx
   0x00007ffff49a23bb <+3595>:	mov    %rbx,0x10(%r15)
   0x00007ffff49a23bf <+3599>:	mov    0x28(%rsp),%r15
   0x00007ffff49a23c4 <+3604>:	mov    %r14,%rdi
   0x00007ffff49a23c7 <+3607>:	callq  0x7ffff46e3080 <_ZN71_$LT$alloc..vec..IntoIter$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$4drop17h62e13b79d8b9a38aE>
   0x00007ffff49a23cc <+3612>:	mov    0x3b0(%rsp),%rdi
   0x00007ffff49a23d4 <+3620>:	mov    $0x38,%esi
   0x00007ffff49a23d9 <+3625>:	mov    $0x8,%edx
   0x00007ffff49a23de <+3630>:	callq  0x7ffff46bbd40 <__rust_dealloc@plt>
   0x00007ffff49a23e3 <+3635>:	lea    0x3b8(%rsp),%rdi
   0x00007ffff49a23eb <+3643>:	callq  0x7ffff4993fd0 <_ZN4core3ptr13drop_in_place17hdc52da31c044fba0E.llvm.13745968970215452383>
   0x00007ffff49a23f0 <+3648>:	cmpl   $0x9,0x3f8(%rsp)
   0x00007ffff49a23f8 <+3656>:	je     0x7ffff49a2407 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3671>
   0x00007ffff49a23fa <+3658>:	lea    0x3f8(%rsp),%rdi
   0x00007ffff49a2402 <+3666>:	callq  0x7ffff49911e0 <_ZN4core3ptr13drop_in_place17h3188964708fd179cE>
   0x00007ffff49a2407 <+3671>:	mov    0x48(%rsp),%rsi
   0x00007ffff49a240c <+3676>:	cmp    0x40(%rsp),%rsi
   0x00007ffff49a2411 <+3681>:	lea    0x120(%rsp),%r14
   0x00007ffff49a2419 <+3689>:	jne    0x7ffff49a1ff3 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+2627>
   0x00007ffff49a241f <+3695>:	lea    0x30(%rsp),%rdi
   0x00007ffff49a2424 <+3700>:	callq  0x7ffff46e2b30 <_ZN71_$LT$alloc..vec..IntoIter$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$4drop17h3943334a18285159E>
   0x00007ffff49a2429 <+3705>:	lea    -0x28(%rbp),%rsp
   0x00007ffff49a242d <+3709>:	pop    %rbx
   0x00007ffff49a242e <+3710>:	pop    %r12
   0x00007ffff49a2430 <+3712>:	pop    %r13
   0x00007ffff49a2432 <+3714>:	pop    %r14
   0x00007ffff49a2434 <+3716>:	pop    %r15
   0x00007ffff49a2436 <+3718>:	pop    %rbp
   0x00007ffff49a2437 <+3719>:	retq   
   0x00007ffff49a2438 <+3720>:	add    $0xffffffffffffffff,%rsi
   0x00007ffff49a243c <+3724>:	lea    0x31231d(%rip),%rdi        # 0x7ffff4cb4760
   0x00007ffff49a2443 <+3731>:	callq  0x7ffff46bc360 <_ZN4core9panicking18panic_bounds_check17hcc582842c20cc97bE@plt>
   0x00007ffff49a2448 <+3736>:	jmp    0x7ffff49a245d <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3757>
   0x00007ffff49a244a <+3738>:	add    $0xffffffffffffffff,%r15
   0x00007ffff49a244e <+3742>:	lea    0x312323(%rip),%rdi        # 0x7ffff4cb4778
   0x00007ffff49a2455 <+3749>:	mov    %r15,%rsi
   0x00007ffff49a2458 <+3752>:	callq  0x7ffff46bc360 <_ZN4core9panicking18panic_bounds_check17hcc582842c20cc97bE@plt>
   0x00007ffff49a245d <+3757>:	ud2    
   0x00007ffff49a245f <+3759>:	lea    0x312c52(%rip),%rax        # 0x7ffff4cb50b8
   0x00007ffff49a2466 <+3766>:	mov    %rax,0x120(%rsp)
   0x00007ffff49a246e <+3774>:	movq   $0x1,0x128(%rsp)
   0x00007ffff49a247a <+3786>:	movq   $0x0,0x130(%rsp)
   0x00007ffff49a2486 <+3798>:	lea    0x324ab(%rip),%rax        # 0x7ffff49d4938 <byte_str.f.llvm.13745968970215452383>

@Mark-Simulacrum
Copy link
Member

Presuming the same two commits would be responsible for the html5ever regression, I also cannot reproduce it on 18.04 Ubuntu.

@pietroalbini pietroalbini added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels May 9, 2018
@alexcrichton
Copy link
Member

Using this dockerfile:

FROM ubuntu:16.04

RUN apt-get update -y && apt-get install -y curl gcc libssl-dev file make pkg-config time git

RUN curl https://sh.rustup.rs | sh -s -- -y

ENV PATH=$PATH:/root/.cargo/bin
RUN cargo install rustup-toolchain-install-master
RUN rustup-toolchain-install-master \
  c1168be5360f17516b233be85ebb193bb4e612bf  \
  a2726846f6d6280b752019472b6bd791c0752006

RUN git clone https://github.com/rust-lang-nursery/rustc-perf

RUN cargo +a2726846f6d6280b752019472b6bd791c0752006 \
  fetch --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml
RUN cargo +a2726846f6d6280b752019472b6bd791c0752006 \
  build --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml
RUN cargo +c1168be5360f17516b233be85ebb193bb4e612bf \
build --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml

I can reproduce the regression from 10s to 20s on html5ever.

Starting from 18.04 it does indeed not show a regression.

The perf server is indeed 16.04.

Ok so it looks like this is a regression specific to something that changed between glibc 2.23 and 2.27. With the profiles it looks like it's something related to the efficiency of memcpy/memmove. It also seems like this isn't reproducible on OSX (thanks @kennytm). I can, however, reproduce it locally on Windows 64-bit MSVC.

Since @kennytm could reproduce a slightly slowdown on OSX, I wonder if these code paths do something like exercise a ton of moves? It may be the case that glibc optimized one of its routines in the 2.23 -> 2.27 transition (and OSX already had it). In that case it may appear like there's no regression on Linux/OSX but on glibc 2.23 and Windows they may have unoptimized versions of the routines causing slowdowns.

All in all I think that this is a regression that is still worth fixing, despite it not immediately being applicable on glibc 2.27 or OSX. The fact that it affects somewhat-recent versions of Ubuntu and Windows is enough for me.

@OvermindDL1
Copy link

Isn't glibc 2.27 the version released after the spectre/meltdown/whatever fixes that slow (primarily Intel but to a lesser extent AMD as well) systems down?

@alexcrichton
Copy link
Member

No sorry glibc 2.27 is just what I tested. I've now narrowed it down (conveniently) to glibc 2.23 and 2.24. Ubuntu 16.04 ships with 2.23 and 16.10 ships with 2.24. On 2.24 I'm seeing a faster compilation

Perusing the glibc release announcement it looks like this change may be why we're not seeing the regression in newer Linux

@alexcrichton
Copy link
Member

Ok I've found the source of the regression, will post a PR soon.

@michaelwoerister
Copy link
Member Author

It's a pleasure watching you folks work! :D

alexcrichton added a commit to alexcrichton/rust that referenced this issue May 9, 2018
This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
@alexcrichton
Copy link
Member

I've posted what I believe is a fix to #50575

bors added a commit that referenced this issue May 9, 2018
std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
@nikomatsakis
Copy link
Contributor

triage: P-high

Has pending fix.

@oli-obk oli-obk added the P-high High priority label May 10, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue May 10, 2018
…fackler

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
alexcrichton added a commit to alexcrichton/rust that referenced this issue May 10, 2018
…fackler

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
bors pushed a commit that referenced this issue May 11, 2018
This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. O-linux Operating system: Linux P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants