Severe regression in html5ever build time #50496

michaelwoerister · 2018-05-07T10:02:22Z

The build time of html5ever has gone up by more than 70% for debug builds recently:
http://perf.rust-lang.org/compare.html?start=4745092d608e65ec869c0ebdb27c535f27606ea4&end=6a87289fa4f49b6bdd62f33f69a580026223421f&stat=instructions:u

There are quite a few candidates for introducing this regression in the changeset:
4745092...6a87289

Would anybody from @rust-lang/wg-compiler-performance or @rust-lang/infra care to bisect further?

The text was updated successfully, but these errors were encountered:

michaelwoerister · 2018-05-07T10:06:44Z

The "tuple stress" benchmark is affected even more while most of the other things have improved. Maybe that gives a hint?

ishitatsuyuki · 2018-05-07T10:12:34Z

My first guess would be quadratic or exponential blowup. Maybe someone can do a perf focus?

michaelwoerister · 2018-05-07T10:20:06Z

Yes, good idea!

alexcrichton · 2018-05-07T14:25:50Z

Bisection for tuple-stress points to #50282 (cc @fitzgen)

Before that PR rustc main.rs took ~4s and afterwards I killed it after 2m as I didn't want to wait for it to finish

alexcrichton · 2018-05-07T14:27:21Z

That may also be the source of the html5ever regression. cargo rustc --lib -- -Z time-passes takes 10s before that commit (the entire cargo rustc time) and 21s afterwards.

michaelwoerister · 2018-05-07T14:44:57Z

Thanks, @alexcrichton! cc also @oli-obk & @eddyb

Mark-Simulacrum · 2018-05-07T15:35:49Z

@alexcrichton Could you post instructions for replicating the slow down? Neither @nnethercote or I could reproduce it -- despite running perf on the appropriate commit range locally.

fitzgen · 2018-05-07T17:12:29Z

So should we revert that PR's commits?

alexcrichton · 2018-05-07T17:19:07Z

@Mark-Simulacrum sure yeah, using rustup-toolchain-install-master I did the equivalent of:

$ rustup-toolchain-install-master c1168be5360f17516b233be85ebb193bb4e612bf a2726846f6d6280b752019472b6bd791c0752006
$ pwd
/home/alex/code/rustc-perf
$ git rev-parse HEAD
254d28f6181cd5d20cedcd3fa9ae36df847da958
$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 collector/benchmarks/tuple-stress/src/main.rs
rustc +a2726846f6d6280b752019472b6bd791c0752006   4.58s user 0.72s system 100% cpu 5.264 total
$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf collector/benchmarks/tuple-stress/src/main.rs
^C
rustc +c1168be5360f17516b233be85ebb193bb4e612bf   109.70s user 0.60s system 99% cpu 1:50.31 total

oli-obk · 2018-05-07T17:30:49Z

We should probably only run the disaggregation pass if the opt level is > 1. That's what's done for the copy prop pass already

alexcrichton · 2018-05-07T19:58:14Z

@fitzgen assuming @Mark-Simulacrum and @nnethercote can reproduce I think yeah we'll probably want to start off by reverting and possibly doing what @oli-obk suggests in a follow-up

Mark-Simulacrum · 2018-05-08T17:30:38Z

I cannot reproduce, which is odd, since I can't tell what the difference could be that would make this much of a difference...

perf.rlo$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 collector/benchmarks/tuple-stress/src/main.rs

real    0m4.688s
user    0m4.559s
sys     0m0.160s
perf.rlo$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf collector/benchmarks/tuple-stress/src/main.rs

real    0m4.852s
user    0m4.659s
sys     0m0.219s

kennytm · 2018-05-08T18:05:24Z

No repro on macOS, although the c116 version is consistently slower than a272.

~/Downloads:92$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs

real	0m6.634s
user	0m6.154s
sys	0m0.474s

~/Downloads:93$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs

real	0m6.171s
user	0m5.848s
sys	0m0.353s

~/Downloads:94$ time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs

real	0m6.492s
user	0m6.083s
sys	0m0.434s

~/Downloads:95$ time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs

real	0m6.178s
user	0m5.848s
sys	0m0.360s

alexcrichton · 2018-05-08T19:40:06Z

Ok I swear I'm not going crazy! I think though given this data it may only be reproducible on Ubuntu 16.04? I can reproduce with this Dockerfile:

FROM ubuntu:16.04

RUN apt-get update -y && apt-get install -y curl gcc libssl-dev file make pkg-config time

RUN curl https://sh.rustup.rs | sh -s -- -y

ENV PATH=$PATH:/root/.cargo/bin
RUN cargo install rustup-toolchain-install-master
RUN rustup-toolchain-install-master \
  c1168be5360f17516b233be85ebb193bb4e612bf  \
  a2726846f6d6280b752019472b6bd791c0752006

RUN curl -Lo main.rs \
  https://github.com/rust-lang-nursery/rustc-perf/raw/254d28f6181cd5d20cedcd3fa9ae36df847da958/collector/benchmarks/tuple-stress/src/main.rs

RUN time rustc +a2726846f6d6280b752019472b6bd791c0752006 main.rs
RUN time rustc +c1168be5360f17516b233be85ebb193bb4e612bf main.rs

Can y'all reproduce the slowdown for html5ever?

alexcrichton · 2018-05-08T19:44:01Z

Sure enough starting the docker image from 18.04 means I cannot reproduce the regression.

Maybe some glibc bug was fixed which I was accidentally running into? The html5ever regression though is still worrisome and seems like it may not be connected to a glibc bug

kennytm · 2018-05-08T20:39:58Z

👍 I can repro the tuple-stress slow down on 16.04.4 LTS.

-Ztime-passes indicates it is stuck at the translation item collection step.

GDB backtrace sample

#0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:325
#1  0x00007ffff49a23b3 in rustc::mir::BasicBlockData::expand_statements::hf1d4ebe84ecd6e03 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#2  0x00007ffff48b9bb5 in _$LT$rustc_mir..transform..deaggregator..Deaggregator$u20$as$u20$rustc_mir..transform..MirPass$GT$::run_pass::hbb873b9298ecbcc7 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#3  0x00007ffff4900f48 in rustc_mir::transform::optimized_mir::_$u7b$$u7b$closure$u7d$$u7d$::h31fd21092d0543e7 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#4  0x00007ffff48ea8d1 in rustc_mir::transform::optimized_mir::h1131a13b2ae69a43 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#5  0x00007ffff3cdb059 in rustc::ty::maps::_$LT$impl$u20$rustc..ty..maps..config..QueryConfig$LT$$u27$tcx$GT$$u20$for$u20$rustc..ty..maps..queries..optimized_mir$LT$$u27$tcx$GT$$GT$::compute::hc1ffed4360780ff2 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#6  0x00007ffff37e2a31 in rustc::dep_graph::graph::DepGraph::with_task_impl::h3ce3074a9b38daa4 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#7  0x00007ffff39a83d7 in rustc::ty::context::tls::with_related_context::h2449d2461fb7e66f () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#8  0x00007ffff3a301a2 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::force_query_with_job::h6c5f6a6ee29b4c8c ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#9  0x00007ffff3a9fa43 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::get_query::h6b180b58d3e77599 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#10 0x00007ffff3b0d17b in rustc::ty::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::maybe_optimized_mir::hbb43c33ed1c2e50f ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#11 0x00007ffff492c71b in _$LT$rustc_mir..interpret..eval_context..EvalContext$LT$$u27$a$C$$u20$$u27$mir$C$$u20$$u27$tcx$C$$u20$M$GT$$GT$::load_mir::hdadbb8aaab5a0add ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#12 0x00007ffff4775200 in rustc_mir::interpret::const_eval::eval_body_and_ecx::hb4b25a904afc078e () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#13 0x00007ffff4779068 in rustc_mir::interpret::const_eval::const_eval_provider::he924d56230e7faca () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#14 0x00007ffff3cdbb57 in rustc::ty::maps::_$LT$impl$u20$rustc..ty..maps..config..QueryConfig$LT$$u27$tcx$GT$$u20$for$u20$rustc..ty..maps..queries..const_eval$LT$$u27$tcx$GT$$GT$::compute::h8881ebdb8a6b3fc9 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#15 0x00007ffff37f9789 in rustc::dep_graph::graph::DepGraph::with_task_impl::h970e6f3457ee9779 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#16 0x00007ffff39fd5cf in rustc::ty::context::tls::with_related_context::hf6aa86ba669caacf () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#17 0x00007ffff3a315d8 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::force_query_with_job::h767d969b981cd0e1 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#18 0x00007ffff3a898a6 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::get_query::h402310c1d566e53b ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#19 0x00007ffff3b0eacd in rustc::ty::maps::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$tcx$C$$u20$$u27$lcx$GT$$GT$::const_eval::h0dc1da25520f77c1 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#20 0x00007ffff4868556 in rustc_mir::monomorphize::collector::collect_items_rec::h62a919462e2120d0 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#21 0x00007ffff4867e82 in rustc_mir::monomorphize::collector::collect_crate_mono_items::h95367eff4f34f085 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc_mir-6950d58feb41fa53.so
#22 0x00007fffed1dd4ba in rustc::util::common::time::h2e056e8595b0c79b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#23 0x00007fffed2a02c9 in rustc_trans::base::collect_and_partition_translation_items::h675aeb9c6f31a5ac ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#24 0x00007ffff37da225 in rustc::dep_graph::graph::DepGraph::with_task_impl::h2495eb20a211973b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#25 0x00007ffff39e38fc in rustc::ty::context::tls::with_related_context::hb6f352ea70136898 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#26 0x00007ffff3a231f0 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::force_query_with_job::h20d94168c3140725 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#27 0x00007ffff3a8f472 in rustc::ty::maps::plumbing::_$LT$impl$u20$rustc..ty..context..TyCtxt$LT$$u27$a$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$GT$::get_query::h47577a0ffde54cdb ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/../lib/librustc-6264c07cbba9680a.so
#28 0x00007fffed29e541 in rustc_trans::base::trans_crate::hee88ee89ffeda7af () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#29 0x00007fffed201440 in _$LT$rustc_trans..LlvmTransCrate$u20$as$u20$rustc_trans_utils..trans_crate..TransCrate$GT$::trans_crate::h800dcda488afb790 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends/librustc_trans-llvm.so
#30 0x00007ffff7a2ed36 in rustc::util::common::time::hcf488eceeda0a9cc () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#31 0x00007ffff7a1ea4d in rustc_driver::driver::phase_4_translate_to_llvm::h13463ad7ae997e4b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#32 0x00007ffff7ac3016 in rustc_driver::driver::compile_input::_$u7b$$u7b$closure$u7d$$u7d$::h31e9ff1f64156b74 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#33 0x00007ffff7ab6af5 in rustc::ty::context::tls::enter_context::hb86cbd2c401dfb18 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#34 0x00007ffff7a818be in _$LT$std..thread..local..LocalKey$LT$T$GT$$GT$::with::h19074a28b3607093 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#35 0x00007ffff79ceb5c in rustc::ty::context::TyCtxt::create_and_enter::h3499c83bc74e5346 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#36 0x00007ffff7a17a0b in rustc_driver::driver::compile_input::ha789cddfb4d7cf8b () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#37 0x00007ffff7ad5735 in rustc_driver::run_compiler_impl::he0bf21247a724f12 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#38 0x00007ffff7a36cf5 in _$LT$scoped_tls..ScopedKey$LT$T$GT$$GT$::set::h0a4a9cc70840e625 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#39 0x00007ffff79eae95 in syntax::with_globals::h426e57a0d566387f () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#40 0x00007ffff79dde2e in _$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h31900b1f5e62de36 ()
   from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#41 0x00007ffff766ff7a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:105
#42 0x00007ffff7ad2814 in rustc_driver::run::hbcc80b82a59ea630 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#43 0x00007ffff7ae02bb in rustc_driver::main::hd5d5ac3338638be9 () from /home/kennytm/.rustup/toolchains/c1168be5360f17516b233be85ebb193bb4e612bf/bin/../lib/librustc_driver-faeae2bec58daae3.so
#44 0x0000555555554c13 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hbdc018091d795d35 ()
#45 0x00007ffff7630073 in std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h0e173cc915a5091f () at libstd/rt.rs:59
#46 std::panicking::try::do_call::h40752f40c51cf13d () at libstd/panicking.rs:310
#47 0x00007ffff766ff7a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:105
#48 0x00007ffff7645cf6 in std::panicking::try::h431a7a5b8edb0415 () at libstd/panicking.rs:289
#49 std::panic::catch_unwind::h1eb313d318d299fa () at libstd/panic.rs:374
#50 std::rt::lang_start_internal::h83a5cbcfc3241381 () at libstd/rt.rs:58
#51 0x0000555555554c74 in main ()

Code surrounding the call of that memmove

   0x00007ffff49a2303 <+3411>:	je     0x7ffff49a237b <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3531>
   0x00007ffff49a2305 <+3413>:	mov    0x100(%rsp),%rax
   0x00007ffff49a230d <+3421>:	mov    %rax,0xa0(%rsp)
   0x00007ffff49a2315 <+3429>:	movaps 0xf0(%rsp),%xmm0
   0x00007ffff49a231d <+3437>:	movaps %xmm0,0x90(%rsp)
   0x00007ffff49a2325 <+3445>:	movaps 0xb0(%rsp),%xmm0
   0x00007ffff49a232d <+3453>:	movaps 0xc0(%rsp),%xmm1
   0x00007ffff49a2335 <+3461>:	movdqa 0xd0(%rsp),%xmm2
   0x00007ffff49a233e <+3470>:	movaps 0xe0(%rsp),%xmm3
   0x00007ffff49a2346 <+3478>:	movaps %xmm3,0x80(%rsp)
   0x00007ffff49a234e <+3486>:	movdqa %xmm2,0x70(%rsp)
   0x00007ffff49a2354 <+3492>:	movaps %xmm1,0x60(%rsp)
   0x00007ffff49a2359 <+3497>:	movaps %xmm0,0x50(%rsp)
   0x00007ffff49a235e <+3502>:	mov    %r12,%rdi
   0x00007ffff49a2361 <+3505>:	callq  0x7ffff46c8880 <_ZN4core3ptr13drop_in_place17h3188964708fd179cE.llvm.12071625485570946806>
   0x00007ffff49a2366 <+3510>:	jmpq   0x7ffff49a2290 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3296>
   0x00007ffff49a236b <+3515>:	nopl   0x0(%rax,%rax,1)
   0x00007ffff49a2370 <+3520>:	movl   $0x9,0xb0(%rsp)
   0x00007ffff49a237b <+3531>:	mov    0x368(%rsp),%rax
   0x00007ffff49a2383 <+3539>:	test   %rax,%rax
   0x00007ffff49a2386 <+3542>:	je     0x7ffff49a23c4 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3604>
   0x00007ffff49a2388 <+3544>:	mov    0x380(%rsp),%r15
   0x00007ffff49a2390 <+3552>:	mov    (%r15),%rcx
   0x00007ffff49a2393 <+3555>:	mov    0x10(%r15),%rbx
   0x00007ffff49a2397 <+3559>:	imul   $0x58,0x360(%rsp),%rsi
   0x00007ffff49a23a0 <+3568>:	add    %rcx,%rsi
   0x00007ffff49a23a3 <+3571>:	imul   $0x58,%rbx,%rdi
   0x00007ffff49a23a7 <+3575>:	add    %rcx,%rdi
   0x00007ffff49a23aa <+3578>:	imul   $0x58,%rax,%rdx
   0x00007ffff49a23ae <+3582>:	callq  0x7ffff46be350 <memmove@plt>
=> 0x00007ffff49a23b3 <+3587>:	add    0x368(%rsp),%rbx
   0x00007ffff49a23bb <+3595>:	mov    %rbx,0x10(%r15)
   0x00007ffff49a23bf <+3599>:	mov    0x28(%rsp),%r15
   0x00007ffff49a23c4 <+3604>:	mov    %r14,%rdi
   0x00007ffff49a23c7 <+3607>:	callq  0x7ffff46e3080 <_ZN71_$LT$alloc..vec..IntoIter$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$4drop17h62e13b79d8b9a38aE>
   0x00007ffff49a23cc <+3612>:	mov    0x3b0(%rsp),%rdi
   0x00007ffff49a23d4 <+3620>:	mov    $0x38,%esi
   0x00007ffff49a23d9 <+3625>:	mov    $0x8,%edx
   0x00007ffff49a23de <+3630>:	callq  0x7ffff46bbd40 <__rust_dealloc@plt>
   0x00007ffff49a23e3 <+3635>:	lea    0x3b8(%rsp),%rdi
   0x00007ffff49a23eb <+3643>:	callq  0x7ffff4993fd0 <_ZN4core3ptr13drop_in_place17hdc52da31c044fba0E.llvm.13745968970215452383>
   0x00007ffff49a23f0 <+3648>:	cmpl   $0x9,0x3f8(%rsp)
   0x00007ffff49a23f8 <+3656>:	je     0x7ffff49a2407 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3671>
   0x00007ffff49a23fa <+3658>:	lea    0x3f8(%rsp),%rdi
   0x00007ffff49a2402 <+3666>:	callq  0x7ffff49911e0 <_ZN4core3ptr13drop_in_place17h3188964708fd179cE>
   0x00007ffff49a2407 <+3671>:	mov    0x48(%rsp),%rsi
   0x00007ffff49a240c <+3676>:	cmp    0x40(%rsp),%rsi
   0x00007ffff49a2411 <+3681>:	lea    0x120(%rsp),%r14
   0x00007ffff49a2419 <+3689>:	jne    0x7ffff49a1ff3 <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+2627>
   0x00007ffff49a241f <+3695>:	lea    0x30(%rsp),%rdi
   0x00007ffff49a2424 <+3700>:	callq  0x7ffff46e2b30 <_ZN71_$LT$alloc..vec..IntoIter$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$4drop17h3943334a18285159E>
   0x00007ffff49a2429 <+3705>:	lea    -0x28(%rbp),%rsp
   0x00007ffff49a242d <+3709>:	pop    %rbx
   0x00007ffff49a242e <+3710>:	pop    %r12
   0x00007ffff49a2430 <+3712>:	pop    %r13
   0x00007ffff49a2432 <+3714>:	pop    %r14
   0x00007ffff49a2434 <+3716>:	pop    %r15
   0x00007ffff49a2436 <+3718>:	pop    %rbp
   0x00007ffff49a2437 <+3719>:	retq   
   0x00007ffff49a2438 <+3720>:	add    $0xffffffffffffffff,%rsi
   0x00007ffff49a243c <+3724>:	lea    0x31231d(%rip),%rdi        # 0x7ffff4cb4760
   0x00007ffff49a2443 <+3731>:	callq  0x7ffff46bc360 <_ZN4core9panicking18panic_bounds_check17hcc582842c20cc97bE@plt>
   0x00007ffff49a2448 <+3736>:	jmp    0x7ffff49a245d <_ZN5rustc3mir14BasicBlockData17expand_statements17hf1d4ebe84ecd6e03E+3757>
   0x00007ffff49a244a <+3738>:	add    $0xffffffffffffffff,%r15
   0x00007ffff49a244e <+3742>:	lea    0x312323(%rip),%rdi        # 0x7ffff4cb4778
   0x00007ffff49a2455 <+3749>:	mov    %r15,%rsi
   0x00007ffff49a2458 <+3752>:	callq  0x7ffff46bc360 <_ZN4core9panicking18panic_bounds_check17hcc582842c20cc97bE@plt>
   0x00007ffff49a245d <+3757>:	ud2    
   0x00007ffff49a245f <+3759>:	lea    0x312c52(%rip),%rax        # 0x7ffff4cb50b8
   0x00007ffff49a2466 <+3766>:	mov    %rax,0x120(%rsp)
   0x00007ffff49a246e <+3774>:	movq   $0x1,0x128(%rsp)
   0x00007ffff49a247a <+3786>:	movq   $0x0,0x130(%rsp)
   0x00007ffff49a2486 <+3798>:	lea    0x324ab(%rip),%rax        # 0x7ffff49d4938 <byte_str.f.llvm.13745968970215452383>

Mark-Simulacrum · 2018-05-09T00:12:22Z

Presuming the same two commits would be responsible for the html5ever regression, I also cannot reproduce it on 18.04 Ubuntu.

alexcrichton · 2018-05-09T14:45:23Z

Using this dockerfile:

FROM ubuntu:16.04

RUN apt-get update -y && apt-get install -y curl gcc libssl-dev file make pkg-config time git

RUN curl https://sh.rustup.rs | sh -s -- -y

ENV PATH=$PATH:/root/.cargo/bin
RUN cargo install rustup-toolchain-install-master
RUN rustup-toolchain-install-master \
  c1168be5360f17516b233be85ebb193bb4e612bf  \
  a2726846f6d6280b752019472b6bd791c0752006

RUN git clone https://github.com/rust-lang-nursery/rustc-perf

RUN cargo +a2726846f6d6280b752019472b6bd791c0752006 \
  fetch --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml
RUN cargo +a2726846f6d6280b752019472b6bd791c0752006 \
  build --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml
RUN cargo +c1168be5360f17516b233be85ebb193bb4e612bf \
build --manifest-path rustc-perf/collector/benchmarks/html5ever/Cargo.toml

I can reproduce the regression from 10s to 20s on html5ever.

Starting from 18.04 it does indeed not show a regression.

The perf server is indeed 16.04.

Ok so it looks like this is a regression specific to something that changed between glibc 2.23 and 2.27. With the profiles it looks like it's something related to the efficiency of memcpy/memmove. It also seems like this isn't reproducible on OSX (thanks @kennytm). I can, however, reproduce it locally on Windows 64-bit MSVC.

Since @kennytm could reproduce a slightly slowdown on OSX, I wonder if these code paths do something like exercise a ton of moves? It may be the case that glibc optimized one of its routines in the 2.23 -> 2.27 transition (and OSX already had it). In that case it may appear like there's no regression on Linux/OSX but on glibc 2.23 and Windows they may have unoptimized versions of the routines causing slowdowns.

All in all I think that this is a regression that is still worth fixing, despite it not immediately being applicable on glibc 2.27 or OSX. The fact that it affects somewhat-recent versions of Ubuntu and Windows is enough for me.

OvermindDL1 · 2018-05-09T15:05:09Z

Isn't glibc 2.27 the version released after the spectre/meltdown/whatever fixes that slow (primarily Intel but to a lesser extent AMD as well) systems down?

alexcrichton · 2018-05-09T15:11:21Z

No sorry glibc 2.27 is just what I tested. I've now narrowed it down (conveniently) to glibc 2.23 and 2.24. Ubuntu 16.04 ships with 2.23 and 16.10 ships with 2.24. On 2.24 I'm seeing a faster compilation

Perusing the glibc release announcement it looks like this change may be why we're not seeing the regression in newer Linux

alexcrichton · 2018-05-09T16:00:13Z

Ok I've found the source of the regression, will post a PR soon.

michaelwoerister · 2018-05-09T16:04:27Z

It's a pleasure watching you folks work! :D

This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3

alexcrichton · 2018-05-09T16:11:38Z

I've posted what I believe is a fix to #50575

std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in #50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in #50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes #50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3

nikomatsakis · 2018-05-10T14:13:10Z

triage: P-high

Has pending fix.

…fackler std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3

This commit is spawned out of a performance regression investigation in #50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in #50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes #50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3

michaelwoerister added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance labels May 7, 2018

michaelwoerister added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label May 7, 2018

alexcrichton added this to the 1.27 milestone May 7, 2018

kennytm added the O-linux Operating system: Linux label May 8, 2018

pietroalbini added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels May 9, 2018

alexcrichton mentioned this issue May 9, 2018

std: Avoid ptr::copy if unnecessary in vec::Drain #50575

Merged

oli-obk added the P-high High priority label May 10, 2018

bors closed this as completed in #50575 May 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Severe regression in html5ever build time #50496

Severe regression in html5ever build time #50496

michaelwoerister commented May 7, 2018

michaelwoerister commented May 7, 2018

ishitatsuyuki commented May 7, 2018

michaelwoerister commented May 7, 2018

alexcrichton commented May 7, 2018

alexcrichton commented May 7, 2018

michaelwoerister commented May 7, 2018

Mark-Simulacrum commented May 7, 2018

fitzgen commented May 7, 2018

alexcrichton commented May 7, 2018

oli-obk commented May 7, 2018

alexcrichton commented May 7, 2018

Mark-Simulacrum commented May 8, 2018

kennytm commented May 8, 2018 •

edited

Loading

alexcrichton commented May 8, 2018 •

edited

Loading

alexcrichton commented May 8, 2018

kennytm commented May 8, 2018 •

edited

Loading

Mark-Simulacrum commented May 9, 2018

alexcrichton commented May 9, 2018

OvermindDL1 commented May 9, 2018

alexcrichton commented May 9, 2018

alexcrichton commented May 9, 2018

michaelwoerister commented May 9, 2018

alexcrichton commented May 9, 2018

nikomatsakis commented May 10, 2018

Severe regression in html5ever build time #50496

Severe regression in html5ever build time #50496

Comments

michaelwoerister commented May 7, 2018

michaelwoerister commented May 7, 2018

ishitatsuyuki commented May 7, 2018

michaelwoerister commented May 7, 2018

alexcrichton commented May 7, 2018

alexcrichton commented May 7, 2018

michaelwoerister commented May 7, 2018

Mark-Simulacrum commented May 7, 2018

fitzgen commented May 7, 2018

alexcrichton commented May 7, 2018

oli-obk commented May 7, 2018

alexcrichton commented May 7, 2018

Mark-Simulacrum commented May 8, 2018

kennytm commented May 8, 2018 • edited Loading

alexcrichton commented May 8, 2018 • edited Loading

alexcrichton commented May 8, 2018

kennytm commented May 8, 2018 • edited Loading

Mark-Simulacrum commented May 9, 2018

alexcrichton commented May 9, 2018

OvermindDL1 commented May 9, 2018

alexcrichton commented May 9, 2018

alexcrichton commented May 9, 2018

michaelwoerister commented May 9, 2018

alexcrichton commented May 9, 2018

nikomatsakis commented May 10, 2018

kennytm commented May 8, 2018 •

edited

Loading

alexcrichton commented May 8, 2018 •

edited

Loading

kennytm commented May 8, 2018 •

edited

Loading