Compile-Time Performance Regression #37864

Mark-Simulacrum · 2016-11-18T16:37:20Z

#37660 appears to have regressed performance by ~6% on bootstrap, due to a near tripling in time for item-bodies checking (23s to 62s). I'm not sure if that was expected or not, but someone should probably investigate. Let me know if I should open a new issue about that.

See here for a comparison across all crates.

cc @nikomatsakis

Mark-Simulacrum · 2016-11-18T16:39:26Z

Better comparison here (more localized to the exact time range): http://perf.rust-lang.org/compare.html?date_a=2016-11-17T05%3A13%3A09.000Z&date_b=2016-11-18T01%3A31%3A01.000Z&kind=benchmarks&crates=futures-rs-test-all%2Chelloworld%2Chtml5ever-2016-08-25%2Chyper.0.5.0%2Cinflate-0.1.0%2Cissue-32062-equality-relations-complexity%2Cissue-32278-big-array-of-strings%2Cjld-day15-parser%2Cpiston-image-0.10.3%2Cpiston-image-0.3.11%2Cregex-macros.0.1.30%2Cregex.0.1.30%2Crust-encoding-0.3.0%2Crust-encoding.0.2.32%2Csyntex-0.42.2%2Csyntex-0.42.2-incr-clean&phases=total&group_by=crate.

nikomatsakis · 2016-11-18T21:49:33Z

Hmm, yeah, not really expected.

nikomatsakis · 2016-11-18T21:57:43Z

This is the diff that affected rustc_typeck crate:

https://gist.github.com/nikomatsakis/ca47ebbcd264452539074899b6d09355

Not seeing yet what might have caused such a perturbation.

michaelwoerister · 2016-11-18T22:03:56Z

@eddyb: You also had a look at that PR, any ideas?

Mark-Simulacrum · 2016-11-18T22:08:23Z

Is it possible the PR increased the amount of code in librustc and hit a pathological case in some way? This shows that most crates had little-to-no difference except for librustc itself, which jumped up by ~30 seconds.

michaelwoerister · 2016-11-18T22:12:57Z

The syntex-syntax test case also shows the regression, so there's definitely something to it.

Mark-Simulacrum · 2016-11-18T22:14:09Z

It looks like I was wrong with my initial assessment about rustc being the only one to show the increase, but this graph shows that it has the largest increase by far out of most rustc crates in that pass.

eddyb · 2016-11-19T01:40:48Z

I'd suggest comparing with callgrind: if the number of calls related to inference, for example, changes, well... My suspicion is basically "impl children get checked twice" but then tests couldn't pass because errors would also be doubled? I'm not sure.

nnethercote · 2016-11-21T04:45:13Z

nearest_common_ancestor is at least part of the problem, according to Cachegrind. For example, from syntex:

--------------------------------------------------------------------------------
             Ir
--------------------------------------------------------------------------------
141,932,296,063  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
6,040,510,628  /home/njn/moz/rust0/src/librustc/middle/region.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor
5,280,761,738  /build/glibc-Qz8a69/glibc-2.23/malloc/malloc.c:_int_malloc
4,376,166,866  /home/njn/moz/rust0/src/librustc/middle/region.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor::ancestors_of
3,449,453,667  /home/njn/moz/rust0/src/libcollections/vec.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor::ancestors_of
3,236,809,213  /build/glibc-Qz8a69/glibc-2.23/malloc/malloc.c:_int_free
1,979,912,076  ???:???
1,959,605,584  /build/glibc-Qz8a69/glibc-2.23/string/../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:__memcpy_avx_unaligned
1,899,088,086  /home/njn/moz/rust0/src/rt/miniz.c:tdefl_compress
1,767,561,366  /build/glibc-Qz8a69/glibc-2.23/malloc/malloc.c:malloc
1,540,380,901  /home/njn/moz/rust0/src/libstd/collections/hash/table.rs:<std::collections::hash::set::HashSet<T, S>>::insert
1,377,680,312  /home/njn/moz/rust0/src/liballoc/raw_vec.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor::ancestors_of
1,144,766,653  /home/njn/moz/rust0/src/libstd/collections/hash/table.rs:<std::collections::hash::set::HashSet<T, S>>::get

The 142B instructions executed is up from 124B. I've not seen nearest_common_ancestor in any profiles prior to today.

If the problem can't be found soon I suggest reverting #37660.
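To illustrate why `nearest_common_ancestor` and `ancestors_of` can dominate a profile like the one above, here is a minimal sketch, assuming a scope tree stored as a child-to-parent map. The `ScopeTree` type and its methods are hypothetical stand-ins written for this example, not the actual `RegionMaps` code. Each query walks both parent chains up to the root and allocates a vector for each, so the per-call cost grows with how deep the scopes are nested.

```rust
use std::collections::HashMap;

/// Hypothetical scope tree: each scope id maps to its parent (the root has none).
struct ScopeTree {
    parent: HashMap<u32, u32>,
}

impl ScopeTree {
    /// Collects `scope` and all of its ancestors, innermost first.
    /// Allocates a fresh Vec on every call.
    fn ancestors_of(&self, scope: u32) -> Vec<u32> {
        let mut chain = vec![scope];
        let mut current = scope;
        while let Some(&p) = self.parent.get(&current) {
            chain.push(p);
            current = p;
        }
        chain
    }

    /// Walks both chains from the root end while they agree; the last
    /// shared entry is the nearest common ancestor.
    fn nearest_common_ancestor(&self, a: u32, b: u32) -> Option<u32> {
        let a_chain = self.ancestors_of(a);
        let b_chain = self.ancestors_of(b);
        let mut result = None;
        for (x, y) in a_chain.iter().rev().zip(b_chain.iter().rev()) {
            if x != y {
                break;
            }
            result = Some(*x);
        }
        result
    }
}

fn main() {
    // Tree: 0 is the root; 1 and 2 are children of 0; 3 is a child of 1.
    let mut parent = HashMap::new();
    parent.insert(1, 0);
    parent.insert(2, 0);
    parent.insert(3, 1);
    let tree = ScopeTree { parent };

    println!("{:?}", tree.nearest_common_ancestor(3, 2));
    println!("{}", tree.ancestors_of(3).len());
}
```

In a sketch like this, anything that makes parent chains longer than they should be raises the cost of every query, which would be consistent with the allocation-heavy entries (`_int_malloc`, `_int_free`, `raw_vec.rs`) showing up next to `ancestors_of`.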

nikomatsakis · 2016-11-21T16:10:19Z

So I did some experimentation with a small test file: it's certainly not as simple as something in typeck happening twice, or at least if it is I didn't figure out what yet.

@nnethercote's samples suggest something about region inference, but we are not running regionck more often, as far as I can tell (nor typeck). Or at least we don't do so on a very simple test case. I will try experimenting with some bigger ones.

> If the problem can't be found soon I suggest reverting #37660.

I'm not ready to revert yet. Please consult with me before considering such a thing.

nikomatsakis · 2016-11-21T17:34:14Z

OK, I may have found the culprit.

nikomatsakis · 2016-11-21T18:35:57Z

Fix in #37920

nnethercote · 2016-11-21T19:59:30Z

> I'm not ready to revert yet. Please consult with me before considering such a thing.

I wouldn't presume to revert, but my statement was intended to mark the beginning of such a consultation :)

retep998 · 2016-11-22T08:37:50Z

winapi was totally hit by this.
https://gist.github.com/Arnavion/ddcd1af4dc3393b35fa0f11c5dc3119e

Arnavion · 2016-11-22T09:20:29Z

And I confirmed that #37920 fixes winapi build times back to ~35s, the same that it takes with stable.

in region, treat current (and future) item-likes alike

The `visit_fn` code mutates its surrounding context. Between *items*, this was saved/restored, but between impl items it was not. This meant that we wound up with `CallSiteScope` entries with two parents (or more!). As far as I can tell, this is harmless in actual type-checking, since the regions you interact with are always from at most one of those branches. But it can slow things down.

Before, the effect was limited, since it only applied to impl items within an impl. After #37660, impl items are visited all together at the end, and hence this could create a very messed up hierarchy. Isolating impl items properly solves both issues.

I cannot come up with a way to unit-test this; for posterity, however, you can observe the messed up hierarchies with a test as simple as the following, which would create a callsite scope with two parents both before and after:

```
struct Foo { }
impl Foo {
    fn bar(&self) -> usize { 22 }
    fn baz(&self) -> usize { 22 }
}
fn main() { }
```

Fixes #37864.

r? @michaelwoerister

cc @pnkfelix -- can you think of a way to make a regr test?
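The save/restore pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual rustc region code: the `Visitor` type, its fields, and `visit_fn_isolated` are hypothetical names made up for this example, and the sketch models a leaked context as an ever-deeper parent chain rather than reproducing the exact "two parents" shape.

```rust
/// Hypothetical visitor state: the scope that newly created scopes attach to.
struct Visitor {
    current_parent: Option<usize>,
    /// parents[i] is the parent recorded for scope i.
    parents: Vec<Option<usize>>,
}

impl Visitor {
    fn new() -> Self {
        Visitor { current_parent: None, parents: Vec::new() }
    }

    /// Creates a scope for a function body and leaves the context pointing
    /// at it, mutating the surrounding context as a side effect.
    fn visit_fn(&mut self) -> usize {
        let id = self.parents.len();
        self.parents.push(self.current_parent);
        self.current_parent = Some(id);
        id
    }

    /// Visits a function with the context saved beforehand and restored after.
    fn visit_fn_isolated(&mut self) -> usize {
        let saved = self.current_parent;
        let id = self.visit_fn();
        self.current_parent = saved;
        id
    }

    /// Number of ancestors above `scope`.
    fn depth(&self, scope: usize) -> usize {
        let mut depth = 0;
        let mut current = scope;
        while let Some(p) = self.parents[current] {
            depth += 1;
            current = p;
        }
        depth
    }
}

fn main() {
    // Without isolation, each function's scope nests under the previous one.
    let mut leaky = Visitor::new();
    let last_leaky = (0..100).map(|_| leaky.visit_fn()).last().unwrap();

    // With isolation, every function's scope hangs off the same context.
    let mut isolated = Visitor::new();
    let last_isolated = (0..100).map(|_| isolated.visit_fn_isolated()).last().unwrap();

    println!("leaky depth: {}", leaky.depth(last_leaky));
    println!("isolated depth: {}", isolated.depth(last_isolated));
}
```

Under these assumptions, visiting many functions back to back without restoring the context produces chains whose length grows with the number of functions visited, which is the kind of hierarchy that makes ancestor walks expensive.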

nnethercote · 2016-12-05T00:27:54Z

I remeasured and I can confirm the regression is fixed. Performance on rustc-benchmarks is basically equivalent to what it was on Nov 14, my last measurement prior to the regression.

michaelwoerister · 2016-12-05T14:10:21Z

🎉

nikomatsakis self-assigned this Nov 18, 2016

nikomatsakis added the I-compiletime (Issue: Problems and improvements with respect to compile times.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Nov 18, 2016

nikomatsakis mentioned this issue Nov 21, 2016

in region, treat current (and future) item-likes alike #37920

Merged

bors closed this as completed in #37920 Dec 4, 2016
