Nonlinear compile time blow-up with deeply nested types #38528

sfackler · 2016-12-22T00:51:17Z

Compiling the postgres-tokio crate at sfackler/rust-postgres@d27518b goes from 5 seconds to 45 seconds on nightly if the two .boxed() calls in the middle of this call chain are removed: https://github.com/sfackler/rust-postgres/blob/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33/postgres-tokio/src/lib.rs#L342-L408.

Looks like 15 seconds is spent in translation item collection, and 39 seconds is spent in translation:

time: 0.004	parsing
time: 0.005	recursion limit
time: 0.000	crate injection
time: 0.001	plugin loading
time: 0.000	plugin registration
time: 0.238	expansion
time: 0.000	maybe building test harness
time: 0.000	maybe creating a macro crate
time: 0.000	checking for inline asm in case the target doesn't support it
time: 0.000	complete gated feature checking
time: 0.008	early lint checks
time: 0.000	AST validation
time: 0.005	name resolution
time: 0.004	lowering ast -> hir
time: 0.002	indexing hir
time: 0.000	attribute checking
time: 0.001	language item collection
time: 0.001	lifetime resolution
time: 0.000	looking for entry point
time: 0.000	looking for plugin registrar
time: 0.001	region resolution
time: 0.000	loop checking
time: 0.000	static item recursion checking
time: 0.005	compute_incremental_hashes_map
time: 0.000	load_dep_graph
time: 0.000	stability index
time: 0.001	stability checking
time: 0.011	type collecting
time: 0.000	variance inference
time: 0.000	impl wf inference
time: 0.021	coherence checking
time: 0.008	wf checking
time: 0.003	item-types checking
time: 0.500	item-bodies checking
time: 0.000	drop-impl checking
time: 0.003	const checking
time: 0.001	privacy checking
time: 0.000	intrinsic checking
time: 0.000	effect checking
time: 0.001	match checking
time: 0.000	liveness checking
time: 0.002	rvalue checking
time: 0.014	MIR dump
  time: 0.001	SimplifyCfg
  time: 0.002	QualifyAndPromoteConstants
  time: 0.002	TypeckMir
  time: 0.000	SimplifyBranches
  time: 0.001	SimplifyCfg
time: 0.006	MIR cleanup and validation
time: 0.006	borrow checking
time: 0.000	reachability checking
time: 0.000	death checking
time: 0.000	unused lib feature checking
time: 1.366	lint checking
time: 0.000	resolving dependency formats
  time: 0.000	NoLandingPads
  time: 0.001	SimplifyCfg
  time: 0.001	EraseRegions
  time: 0.000	AddCallGuards
  time: 0.008	ElaborateDrops
  time: 0.000	NoLandingPads
  time: 0.001	SimplifyCfg
  time: 0.001	InstCombine
  time: 0.000	Deaggregator
  time: 0.000	CopyPropagation
  time: 0.001	SimplifyLocals
  time: 0.000	AddCallGuards
  time: 0.000	PreTrans
time: 0.015	MIR optimisations
  time: 0.001	write metadata
  time: 15.086	translation item collection
  time: 0.020	codegen unit partitioning
  time: 0.022	internalize symbols
time: 39.376	translation
time: 0.000	assert dep graph
time: 0.000	serialize dep graph
  time: 0.089	llvm function passes [0]
  time: 0.037	llvm module passes [0]
  time: 2.148	codegen passes [0]
  time: 0.001	codegen passes [0]
time: 2.432	LLVM passes
time: 0.000	serialize work products
time: 0.084	linking

Things are significantly worse on 1.13 - 2 minutes in translation!

Some discussion in IRC: https://botbot.me/mozilla/rust-internals/2016-12-22/?msg=78294648&page=1

cc @aturon

UPDATE: #40280 was closed as a duplicate of this. It had the following sample code:

let &(first, B, C, D, E, F, G, H) = self;
let iter = first.buffers_list();
let iter = iter.chain(B.buffers_list());
let iter = iter.chain(C.buffers_list());
let iter = iter.chain(D.buffers_list());
let iter = iter.chain(E.buffers_list());
let iter = iter.chain(F.buffers_list());
let iter = iter.chain(G.buffers_list());
let iter = iter.chain(H.buffers_list());
Box::new(iter)

--nmatsakis
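For readers who want to see the "deeply nested types" concretely, here is a minimal sketch using only the standard library. It is not taken from either report: `type_name_of` and `chain_depth` are helper names invented for illustration, and the exact strings printed by `std::any::type_name` are not guaranteed to be stable across compiler versions.

```rust
use std::any::type_name;

/// Returns the compiler's name for the type of `_value`.
/// (Helper invented for this sketch.)
fn type_name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

/// Counts how many `Chain<...>` layers appear in a type name.
fn chain_depth(name: &str) -> usize {
    name.matches("Chain<").count()
}

fn main() {
    // Each `.chain()` wraps the previous iterator type in another `Chain<_, _>`.
    let one = (0..3u8).chain(0..3u8);
    let three = (0..3u8).chain(0..3u8).chain(0..3u8).chain(0..3u8);

    println!("{}", type_name_of(&one));
    println!("{}", type_name_of(&three));
    println!(
        "layers: {} vs {}",
        chain_depth(type_name_of(&one)),
        chain_depth(type_name_of(&three))
    );

    // Boxing as a trait object erases the nesting from the static type.
    let boxed: Box<dyn Iterator<Item = u8>> = Box::new((0..3u8).chain(0..3u8));
    println!("{}", type_name_of(&boxed));
}
```

The type name grows with every adapter added, while the boxed variant stays the same size no matter what is inside it, which is why boxing in the middle of a chain shortens the types the compiler has to process.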

nikomatsakis · 2016-12-22T16:53:55Z

> Things are significantly worse on 1.13 - 2 minutes in translation!

This may be the effect of the projection cache.

aturon · 2016-12-22T17:04:39Z

cc @alexcrichton This is obviously a show-stopper for futures.

nikomatsakis · 2016-12-22T19:03:51Z

I can do some profiling. I've had some plans for improving collection/trans that I think may be related. One question to try and answer is what percentage of this is just "we are making more code" vs "we are wasting time doing things in trait selection that could be cached". I have observed the latter from time to time and had some thoughts on how to fix it.
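To make the "could be cached" half of that question concrete, here is a toy cost model. It is an assumption made purely for illustration and does not reflect how rustc's trait selection actually works: it pretends that handling a type with `depth` layers needs the answer for `depth - 1` twice, and counts the steps with and without a cache. All names are hypothetical.

```rust
use std::collections::HashMap;

/// Toy model: every layer asks for the layer below it twice, with no memory
/// of earlier answers. `steps` counts how many times real work is done.
fn resolve_uncached(depth: u32, steps: &mut u64) -> u64 {
    *steps += 1;
    if depth == 0 {
        return 1;
    }
    resolve_uncached(depth - 1, steps) + resolve_uncached(depth - 1, steps)
}

/// Same model, but answers are remembered per depth, so the second request
/// for a layer is a cache hit and costs no step.
fn resolve_cached(depth: u32, cache: &mut HashMap<u32, u64>, steps: &mut u64) -> u64 {
    if let Some(&hit) = cache.get(&depth) {
        return hit;
    }
    *steps += 1;
    let result = if depth == 0 {
        1
    } else {
        resolve_cached(depth - 1, cache, steps) + resolve_cached(depth - 1, cache, steps)
    };
    cache.insert(depth, result);
    result
}

/// Returns (steps without cache, steps with cache) for a given nesting depth.
fn count_steps(depth: u32) -> (u64, u64) {
    let mut uncached = 0;
    resolve_uncached(depth, &mut uncached);
    let mut cached = 0;
    resolve_cached(depth, &mut HashMap::new(), &mut cached);
    (uncached, cached)
}

fn main() {
    for depth in [5, 10, 20] {
        let (uncached, cached) = count_steps(depth);
        println!("depth {depth}: {uncached} steps uncached, {cached} cached");
    }
}
```

In this model the uncached count roughly doubles with each extra layer while the cached count grows by one, which is the kind of gap that would separate "wasted work" from "more code".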

alexcrichton · 2016-12-25T21:50:11Z

I've also seen this before, with tokio-socks5 as well. Removing just a handful of the trait objects in that file makes the compile time of the crate shoot from 2.34s to 89.52s (!!)

dwrensha · 2017-01-20T02:13:09Z

Here's a simple example of something that takes a very long time to compile:

future::ok::<(),()>(()).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(()));

nikomatsakis · 2017-03-09T21:51:57Z

Closed #40280 as a duplicate, moved some example code into the issue header.

vorner · 2017-03-18T21:44:51Z

Just a stupid question ‒ the „translation item collection“, which is one of the two culprits here, is listed in the „MIR optimisations“. Does it really have to happen at all on non-optimised debug build?

bluss · 2017-03-18T21:56:59Z

It's natural to read it that way, but each header is actually printed after its group, so it's part of translation.

vorner · 2017-03-27T08:20:41Z

This seems hard enough that fixing it will take some time. So I thought of a workaround, in case someone else is interested. It uses the trick of placing trait objects to split the chains of modifiers, but only in testing/debug builds, where compilation speed matters, while keeping the complex but hopefully faster concrete types in release builds:

This one is for streams (that's what I needed), but can obviously work for futures or other things as well:

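// Debug/test builds: box the stream so the nested type is cut short here.
#[cfg(debug_assertions)]
fn test_boxed<T, E, S>(s: S) -> Box<Stream<Item = T, Error = E>>
    where S: Stream<Item = T, Error = E> + 'static
{
    Box::new(s)
}
// Release builds: hand the concrete type back unchanged.
#[cfg(not(debug_assertions))]
fn test_boxed<S>(s: S) -> S {
    s
}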

(I didn't find any better config option than the debug_assertions, but I hope that one is good enough)
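The same trick can be tried without the futures crate. The sketch below is an adaptation to `std::iter::Iterator`, not the code from the comment above; `evens_then_odds` is a made-up function that only exists to show where the split point goes.

```rust
// Debug builds: erase the concrete adapter type behind a trait object.
#[cfg(debug_assertions)]
fn test_boxed<'a, I>(iter: I) -> Box<dyn Iterator<Item = I::Item> + 'a>
where
    I: Iterator + 'a,
{
    Box::new(iter)
}

// Release builds: pass the concrete type through unchanged.
#[cfg(not(debug_assertions))]
fn test_boxed<I: Iterator>(iter: I) -> I {
    iter
}

/// Hypothetical example chain: scaled even numbers followed by odd numbers.
fn evens_then_odds(limit: u32) -> impl Iterator<Item = u32> {
    let evens = (0..limit).filter(|n| n % 2 == 0).map(|n| n * 10);
    // Split the adapter chain here; in debug builds everything before this
    // point is hidden behind `Box<dyn Iterator>`.
    let evens = test_boxed(evens);
    evens.chain((0..limit).filter(|n| n % 2 == 1))
}

fn main() {
    let values: Vec<u32> = evens_then_odds(5).collect();
    println!("{:?}", values);
}
```

Because both versions of `test_boxed` return something that implements `Iterator`, the calling code is identical in debug and release builds; only the static type behind it changes.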

Mark-Simulacrum · 2017-07-02T21:37:38Z

Nominating in place of #42941. cc @eddyb (nominator)

jonhoo · 2017-07-02T21:52:34Z

Repeating another bad case from #42941:

extern crate futures;
use futures::{future, IntoFuture, Future};
fn main() {
    let t: std::result::Result<(), ()> = Ok(());
    let f = t
            .into_future()
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()));
    f.wait();
}

The code above takes ~750s to compile on my laptop (you can make it shorter/longer by removing/adding .and_then calls):

...
  time: 0.000; rss: 108MB       write metadata
  time: 472.840; rss: 110MB     translation item collection
  time: 0.001; rss: 112MB       codegen unit partitioning
  time: 0.001; rss: 133MB       internalize symbols
time: 732.741; rss: 133MB       translation
...

The extra ~250s is spent between codegen unit partitioning and internalize symbols.
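To measure how the time scales with chain length, the test program can be generated rather than edited by hand. The following is a rough sketch of such a generator; `generate_chain` is a name invented here, and the emitted source simply mirrors the shape of the example above (it still needs the futures crate to compile).

```rust
/// Builds the source of a test program with `n` chained `.and_then` calls.
/// (Name and layout are assumptions made for this sketch.)
fn generate_chain(n: usize) -> String {
    let mut src = String::new();
    src.push_str("extern crate futures;\n");
    src.push_str("use futures::{future, IntoFuture, Future};\n");
    src.push_str("fn main() {\n");
    src.push_str("    let t: std::result::Result<(), ()> = Ok(());\n");
    src.push_str("    let f = t\n");
    src.push_str("            .into_future()");
    for _ in 0..n {
        src.push_str("\n            .and_then(|_| future::ok(()))");
    }
    src.push_str(";\n    f.wait();\n}\n");
    src
}

fn main() {
    // Chain length comes from the first command-line argument (default 10).
    let n = std::env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(10);
    print!("{}", generate_chain(n));
}
```

Writing the output to a file for a few values of `n` and compiling each one with time-passes output enabled should show how quickly the translation numbers grow.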

jonhoo · 2017-07-06T02:06:38Z

Another observation from #42941 about why this is becoming even more important is that the newly-released hyper 0.11 uses futures everywhere, which will likely cause many more programs to use chains of futures like this going forward.

nikomatsakis · 2017-07-06T20:52:37Z

triage: P-high

We are raising to "high priority" to at least do some investigation and try to determine whether revised trait solving strategies will be of use here.

arielb1 · 2017-08-10T20:23:15Z

status: waiting for niko

ishitatsuyuki · 2017-08-31T08:01:18Z

Bump. This really hurts impl Trait (boxing hides this issue).

ishitatsuyuki · 2018-02-17T06:14:59Z

@arielb1 You seem to have created this FIXME. Can you give me some hints on implementing a cross-infcx cache?

ishitatsuyuki · 2018-02-17T15:00:00Z

I have launched a PoC in #48296 and I confirm that the exponential part should be resolved. I still observe some degree of polynomial-time behavior, which makes rustc choke at about 50 chain() calls.

ishitatsuyuki · 2018-02-17T15:02:01Z

@hcpl I suspect you copied the wrong code for the specialized version above, as I observe the same time-passes behavior and the code lacks main() even though you specify --crate-type bin for the specialized version.

Can you check again, and if possible, provide the correct specialized version code?

ishitatsuyuki · 2018-02-18T01:03:24Z

Update: although I have fixed the normalization part, typeck still seems to be exponential in time and very memory-consuming. I'm investigating the cause of this second issue.

ishitatsuyuki · 2018-02-18T12:32:32Z

I have implemented a fix for the typeck part too, and I believe there are no exponential algorithms in rustc anymore.

hcpl · 2018-02-18T12:41:26Z

@ishitatsuyuki the code is correct, but both rustc invocations should have been with --crate-type=lib src/lib.rs (which were assumed to be in different crates). Sorry about that.

I've changed filenames to foo_generic.rs and foo_specialized.rs respectively to better reflect their purpose.

Fix exponential projection complexity on nested types

This implements solution 1 from #38528 (comment). The code quality is currently extremely poor, but we can improve it during review.

Blocking issues:

- we probably don't want a quadratic deduplication for obligations.
- is there an alternative to deduplication?

Based on #48315. Needs changelog. Noticeable improvement on compile time is expected.

Fix #38528
Close #39684
Close #43757
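The concern about quadratic deduplication can be illustrated outside the compiler. In the sketch below, plain strings stand in for obligations; this is an assumption for illustration only and says nothing about the data structures or approach the PR actually uses.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Quadratic: each item is compared against everything kept so far.
fn dedup_quadratic<T: PartialEq>(items: Vec<T>) -> Vec<T> {
    let mut kept: Vec<T> = Vec::new();
    for item in items {
        if !kept.contains(&item) {
            kept.push(item);
        }
    }
    kept
}

/// Expected linear time: a hash set remembers what has already been seen,
/// at the cost of requiring `Hash` and an extra clone per distinct item.
fn dedup_hashed<T: Eq + Hash + Clone>(items: Vec<T>) -> Vec<T> {
    let mut seen = HashSet::new();
    let mut kept = Vec::new();
    for item in items {
        // `insert` returns false when the value was already present.
        if seen.insert(item.clone()) {
            kept.push(item);
        }
    }
    kept
}

fn main() {
    let obligations = vec!["A: Future", "B: Future", "A: Future", "B: Future"];
    println!("{:?}", dedup_quadratic(obligations.clone()));
    println!("{:?}", dedup_hashed(obligations));
}
```

Both functions keep the first occurrence of each item in its original order; they differ only in how the cost grows with the number of items.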

Consider changing assert! to debug_assert! when it calls visit_with

The perf run from rust-lang#52956 revealed that there were 3 benchmarks that benefited most from changing `assert!`s to `debug_assert!`s:

- issue rust-lang#46449: avg -4.7% for -check
- deeply-nested (AKA rust-lang#38528): avg -3.4% for -check
- regression rust-lang#31157: avg -3.2% for -check

I analyzed their fixing PRs and decided to look for potentially heavy assertions in the files they modified. I noticed that all of the non-trivial ones contained indirect calls to `visit_with()`. It might be a good idea to consider changing `assert!` to `debug_assert!` in those places in order to get the performance wins shown by the benchmarks.
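As a reminder of what that change means in practice, here is a small standalone sketch. `is_sorted_slow` is a hypothetical stand-in for an expensive check; it has nothing to do with `visit_with()` itself.

```rust
/// Stand-in for a costly structural check (hypothetical, for illustration).
fn is_sorted_slow(values: &[u32]) -> bool {
    values.windows(2).all(|pair| pair[0] <= pair[1])
}

fn largest_checked(values: &[u32]) -> Option<u32> {
    // `assert!` is evaluated in every build, including optimized ones.
    assert!(is_sorted_slow(values));
    values.last().copied()
}

fn largest_debug_checked(values: &[u32]) -> Option<u32> {
    // `debug_assert!` is only evaluated when debug assertions are enabled,
    // which by default means unoptimized builds.
    debug_assert!(is_sorted_slow(values));
    values.last().copied()
}

fn main() {
    let values = [1, 2, 3, 5, 8];
    println!("{:?}", largest_checked(&values));
    println!("{:?}", largest_debug_checked(&values));
}
```

The trade-off is that the `debug_assert!` variant no longer catches a violated invariant in release builds, so it suits checks whose cost outweighs their value in production.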

remove obligation dedup from `impl_or_trait_obligations`

Looking at the examples from rust-lang#38528, they all seem to compile fine even without this, and it seems like this might be unnecessary effort.

- we need buffers in the deserialization methods, so I went ahead and added them to the serialization methods too
- `boxed` in the `and_then` chain apparently makes both runtime performance and compile time usable, see: rust-lang/rust#38528

sfackler added the I-compiletime and T-compiler labels Dec 22, 2016

nikomatsakis mentioned this issue Mar 9, 2017

Methods take an insanely long time to compile #40280

Closed

jonhoo added a commit to jonhoo/fantoccini that referenced this issue Jun 20, 2017

Go back to ignoring the tests

79e1683

Compilation of tests is *really* slow, most likely due to rust-lang/rust#41696 or rust-lang/rust#38528

Mark-Simulacrum mentioned this issue Jun 23, 2017

Seemingly pathological behavior in typechecking, item collecting, and trans #33594

Closed

sfackler mentioned this issue Jul 2, 2017

Long compile time due to excessively long translation passes #42941

Closed

Mark-Simulacrum added the I-nominated label Jul 2, 2017

jonhoo mentioned this issue Jul 2, 2017

Borrow checking time regression from stable to nightly #43018

Closed

rust-highfive added the P-high label and removed the I-nominated label Jul 6, 2017

nikomatsakis self-assigned this Jul 6, 2017

jonhoo added a commit to jonhoo/fantoccini that referenced this issue Jul 8, 2017

Box all chains to work around rust-lang/rust#38528

d2442af

This commit should be reverted once that issue has been resolved.

Mark-Simulacrum added the C-bug label Jul 26, 2017

jonhoo mentioned this issue Aug 22, 2017

1.20.0-beta.1 regression in memory usage #43789

Closed

aturon added this to the impl period milestone Sep 4, 2017

ishitatsuyuki mentioned this issue Feb 17, 2018

Fix exponential projection complexity on nested types #48296

Merged

ishitatsuyuki mentioned this issue Feb 18, 2018

type checker takes O(~1.5^recursion_limit) time to reject simple-ish code #40353

Open

bors closed this as completed in #48296 Feb 25, 2018

jonhoo added a commit to jonhoo/fantoccini that referenced this issue Feb 27, 2018

Remove boxes now that rust-lang/rust#38528 is fixed

eb52484

ishitatsuyuki mentioned this issue Mar 1, 2018

Add benchmark for deeply nested types rust-lang/rustc-perf#185

Merged

phrohdoh pushed a commit to phrohdoh/fantoccini that referenced this issue May 11, 2018

Go back to ignoring the tests

5507e1d

Compilation of tests is *really* slow, most likely due to rust-lang/rust#41696 or rust-lang/rust#38528

phrohdoh pushed a commit to phrohdoh/fantoccini that referenced this issue May 11, 2018

Box all chains to work around rust-lang/rust#38528

6a0d5d3

This commit should be reverted once that issue has been resolved.

phrohdoh pushed a commit to phrohdoh/fantoccini that referenced this issue May 11, 2018

Remove boxes now that rust-lang/rust#38528 is fixed

59dc016

This was referenced Aug 2, 2018

Use debug_assert! instead of assert! where possible #52956

Closed

Consider changing assert! to debug_assert! when it calls visit_with #53025

Merged

scottmcm mentioned this issue Mar 24, 2020

Implement Chain with Fuses #70332

Closed

lcnr mentioned this issue May 5, 2021

remove obligation dedup from impl_or_trait_obligations #84944

Merged
