Significant amount of time to compile an empty file #43300

Closed
alexcrichton opened this issue Jul 17, 2017 · 7 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Member

Compiling an empty file into an rlib takes about 200 milliseconds locally which is a pretty significant chunk of time! Various passes look like:

$ touch foo.rs
$ rustc +nightly foo.rs --crate-type lib -Z  perf-stats -Z time-passes
time: 0.000; rss: 50MB	parsing
time: 0.000; rss: 50MB	recursion limit
time: 0.000; rss: 50MB	crate injection
time: 0.000; rss: 50MB	plugin loading
time: 0.000; rss: 50MB	plugin registration
time: 0.033; rss: 75MB	expansion
time: 0.000; rss: 75MB	maybe building test harness
time: 0.000; rss: 75MB	maybe creating a macro crate
time: 0.000; rss: 75MB	creating allocators
time: 0.000; rss: 75MB	checking for inline asm in case the target doesn't support it
time: 0.000; rss: 75MB	early lint checks
time: 0.000; rss: 75MB	AST validation
time: 0.000; rss: 78MB	name resolution
time: 0.000; rss: 78MB	complete gated feature checking
time: 0.000; rss: 78MB	lowering ast -> hir
time: 0.000; rss: 78MB	indexing hir
time: 0.000; rss: 78MB	attribute checking
time: 0.000; rss: 78MB	language item collection
time: 0.000; rss: 78MB	lifetime resolution
time: 0.000; rss: 78MB	looking for entry point
time: 0.000; rss: 78MB	looking for plugin registrar
time: 0.000; rss: 78MB	loop checking
time: 0.000; rss: 78MB	static item recursion checking
time: 0.000; rss: 78MB	compute_incremental_hashes_map
time: 0.000; rss: 78MB	load_dep_graph
time: 0.000; rss: 78MB	stability index
time: 0.000; rss: 78MB	stability checking
time: 0.000; rss: 78MB	type collecting
time: 0.000; rss: 78MB	impl wf inference
time: 0.000; rss: 78MB	coherence checking
time: 0.000; rss: 78MB	variance testing
time: 0.000; rss: 78MB	wf checking
time: 0.000; rss: 78MB	item-types checking
time: 0.000; rss: 78MB	item-bodies checking
time: 0.000; rss: 78MB	const checking
time: 0.000; rss: 78MB	privacy checking
time: 0.000; rss: 78MB	intrinsic checking
time: 0.000; rss: 78MB	effect checking
time: 0.000; rss: 78MB	match checking
time: 0.000; rss: 78MB	liveness checking
time: 0.000; rss: 78MB	borrow checking
time: 0.000; rss: 78MB	reachability checking
time: 0.000; rss: 78MB	death checking
time: 0.000; rss: 78MB	unused lib feature checking
time: 0.000; rss: 81MB	lint checking
time: 0.000; rss: 81MB	resolving dependency formats
  time: 0.000; rss: 81MB	write metadata
  time: 0.000; rss: 81MB	translation item collection
  time: 0.000; rss: 81MB	codegen unit partitioning
  time: 0.000; rss: 108MB	internalize symbols
time: 0.064; rss: 108MB	translation
time: 0.000; rss: 108MB	assert dep graph
time: 0.000; rss: 108MB	serialize dep graph
  time: 0.000; rss: 108MB	llvm function passes [1]
  time: 0.000; rss: 108MB	llvm module passes [1]
  time: 0.001; rss: 109MB	codegen passes [1]
  time: 0.001; rss: 109MB	codegen passes [0]
time: 0.007; rss: 110MB	LLVM passes
time: 0.000; rss: 110MB	serialize work products
time: 0.001; rss: 110MB	linking
Total time spent computing SVHs:               0.000
Total time spent computing incr. comp. hashes: 0.000
Total number of incr. comp. hashes computed:   4
Total number of bytes hashed for incr. comp.:  87
Average bytes hashed per incr. comp. HIR node: 21
Total time spent computing symbol hashes:      0.013
Total time spent decoding DefPath tables:      0.028

Notably:

time: 0.033; rss: 75MB	expansion
time: 0.064; rss: 108MB	translation
time: 0.007; rss: 110MB	LLVM passes

I believe the expansion timings are all related to:

Total time spent computing symbol hashes:      0.013
Total time spent decoding DefPath tables:      0.028
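
Picking the slowest passes out of a long `-Z time-passes` log by eye is error-prone, so here is a minimal sketch of doing it programmatically. It assumes the line shape shown above (`time: <seconds>; rss: <N>MB` followed by whitespace and the pass name). The names `PassTiming`, `parse_line` and `slowest_passes` are made up for this illustration and are not part of rustc.

```rust
/// One top-level line of `-Z time-passes` output, e.g.
/// "time: 0.033; rss: 75MB\texpansion".
#[derive(Debug, PartialEq)]
struct PassTiming {
    seconds: f64,
    name: String,
}

/// Parse a single line. Returns `None` for anything that does not match,
/// such as the "Total time spent ..." lines printed by `-Z perf-stats`.
fn parse_line(line: &str) -> Option<PassTiming> {
    // Indented lines are sub-passes of a later top-level entry; skip them
    // so their time is not counted twice.
    if line.starts_with(char::is_whitespace) {
        return None;
    }
    let rest = line.strip_prefix("time: ")?;
    let (secs, rest) = rest.split_once(';')?;
    let seconds: f64 = secs.trim().parse().ok()?;
    // The pass name follows the "rss: <N>MB" field, separated by whitespace.
    let rest = rest.trim_start().strip_prefix("rss:")?;
    let (_rss, name) = rest.trim_start().split_once(char::is_whitespace)?;
    Some(PassTiming {
        seconds,
        name: name.trim().to_string(),
    })
}

/// Return the `n` slowest top-level passes, slowest first.
fn slowest_passes(output: &str, n: usize) -> Vec<PassTiming> {
    let mut passes: Vec<PassTiming> = output.lines().filter_map(parse_line).collect();
    passes.sort_by(|a, b| b.seconds.total_cmp(&a.seconds));
    passes.truncate(n);
    passes
}
```

Feeding the log above to `slowest_passes(log, 3)` should yield translation, expansion and LLVM passes, in that order.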
@alexcrichton alexcrichton added the I-compiletime Issue: Problems and improvements with respect to compile times. label Jul 17, 2017
@michaelwoerister
Member

For incremental compilation I was thinking about implementing a simple hash-table that can be memory-mapped. DefPathTable would be another candidate for that.

@aturon
Member

aturon commented Jul 19, 2017

@michaelwoerister Is that something you could give @alexcrichton a bit more info on and he could tackle?

@michaelwoerister
Member

Each crate we are loading contains a DefPathTable and we are loading it eagerly at the moment. This can be quite a big piece of data and decoding it involves a reverse-lookup hash map from some of the loaded data.

I see now that having a mmapped version of that reverse lookup table would be kind of tricky, since the keys (DefKey) contain interned ast::Symbols. It would be possible to come up with something, but it would be quite some work, I guess.

As a starting point, one could try to decode the DefPathTables more lazily, e.g. decode the whole thing on first access or lazily populate the reverse lookup table.
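
To illustrate the "lazily populate the reverse lookup table" idea, here is a minimal sketch using `std::cell::OnceCell`. It is not rustc's actual code: `LazyDefPathTable` is a hypothetical type, and `DefKey` is reduced to a plain `String` for the sake of the example. Only the field names `index_to_key` and `key_to_index` are borrowed from the discussion.

```rust
use std::cell::OnceCell;
use std::collections::HashMap;

/// Hypothetical stand-in for rustc's `DefKey`; the real type is richer.
type DefKey = String;

/// Hypothetical table: the forward direction (index -> key) is available
/// right away, the reverse map (key -> index) is built only on first use.
struct LazyDefPathTable {
    index_to_key: Vec<DefKey>,
    key_to_index: OnceCell<HashMap<DefKey, usize>>,
}

impl LazyDefPathTable {
    fn new(index_to_key: Vec<DefKey>) -> Self {
        LazyDefPathTable {
            index_to_key,
            key_to_index: OnceCell::new(),
        }
    }

    /// Forward lookup never touches the reverse map.
    fn key(&self, index: usize) -> Option<&DefKey> {
        self.index_to_key.get(index)
    }

    /// Reverse lookup builds the hash map the first time it is called
    /// and reuses it afterwards.
    fn index_of(&self, key: &str) -> Option<usize> {
        let map = self.key_to_index.get_or_init(|| {
            self.index_to_key
                .iter()
                .cloned()
                .enumerate()
                .map(|(i, k)| (k, i))
                .collect()
        });
        map.get(key).copied()
    }

    /// True once the reverse map has been materialized.
    fn reverse_map_built(&self) -> bool {
        self.key_to_index.get().is_some()
    }
}
```

With this shape, a compilation that never performs a reverse lookup never pays for building the hash map. A shared, multi-threaded table would need `std::sync::OnceLock` instead of `OnceCell`.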

@michaelwoerister
Member

Hm, looking at the code again, it seems like we actually don't need the reverse lookup map anymore :D

When I switched the dep-graph to using stable DefPathHashes instead of actual DefPaths, I removed the last user of Definitions::retrace_path(), which in turn is the only user of the table.

I'll make a PR removing the table...

@michaelwoerister
Member

I opened #43361 and would be interested in the impact this has here.

@alexcrichton
Member Author

Testing locally, it looks like #43361 shaves about 20ms off the compile time of an empty file, thanks @michaelwoerister! After that PR the longest timings are:

time: 0.050; rss: 91MB  translation
time: 0.017; rss: 59MB  expansion
time: 0.002; rss: 100MB LLVM passes
time: 0.001; rss: 100MB linking
time: 0.001; rss: 38MB  parsing

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jul 24, 2017
…h, r=nikomatsakis

Remove unused DefPathTable::retrace_path()

`DefPathTable::retrace_path()` has not been used for a while now, and removing it also removes the need to build the costly `DefPathTable::key_to_index` map for every upstream crate.

cc rust-lang#43300

r? @eddyb
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 28, 2017
@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 31, 2020
@Mark-Simulacrum
Member

Compiling an empty crate takes a little under 20ms today. This is a little better than clang -c t.c on an empty C file (clang v14, which is what I have on my system), and around 2x worse than gcc (version 12) on my system.

$ hyperfine --warmup 3 -N "$HOME/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs" "clang -c t.c" "gcc -c t.c"
Benchmark 1: /home/mark/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs
  Time (mean ± σ):      16.6 ms ±   1.9 ms    [User: 7.4 ms, System: 9.7 ms]
  Range (min … max):    11.6 ms …  20.6 ms    145 runs

Benchmark 2: clang -c t.c
  Time (mean ± σ):      19.6 ms ±   1.8 ms    [User: 8.6 ms, System: 10.9 ms]
  Range (min … max):    15.8 ms …  22.6 ms    138 runs

Benchmark 3: gcc -c t.c
  Time (mean ± σ):       8.7 ms ±   1.3 ms    [User: 5.9 ms, System: 2.7 ms]
  Range (min … max):     5.3 ms …  11.4 ms    336 runs

Summary
  gcc -c t.c ran
    1.91 ± 0.37 times faster than /home/mark/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --crate-type=lib empty.rs
    2.26 ± 0.40 times faster than clang -c t.c

The majority of the difference seems to be explained by the number of pages faulted in (possibly indirectly as a proxy for memory used, etc.) - perf stat -e faults reports gcc takes ~1489 faults and rustc takes ~2963 faults on this benchmark. I don't think there's much we can do to modify that without significant work. Given that we're on par with clang and within the same ballpark as gcc, I'm going to go ahead and close.
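
As a rough sanity check of that claim, the two ratios can be compared directly. The sketch below only redoes the arithmetic on the figures quoted above (page faults from perf stat, mean times from hyperfine). The function names are invented for this illustration.

```rust
/// Ratio of two measurements.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

/// Returns (page-fault ratio, mean wall-time ratio) for rustc relative to gcc,
/// using the numbers reported in the comment above.
fn fault_and_time_ratios() -> (f64, f64) {
    let rustc_faults = 2963.0; // approximate, from `perf stat -e faults`
    let gcc_faults = 1489.0; // approximate, from `perf stat -e faults`
    let rustc_mean_ms = 16.6; // hyperfine mean for rustc
    let gcc_mean_ms = 8.7; // hyperfine mean for gcc
    (
        ratio(rustc_faults, gcc_faults),
        ratio(rustc_mean_ms, gcc_mean_ms),
    )
}
```

Both ratios come out close to 2 (about 1.99 for faults and about 1.91 for mean time), which is consistent with page faults accounting for most of the gap, though it does not prove causation.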
