[experimental] Allow for RLIBs to only contain metadata. #48373

michaelwoerister · 2018-02-20T15:48:47Z

This is an experimental implementation of #38913. I want to try and see if I can get it through a perf.rlo run. No real need to review this yet.

…rt statics.

rust-highfive · 2018-02-20T15:48:50Z

r? @cramertj

(rust_highfive has picked a reviewer for you, use r? to override)

michaelwoerister · 2018-02-20T15:49:18Z

@bors try

bors · 2018-02-20T15:49:28Z

⌛ Trying commit 0c034ad50096b3fcb2d1c29aa538ef5ad22d6603 with merge 1f12186df3cc7f2c3af0b55b9089a7b22fead331...

bors · 2018-02-20T18:27:25Z

☀️ Test successful - status-travis
State: approved= try=True

Mark-Simulacrum · 2018-02-20T20:59:40Z

er, well, not entirely sure this is even somewhat meaningful, but... http://perf.rust-lang.org/compare.html?start=27a046e9338fb0455c33b13e8fe28da78212dedc&end=1f12186df3cc7f2c3af0b55b9089a7b22fead331&stat=instructions%3Au

michaelwoerister · 2018-02-21T11:06:37Z

Alright, interesting :)

I think this mainly shows that we are not measuring quite the right thing with our benchmarks. MIR-only RLIBs postpone any LLVM work to until it is actually needed. In many of the benchmarks, we are only compiling RLIBs though, so we never need to generate code. However, that's not a realistic use case, I think. If you are compiling just for the diagnostics, cargo check is the way to go and should even be a little faster than MIR-only RLIBs. If our benchmarks would include compiling tests, executables, or anything other than an RLIB, the results would probably look very different.

I'll try to gather some more useful data.

michaelwoerister · 2018-02-21T17:54:50Z

So, I ran some benchmarks and the numbers are not all that encouraging, at least for now. While we generate a lot less code overall, it seems that the single-threaded trans phase in the final crate is bottlenecking us so much that compilation actually takes longer in many cases.

Here are some of the numbers I collected. The table shows the aggregate time spent for various tasks while compiling the whole crate graph. In many cases we do less work overall but due to worse parallelization, wall-clock time increases. The last table shows the number of functions we instantiate for the whole crate graph. It becomes substantially smaller in most cases. Maybe we should look into re-using generic instances separately if MIR-only RLIBs are blocked on parallelizing trans.

ripgrep - cargo build

	regular	MIR-only	%
LLVM codegen passes	33.90	32.52	95.9 %
LLVM function passes	1.39	1.35	97.5 %
LLVM module passes	2.18	1.95	89.8 %
MonoItem collection	2.80	2.09	74.4 %
translation	23.73	19.97	84.1 %
LLVM total	37.46	35.83	95.6 %
BUILD total	20.92	26.14	125.0 %

encoding-rs - cargo test --no-run

	regular	MIR-only	%
LLVM codegen passes	13.11	7.28	55.6 %
LLVM function passes	0.57	0.33	58.1 %
LLVM module passes	0.90	0.44	48.7 %
MonoItem collection	1.19	0.69	58.1 %
translation	8.68	6.08	70.1 %
LLVM total	14.59	8.06	55.2 %
BUILD total	15.73	14.37	91.4 %

webrender - cargo build

	regular	MIR-only	%
LLVM codegen passes	109.42	69.17	63.2 %
LLVM function passes	4.55	3.10	68.2 %
LLVM module passes	1.63	1.06	64.7 %
MonoItem collection	10.70	5.64	52.7 %
translation	102.95	58.70	57.0 %
LLVM total	115.60	73.33	63.4 %
BUILD total	72.30	68.64	94.9 %

futures-rs - cargo test --no-run

	regular	MIR-only	%
LLVM codegen passes	41.19	48.67	118.1 %
LLVM function passes	1.68	1.93	115.0 %
LLVM module passes	0.21	0.22	107.3 %
MonoItem collection	5.86	6.90	117.8 %
translation	55.48	69.84	125.9 %
LLVM total	43.08	50.82	118.0 %
BUILD total	17.28	19.18	111.0 %

tokio-webpush-simple - cargo build

	regular	MIR-only	%
LLVM codegen passes	33.98	22.55	66.4 %
LLVM function passes	1.53	0.95	62.0 %
LLVM module passes	0.30	0.20	66.8 %
MonoItem collection	3.80	2.11	55.5 %
translation	39.09	22.99	58.8 %
LLVM total	35.81	23.70	66.2 %
BUILD total	22.28	21.54	96.7 %

Number of LLVM function definitions generated for whole crate graph

	MIR-only	regular
ripgrep	22683	34239
encoding-rs test	8393	15116
webrender	72238	114239
futures-rs test	57565	46935
tokio-webpush-simple	27346	44961

cc @rust-lang/compiler

hanna-kruppe · 2018-02-21T19:16:12Z

it seems that the single-threaded trans phase in the final crate is bottlenecking us so much that compilation actually takes longer in many cases.

Huh? LLVM IR generation and optimization is parallelized with multiple CGUs (and indeed the times you report are generally smaller for MIR-only rlibs). Are you referring to the steps before that, like translation item collection, CGU partitioning, etc.?

nikomatsakis · 2018-02-21T19:19:13Z

@rkruppe I think @michaelwoerister is referring to the rustc code for generating the LLVM IR, not the LLVM execution.

nikomatsakis · 2018-02-21T19:19:21Z

Seems like a good reason to push harder on parallel rustc =)

nikomatsakis · 2018-02-21T19:21:49Z

@michaelwoerister

Do you have any explanation for this?

futures-rs test | 57565 | 46935

Why would it make more with mir-only rlibs?

hanna-kruppe · 2018-02-21T19:26:35Z

@nikomatsakis Oh! I was under the impression that's parallelized as well. Is it not?

michaelwoerister · 2018-02-22T10:06:41Z

Seems like a good reason to push harder on parallel rustc =)

Yes.

Do you have any explanation for this?

futures-rs test | 57565 | 46935

Why would it make more with mir-only rlibs?

My theory is that it's because futures-rs has lots of tests, each of which gets compiled into its own executable. So while we get more sharing within each executable, we lose out on sharing between executables. At a certain number of leaf crates not sharing non-generics outweighs the improved sharing of generic instances. I did not really verify this hypothesis though.

michaelwoerister · 2018-02-22T11:01:27Z

Oh! I was under the impression that's parallelized as well. Is it not?

It's not, unfortunately. Translation reads the tcx, which cannot be accessed from multiple threads.

retep998 · 2018-02-22T21:13:09Z

My theory is that it's because futures-rs has lots of tests, each of which gets compiled into its own executable. So while we get more sharing within each executable, we lose out on sharing between executables. At a certain number of leaf crates not sharing non-generics outweighs the improved sharing of generic instances. I did not really verify this hypothesis though.

So what you're saying is we need a workspace wide incremental codegen cache? So if one leaf does codegen, it gets cached and other leaf crates can reuse the existing work?

michaelwoerister · 2018-02-26T10:22:14Z

So what you're saying is we need a workspace wide incremental codegen cache? So if one leaf does codegen, it gets cached and other leaf crates can reuse the existing work?

I'm not sure if we could make this work. Maybe. It would solve part of the problem.

bors · 2018-02-27T03:17:26Z

☔ The latest upstream changes (presumably #48449) made this pull request unmergeable. Please resolve the merge conflicts.

michaelwoerister · 2018-03-01T13:04:56Z

Posted the results of this experiment in #38913. Closing.

Don't recompute SymbolExportLevel for upstream crates. The data collected in #48373 suggests that we can avoid generating up to 30% of the LLVM definitions by only instantiating function monomorphizations once with a given crate graph. Some more data, collected with a [proof-of-concept implementation](https://github.com/michaelwoerister/rust/commits/share-generics) of re-using monomorphizations, which is less efficient than the MIR-only RLIB approach, suggests that it's still around 25% LLVM definitions that we can save. So far, this PR only cleans up handling of symbol export status. Too early to review still.

michaelwoerister added 5 commits February 19, 2018 12:41

Use DefId instead of NodeId while generating debuginfo for statics.

d5ed655

Use DefId instead of NodeId in MonoItem::Static.

15ff0ad

Implement describe_def query for LOCAL_CRATE

8ff633c

Rename is_translated_fn query to is_translated_item and make it suppo…

1be7f96

…rt statics.

Allow for instantiating statics from upstream crates.

89b3ef3

rust-highfive assigned cramertj Feb 20, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 20, 2018

michaelwoerister unassigned cramertj Feb 20, 2018

pietroalbini added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 20, 2018

experimental MIR-only RLIB support.

9614746

michaelwoerister force-pushed the mir-only-rlibs-1 branch from 0c034ad to 9614746 Compare February 21, 2018 12:33

japaric mentioned this pull request Feb 22, 2018

AVR support rust-embedded/wg#3

Closed

8 tasks

This was referenced Feb 28, 2018

Don't recompute SymbolExportLevel for upstream crates. #48611

Merged

Tracking issue for MIR-only RLIBs #38913

Closed

michaelwoerister closed this Mar 1, 2018

saethlin mentioned this pull request Jan 10, 2024

Unstably support MIR-only rlibs rust-lang/compiler-team#713

Closed

3 tasks

davidtwco mentioned this pull request Apr 10, 2024

Partial compilation using MIR-only rlibs rust-lang/compiler-team#738

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[experimental] Allow for RLIBs to only contain metadata. #48373

[experimental] Allow for RLIBs to only contain metadata. #48373

michaelwoerister commented Feb 20, 2018

rust-highfive commented Feb 20, 2018

michaelwoerister commented Feb 20, 2018

bors commented Feb 20, 2018

bors commented Feb 20, 2018

Mark-Simulacrum commented Feb 20, 2018

michaelwoerister commented Feb 21, 2018

michaelwoerister commented Feb 21, 2018

hanna-kruppe commented Feb 21, 2018

nikomatsakis commented Feb 21, 2018

nikomatsakis commented Feb 21, 2018

nikomatsakis commented Feb 21, 2018

hanna-kruppe commented Feb 21, 2018

michaelwoerister commented Feb 22, 2018

michaelwoerister commented Feb 22, 2018

retep998 commented Feb 22, 2018

michaelwoerister commented Feb 26, 2018

bors commented Feb 27, 2018

michaelwoerister commented Mar 1, 2018

[experimental] Allow for RLIBs to only contain metadata. #48373

[experimental] Allow for RLIBs to only contain metadata. #48373

Conversation

michaelwoerister commented Feb 20, 2018

rust-highfive commented Feb 20, 2018

michaelwoerister commented Feb 20, 2018

bors commented Feb 20, 2018

bors commented Feb 20, 2018

Mark-Simulacrum commented Feb 20, 2018

michaelwoerister commented Feb 21, 2018

michaelwoerister commented Feb 21, 2018

ripgrep - cargo build

encoding-rs - cargo test --no-run

webrender - cargo build

futures-rs - cargo test --no-run

tokio-webpush-simple - cargo build

Number of LLVM function definitions generated for whole crate graph

hanna-kruppe commented Feb 21, 2018

nikomatsakis commented Feb 21, 2018

nikomatsakis commented Feb 21, 2018

nikomatsakis commented Feb 21, 2018

hanna-kruppe commented Feb 21, 2018

michaelwoerister commented Feb 22, 2018

michaelwoerister commented Feb 22, 2018

retep998 commented Feb 22, 2018

michaelwoerister commented Feb 26, 2018

bors commented Feb 27, 2018

michaelwoerister commented Mar 1, 2018