Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a job immediately when looking in the query map and start it later #50067

Closed
wants to merge 1 commit into from

Conversation

Zoxc
Copy link
Contributor

@Zoxc Zoxc commented Apr 18, 2018

This means we only look up the key in the query map once, and the query map lock is held minimally.

This also moves some function outside the query macro which has the benefit of improving librustc compile time.

Before: 730.189 seconds
  time: 0.534; rss: 106MB	parsing
  time: 0.000; rss: 106MB	recursion limit
  time: 0.000; rss: 106MB	crate injection
  time: 0.000; rss: 106MB	plugin loading
  time: 0.000; rss: 106MB	plugin registration
  time: 2.768; rss: 388MB	expansion
  time: 0.000; rss: 388MB	maybe building test harness
  time: 0.047; rss: 388MB	maybe creating a macro crate
  time: 0.141; rss: 388MB	creating allocators
  time: 0.148; rss: 388MB	AST validation
  time: 1.034; rss: 454MB	name resolution
  time: 0.396; rss: 454MB	complete gated feature checking
  time: 0.657; rss: 620MB	lowering ast -> hir
  time: 0.257; rss: 622MB	early lint checks
  time: 0.758; rss: 644MB	indexing hir
  time: 0.000; rss: 543MB	load query result cache
  time: 0.000; rss: 543MB	looking for entry point
  time: 0.001; rss: 543MB	looking for plugin registrar
  time: 0.056; rss: 543MB	loop checking
  time: 0.059; rss: 540MB	attribute checking
  time: 0.148; rss: 543MB	stability checking
  time: 0.735; rss: 632MB	type collecting
  time: 0.003; rss: 632MB	outlives testing
  time: 0.016; rss: 634MB	impl wf inference
  time: 0.180; rss: 642MB	coherence checking
  time: 0.003; rss: 642MB	variance testing
  time: 1.116; rss: 716MB	wf checking
  time: 0.228; rss: 720MB	item-types checking
  time: 22.159; rss: 896MB	item-bodies checking
  time: 1.994; rss: 943MB	rvalue promotion
  time: 0.887; rss: 945MB	privacy checking
  time: 0.075; rss: 945MB	intrinsic checking
  time: 45.573; rss: 1411MB	match checking
  time: 0.296; rss: 1414MB	liveness checking
  time: 18.668; rss: 1819MB	borrow checking
  time: 0.014; rss: 1822MB	MIR borrow checking
  time: 0.007; rss: 1822MB	MIR effect checking
  time: 0.180; rss: 1822MB	death checking
  time: 0.060; rss: 1822MB	unused lib feature checking
  time: 1.237; rss: 1825MB	lint checking
  time: 0.000; rss: 1825MB	dumping chalk-like clauses
  time: 0.000; rss: 1825MB	resolving dependency formats
    time: 2.268; rss: 1888MB	write metadata
    time: 10.008; rss: 2072MB	translation item collection
    time: 0.601; rss: 2137MB	codegen unit partitioning
    time: 0.000; rss: 2084MB	write allocator module
    time: 31.717; rss: 3771MB	translate to LLVM IR
    time: 0.000; rss: 3771MB	assert dep graph
    time: 0.000; rss: 3771MB	serialize dep graph
  time: 45.874; rss: 3771MB	translation
    time: 19.090; rss: 2373MB	llvm function passes [rustc0]
    time: 435.408; rss: 3577MB	llvm module passes [rustc0]
    time: 107.880; rss: 3414MB	codegen passes [rustc0]
  time: 575.725; rss: 606MB	LLVM passes
  time: 0.000; rss: 605MB	serialize work products
    time: 0.005; rss: 606MB	altering flate2-909c39c621ae9227.rlib
    time: 0.003; rss: 606MB	altering miniz_sys-d9a4e25fef34a2b9.rlib
    time: 0.002; rss: 606MB	altering libc-6c43a1424d17ab5f.rlib
    time: 0.003; rss: 606MB	altering backtrace-9f50ef733ae850a7.rlib
    time: 0.003; rss: 606MB	altering rustc_demangle-5cd3d06453184078.rlib
    time: 0.002; rss: 606MB	altering byteorder-00d0fe38d250d075.rlib
    time: 0.003; rss: 606MB	altering jobserver-be1e4b22258da1a7.rlib
    time: 0.002; rss: 606MB	altering lazy_static-b5f22389f43940b1.rlib
    time: 0.006; rss: 606MB	altering compiler_builtins-2f24554613399b2b.rlib
    time: 7.441; rss: 607MB	running linker
  time: 7.510; rss: 607MB	linking
After: 659.451 seconds
  time: 0.547; rss: 106MB	parsing
  time: 0.000; rss: 106MB	recursion limit
  time: 0.000; rss: 106MB	crate injection
  time: 0.000; rss: 106MB	plugin loading
  time: 0.000; rss: 106MB	plugin registration
  time: 2.510; rss: 369MB	expansion
  time: 0.000; rss: 369MB	maybe building test harness
  time: 0.042; rss: 369MB	maybe creating a macro crate
  time: 0.130; rss: 369MB	creating allocators
  time: 0.137; rss: 369MB	AST validation
  time: 0.934; rss: 432MB	name resolution
  time: 0.331; rss: 432MB	complete gated feature checking
  time: 0.596; rss: 588MB	lowering ast -> hir
  time: 0.236; rss: 590MB	early lint checks
  time: 0.711; rss: 611MB	indexing hir
  time: 0.000; rss: 536MB	load query result cache
  time: 0.000; rss: 536MB	looking for entry point
  time: 0.001; rss: 536MB	looking for plugin registrar
  time: 0.051; rss: 536MB	loop checking
  time: 0.054; rss: 531MB	attribute checking
  time: 0.148; rss: 534MB	stability checking
  time: 0.701; rss: 614MB	type collecting
  time: 0.003; rss: 614MB	outlives testing
  time: 0.016; rss: 616MB	impl wf inference
  time: 0.181; rss: 624MB	coherence checking
  time: 0.003; rss: 624MB	variance testing
  time: 1.054; rss: 691MB	wf checking
  time: 0.226; rss: 696MB	item-types checking
  time: 19.583; rss: 861MB	item-bodies checking
  time: 1.631; rss: 906MB	rvalue promotion
  time: 0.804; rss: 906MB	privacy checking
  time: 0.073; rss: 906MB	intrinsic checking
  time: 34.483; rss: 1268MB	match checking
  time: 0.281; rss: 1270MB	liveness checking
  time: 15.866; rss: 1639MB	borrow checking
  time: 0.013; rss: 1641MB	MIR borrow checking
  time: 0.007; rss: 1641MB	MIR effect checking
  time: 0.179; rss: 1642MB	death checking
  time: 0.056; rss: 1642MB	unused lib feature checking
  time: 1.096; rss: 1646MB	lint checking
  time: 0.000; rss: 1646MB	dumping chalk-like clauses
  time: 0.000; rss: 1646MB	resolving dependency formats
    time: 2.044; rss: 1704MB	write metadata
    time: 9.506; rss: 1871MB	translation item collection
    time: 0.555; rss: 1937MB	codegen unit partitioning
    time: 0.000; rss: 1884MB	write allocator module
    time: 30.290; rss: 3579MB	translate to LLVM IR
    time: 0.000; rss: 3579MB	assert dep graph
    time: 0.000; rss: 3579MB	serialize dep graph
  time: 43.576; rss: 3579MB	translation
    time: 17.838; rss: 2326MB	llvm function passes [rustc0]
    time: 395.481; rss: 3404MB	llvm module passes [rustc0]
    time: 98.658; rss: 3251MB	codegen passes [rustc0]
  time: 523.811; rss: 574MB	LLVM passes
  time: 0.000; rss: 572MB	serialize work products
    time: 0.005; rss: 574MB	altering flate2-909c39c621ae9227.rlib
    time: 0.003; rss: 574MB	altering miniz_sys-d9a4e25fef34a2b9.rlib
    time: 0.002; rss: 574MB	altering libc-6c43a1424d17ab5f.rlib
    time: 0.003; rss: 574MB	altering backtrace-9f50ef733ae850a7.rlib
    time: 0.003; rss: 574MB	altering rustc_demangle-5cd3d06453184078.rlib
    time: 0.003; rss: 574MB	altering byteorder-00d0fe38d250d075.rlib
    time: 0.003; rss: 574MB	altering jobserver-be1e4b22258da1a7.rlib
    time: 0.002; rss: 574MB	altering lazy_static-b5f22389f43940b1.rlib
    time: 0.006; rss: 574MB	altering compiler_builtins-2f24554613399b2b.rlib
    time: 8.732; rss: 575MB	running linker
  time: 8.792; rss: 575MB	linking

Which means it saves about 70.738 seconds, which isn't too bad.

r? @michaelwoerister

@Zoxc
Copy link
Contributor Author

Zoxc commented Apr 18, 2018

@bors try

@bors
Copy link
Contributor

bors commented Apr 18, 2018

⌛ Trying commit c8191a2 with merge a7eed60bdc55f5778f483805dc13edef027ba7a3...

@bors
Copy link
Contributor

bors commented Apr 19, 2018

☀️ Test successful - status-travis
State: approved= try=True

@Zoxc
Copy link
Contributor Author

Zoxc commented Apr 19, 2018

@Mark-Simulacrum I'd like a perf run here too

@michaelwoerister
Copy link
Member

Looks great! I'll try to do a real review later today.

Copy link
Member

@michaelwoerister michaelwoerister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Zoxc! That's much nicer, I think. Let's wait for the perf results, but except for that r=me. The librustc compile time reduction is great too!

// when accessing the ImplicitCtxt
let r = tls::with_related_context(tcx, move |icx| {
// Update the ImplicitCtxt to point to our new query job
let icx = tls::ImplicitCtxt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename this to new_icx and the other one to current_icx or something like that? That would make it clearer.

type Value: Clone;

fn query(key: Self::Key) -> Query<'tcx>;
fn query_map<'a>(tcx: TyCtxt<'a, 'tcx, '_>) -> &'a Lock<QueryMap<'tcx, Self>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a little word of warning here to not use this directly.

@pietroalbini pietroalbini added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 19, 2018
@michaelwoerister
Copy link
Member

Overall this approach seems to be slightly slower:
http://perf.rust-lang.org/compare.html?start=ac3c2288f9f9d977acb46406ba60033d65165a7b&end=a7eed60bdc55f5778f483805dc13edef027ba7a3&stat=instructions%3Au

I wonder why that is. From the diff it looks like performance should be the same.

@Zoxc
Copy link
Contributor Author

Zoxc commented Apr 25, 2018

I wrote some microbenchmarks for queries.

Before:

time: 9.354; rss: 35MB        hot query
time: 8.974; rss: 1109MB      cold query

After changes in this PR + moving code out from macros:

time: 13.589; rss: 35MB       hot query
time: 8.532; rss: 1109MB      cold query

@Zoxc
Copy link
Contributor Author

Zoxc commented Apr 25, 2018

Before (with incremental compilation):

time: 10.730; rss: 40MB       hot query

After (with incremental compilation):

time: 15.429; rss: 40MB       hot query

The cold query benchmark panicked with incremental compilation enabled.

@michaelwoerister
Copy link
Member

OK, I can see that we are doing more work for invocations that then hit the on-disk cache because we are allocating the job unconditionally. I'm still surprised about the difference in the non-incremental case.

@Zoxc
Copy link
Contributor Author

Zoxc commented Apr 25, 2018

I changed the benchmark to use a usize key instead of DefId, so it would work with incremental compilation.

Now it seems to be a speedup all cases:

master:
time: 8.066; rss: 35MB        hot query
time: 8.723; rss: 1109MB      cold query
time: 7.986; rss: 36MB        hot query
time: 8.771; rss: 1109MB      cold query
time: 8.221; rss: 36MB        hot query
time: 8.750; rss: 1109MB      cold query

master+incr:
time: 9.509; rss: 40MB        hot query
time: 33.272; rss: 5079MB     cold query
time: 9.497; rss: 40MB        hot query
time: 32.651; rss: 5079MB     cold query
time: 9.445; rss: 40MB        hot query
time: 32.319; rss: 5079MB     cold query

changes:
time: 4.432; rss: 35MB        hot query
time: 8.361; rss: 1109MB      cold query
time: 4.250; rss: 35MB        hot query
time: 8.191; rss: 1109MB      cold query
time: 4.360; rss: 35MB        hot query
time: 8.191; rss: 1109MB      cold query

changes+incr:
time: 5.462; rss: 40MB        hot query
time: 31.961; rss: 5078MB     cold query
time: 5.453; rss: 40MB        hot query
time: 31.390; rss: 5078MB     cold query
time: 5.528; rss: 40MB        hot query
time: 31.394; rss: 5079MB     cold query

@michaelwoerister
Copy link
Member

Now it seems to be a speedup all cases

OK, that's encouraging.

What's the status of this PR. Do you want to close it in favor of #50102?

@Zoxc
Copy link
Contributor Author

Zoxc commented Apr 25, 2018

I'd like to find out why this is slower for DefId keys. #50102 contains some other performance related changed I'd like to measures separately.

@Zoxc Zoxc closed this Apr 27, 2018
bors added a commit that referenced this pull request Apr 27, 2018
Move query code outside macros and store query jobs separately from query results

Based on #50067

r? @michaelwoerister
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants