Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Micro-optimize ty::relate::relate_substs by avoiding match #96020

Merged
merged 2 commits into from
Apr 19, 2022

Conversation

martingms
Copy link
Contributor

Was a top-20 hot function in a callgrind profile of compiling bitmaps-3.1.0.

Yields some small speedups on that crate and some others according to local benching:

Benchmark Profile Scenario % Change Significance Factor?
bitmaps-3.1.0 check full -1.88% 9.42x
bitmaps-3.1.0 debug full -1.80% 8.99x
bitmaps-3.1.0 opt full -1.70% 8.49x
bitmaps-3.1.0 check incr-full -1.54% 7.68x
deep-vector debug full 1.52% 7.61x
bitmaps-3.1.0 debug incr-full -1.45% 7.26x
bitmaps-3.1.0 opt incr-full -1.39% 6.95x
nalgebra-0.30.1 check full -0.68% 3.42x
nalgebra-0.30.1 debug full -0.64% 3.22x
nalgebra-0.30.1 opt full -0.64% 3.20x
projection-caching check full -0.61% 3.05x
nalgebra-0.30.1 check incr-full -0.56% 2.78x
nalgebra-0.30.1 opt incr-full -0.54% 2.72x
nalgebra-0.30.1 debug incr-full -0.54% 2.69x
projection-caching check incr-full -0.50% 2.51x
tt-muncher opt full -0.48% 2.42x
projection-caching opt full -0.47% 2.37x
projection-caching debug full -0.47% 2.35x
projection-caching opt incr-full -0.44% 2.21x
projection-caching debug incr-full -0.42% 2.08x
deeply-nested-multi check incr-full 0.37% 1.87x
wf-projection-stress-65510 opt full -0.37% 1.84x
deep-vector debug incr-patched: add vec item -0.32% 1.61x
projection-caching debug incr-unchanged -0.32% 1.60x
wf-projection-stress-65510 check full -0.31% 1.55x
projection-caching opt incr-unchanged -0.31% 1.53x
wf-projection-stress-65510 debug incr-full -0.30% 1.51x
wf-projection-stress-65510 opt incr-full -0.30% 1.51x

r? @nnethercote

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 13, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 13, 2022
@compiler-errors
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2022
@bors
Copy link
Contributor

bors commented Apr 13, 2022

⌛ Trying commit 16a91de2b22a1a01621a8fe074a97ace032341f5 with merge d32a84628f120a8b1aaedc3b83535b149f696e07...

@bors
Copy link
Contributor

bors commented Apr 13, 2022

☀️ Try build successful - checks-actions
Build commit: d32a84628f120a8b1aaedc3b83535b149f696e07 (d32a84628f120a8b1aaedc3b83535b149f696e07)

@rust-timer
Copy link
Collaborator

Queued d32a84628f120a8b1aaedc3b83535b149f696e07 with parent 0d13f6a, future comparison URL.

Copy link
Contributor

@nnethercote nnethercote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faster and simpler, always a good combination :)

I have a couple of minor questions and comments, but none of them will block this. Looking good.

compiler/rustc_middle/src/ty/relate.rs Outdated Show resolved Hide resolved
compiler/rustc_middle/src/ty/relate.rs Show resolved Hide resolved
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d32a84628f120a8b1aaedc3b83535b149f696e07): comparison url.

Summary:

  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 7 6 1 6 8
mean2 0.3% 2.3% -0.2% -0.7% 0.3%
max 0.4% 2.9% -0.2% -1.1% 0.4%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 14, 2022
@nnethercote
Copy link
Contributor

The deeply-nested-multi regressions are a concern. Could that be related to the removal of cached_ty?

@martingms
Copy link
Contributor Author

martingms commented Apr 15, 2022

Ah my bad, I didn't consider the case where a_subst and b_subst are empty but variances is not, which is indeed very likely in deeply-nested-multi:

184065 counts
(  1)    98550 (53.5%, 53.5%): relate_substs: a_subst.len() = 0, b_subst.len() = 0, variances = Some(..)
(  2)    34099 (18.5%, 72.1%): relate_substs: a_subst.len() = 5, b_subst.len() = 5, variances = None
(  3)    26084 (14.2%, 86.2%): relate_substs: a_subst.len() = 1, b_subst.len() = 1, variances = None
(  4)    22681 (12.3%, 98.6%): relate_substs: a_subst.len() = 2, b_subst.len() = 2, variances = None
(  5)     1189 ( 0.6%, 99.2%): relate_substs: a_subst.len() = 1, b_subst.len() = 1, variances = Some(..)
(  6)      822 ( 0.4%, 99.7%): relate_substs: a_subst.len() = 0, b_subst.len() = 0, variances = None
(  7)      557 ( 0.3%,100.0%): relate_substs: a_subst.len() = 2, b_subst.len() = 2, variances = Some(..)
(  8)       77 ( 0.0%,100.0%): relate_substs: a_subst.len() = 3, b_subst.len() = 3, variances = None
(  9)        5 ( 0.0%,100.0%): relate_substs: a_subst.len() = 4, b_subst.len() = 4, variances = None
( 10)        1 ( 0.0%,100.0%): relate_substs: a_subst.len() = 1, b_subst.len() = 2, variances = None

I'll re-add the cache and we can do a new perf run.

There was no need to keep doing the match inside the iterator.
@martingms martingms force-pushed the optimize-relate_substs branch from 16a91de to 041121a Compare April 16, 2022 15:42
@martingms
Copy link
Contributor Author

Amended the commit now, should be an improvement over not using a cache:

Primary benchmarks

Benchmark Profile Scenario % Change Significance Factor?
cargo-0.60.0 check incr-full -0.37% 1.86x
ripgrep-13.0.0 check incr-full -0.34% 1.72x
cargo-0.60.0 check incr-patched: println -0.34% 1.70x
ripgrep-13.0.0 check full -0.34% 1.70x
cargo-0.60.0 check full -0.34% 1.68x
syn-1.0.89 check incr-full -0.33% 1.65x

Secondary benchmarks

Benchmark Profile Scenario % Change Significance Factor?
deeply-nested-multi check incr-full -0.50% 2.51x
deeply-nested-multi debug incr-full -0.42% 2.11x
deeply-nested-multi opt incr-full -0.38% 1.92x

I also tried adding a fast return

diff --git a/compiler/rustc_middle/src/ty/relate.rs b/compiler/rustc_middle/src/ty/relate.rs
index b1cfa3fa964..ff22d1d1902 100644
--- a/compiler/rustc_middle/src/ty/relate.rs
+++ b/compiler/rustc_middle/src/ty/relate.rs
@@ -7,7 +7,7 @@
 use crate::mir::interpret::{get_slice_bytes, ConstValue, GlobalAlloc, Scalar};
 use crate::ty::error::{ExpectedFound, TypeError};
 use crate::ty::subst::{GenericArg, GenericArgKind, Subst, SubstsRef};
-use crate::ty::{self, ImplSubject, Term, Ty, TyCtxt, TypeFoldable};
+use crate::ty::{self, ImplSubject, List, Term, Ty, TyCtxt, TypeFoldable};
 use rustc_hir as ast;
 use rustc_hir::def_id::DefId;
 use rustc_span::DUMMY_SP;
@@ -141,8 +141,11 @@ pub fn relate_substs<'tcx, R: TypeRelation<'tcx>>(
     a_subst: SubstsRef<'tcx>,
     b_subst: SubstsRef<'tcx>,
 ) -> RelateResult<'tcx, SubstsRef<'tcx>> {
-    let tcx = relation.tcx();
+    if a_subst.is_empty() || b_subst.is_empty() {
+        return Ok(List::empty());
+    }
 
+    let tcx = relation.tcx();
     let zipped = iter::zip(a_subst, b_subst);
     match variances {
         Some((ty_def_id, variances)) => {

with the above patch, but the local results, while improved for deeply-nested-multi, were mixed:

Benchmark Profile Scenario % Change Significance Factor?
deep-vector debug full 0.71% 3.57x
regression-31157 check full -0.39% 1.96x
coercions debug incr-patched: println -0.35% 1.73x
deep-vector debug incr-patched: add vec item 0.34% 1.69x
deeply-nested-multi check full -0.32% 1.61x
regression-31157 check incr-full -0.32% 1.61x

@nnethercote
Copy link
Contributor

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 16, 2022
@bors
Copy link
Contributor

bors commented Apr 16, 2022

⌛ Trying commit 041121a with merge 60fe9fdfb1584b34704021611324acb33c5dc472...

@bors
Copy link
Contributor

bors commented Apr 16, 2022

☀️ Try build successful - checks-actions
Build commit: 60fe9fdfb1584b34704021611324acb33c5dc472 (60fe9fdfb1584b34704021611324acb33c5dc472)

@rust-timer
Copy link
Collaborator

Queued 60fe9fdfb1584b34704021611324acb33c5dc472 with parent 2fa9789, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (60fe9fdfb1584b34704021611324acb33c5dc472): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: 😿 relevant regressions found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 6 1 0 1
mean2 N/A 1.8% -0.3% N/A -0.3%
max N/A 2.4% -0.3% N/A -0.3%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Apr 16, 2022
One for the case with variances, and one without.
All callers use an explicit Option for the variable anyway.
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (114440f4e7b9ec843d38a79631c132534ed409c2): comparison url.

Summary:

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 9 0 6 0
mean2 N/A 1.5% N/A -0.4% N/A
max N/A 2.3% N/A -0.4% N/A

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 17, 2022
@nnethercote
Copy link
Contributor

I did some local measurements, everything was improved or unchanged:

-----------------------------------------------------------------------------
Rel0
-----------------------------------------------------------------------------
bitmaps-3.1.0        9,377,327,686 (100.0%)  PROGRAM TOTALS
ctfe-stress-5        24,496,246,657 (100.0%)  PROGRAM TOTALS
deep-vector          5,207,740,010 (100.0%)  PROGRAM TOTALS
deeply-nested-multi  3,064,725,809 (100.0%)  PROGRAM TOTALS
match-stress         8,193,950,727 (100.0%)  PROGRAM TOTALS
nalgebra-0.30.1      98,667,024,324 (100.0%)  PROGRAM TOTALS
projection-caching   1,322,894,706 (100.0%)  PROGRAM TOTALS

-----------------------------------------------------------------------------
Rel1: Rel0 + move match outside iterator in relate_substs
-----------------------------------------------------------------------------
bitmaps-3.1.0        9,217,737,637 (100.0%)  PROGRAM TOTALS
ctfe-stress-5        24,495,780,146 (100.0%)  PROGRAM TOTALS
deep-vector          5,219,937,388 (100.0%)  PROGRAM TOTALS
deeply-nested-multi  3,066,395,074 (100.0%)  PROGRAM TOTALS
match-stress         8,194,731,843 (100.0%)  PROGRAM TOTALS
nalgebra-0.30.1      97,802,698,656 (100.0%)  PROGRAM TOTALS
projection-caching   1,312,375,711 (100.0%)  PROGRAM TOTALS

-----------------------------------------------------------------------------
Rel2: Rel1 + split relate_substs into two funcs
-----------------------------------------------------------------------------
bitmaps-3.1.0        9,133,639,438 (100.0%)  PROGRAM TOTALS
ctfe-stress-5        24,494,392,051 (100.0%)  PROGRAM TOTALS
deep-vector          5,200,594,917 (100.0%)  PROGRAM TOTALS
deeply-nested-multi  3,061,037,540 (100.0%)  PROGRAM TOTALS
match-stress         8,193,471,227 (100.0%)  PROGRAM TOTALS
nalgebra-0.30.1      97,482,012,496 (100.0%)  PROGRAM TOTALS
projection-caching   1,306,663,149 (100.0%)  PROGRAM TOTALS

@nnethercote
Copy link
Contributor

nnethercote commented Apr 19, 2022

If you click on a percent change result for a single benchmark run you end up at a page like this one. That gives some profile_local commands you can run that will download the versions from CI, and then run and diff them locally. I did this, the imporant part of the Cachegrind diff looks like this:

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
57,422,125  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir             file:function
--------------------------------------------------------------------------------
 -242,552,118  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::try_fold_with::<rustc_middle::ty::fold::RegionFolder>
  236,328,334  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::super_fold_with::<rustc_middle::ty::fold::RegionFolder>
   65,533,405  ???:rustc_middle::ty::util::fold_list::<rustc_middle::ty::fold::RegionFolder, rustc_middle::ty::Ty, <&rustc_middle::ty::list::List<rustc_middle::ty::Ty> as rustc_middle::ty::fold::T
ypeFoldable>::try_super_fold_with<rustc_middle::ty::fold::RegionFolder>::{closure

This isn't really related to your change. relate_substs just isn't very hot for deeply-nested-multi. I suspect you have hit some case where some hot path on code generation has been perturbed in an unlucky and unpredictable way. Especially because #96022 appears to have hit exactly the same issue. (Seriously, look at the raw data, the numbers are incredibly similar.)

@nnethercote
Copy link
Contributor

I'm going to approve this, based on the following:

  • The code is clearly better. It avoids various conditional branches.
  • There are clear improvements on multiple benchmarks for both @martingms and me when doing local measurements.
  • deeply-nested-multi is a secondary benchmark, and is behaving weirdly in Inline ty::Const::ty() and ty::Const::val() getters #96022 as well, which suggests that it's currently prone to this particular regression on CI more or less randomly.

@martingms: once this merges, we should do another CI perf run for #96022.

@bors r+

@bors
Copy link
Contributor

bors commented Apr 19, 2022

📌 Commit 19dedf3 has been approved by nnethercote

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 19, 2022
@bors
Copy link
Contributor

bors commented Apr 19, 2022

⌛ Testing commit 19dedf3 with merge eea74015f20022c0cfc414b8ed9ecdb2eb0197e7...

@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-aux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
running `/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/release/cargo build`
running `/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/release/cargo build`
thread 'workspaces::relative_rustc' panicked at '
test failed running `/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/release/cargo build`
error: process exited with code 101 (expected 0)

--- stderr
--- stderr
error: could not execute process `/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/x86_64-unknown-linux-gnu/tmp/cit/t2430/lib/./foo -vV` (never executed)
Caused by:
  Text file busy (os error 26)
', src/tools/cargo/tests/testsuite/workspaces.rs:2147:42

---
test result: FAILED. 2456 passed; 1 failed; 7 ignored; 0 measured; 0 filtered out; finished in 114.23s

error: test failed, to rerun pass '--test testsuite'
Build completed unsuccessfully in 0:24:15
make: *** [check-aux] Error 1
Makefile:44: recipe for target 'check-aux' failed

@bors
Copy link
Contributor

bors commented Apr 19, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 19, 2022
@nnethercote
Copy link
Contributor

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 19, 2022
@bors
Copy link
Contributor

bors commented Apr 19, 2022

⌛ Testing commit 19dedf3 with merge c102c5c...

@bors
Copy link
Contributor

bors commented Apr 19, 2022

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing c102c5c to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 19, 2022
@bors bors merged commit c102c5c into rust-lang:master Apr 19, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 19, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c102c5c): comparison url.

Summary:

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 6 0 8 0
mean2 N/A 1.8% N/A -0.6% N/A
max N/A 2.3% N/A -1.2% N/A

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@nnethercote
Copy link
Contributor

@rustbot label: +perf-regression-triaged

For the reasons described above.

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Apr 19, 2022
dario23 added a commit to dario23/rust-semverver that referenced this pull request May 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants