Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce deduced parameter attributes, and use them for deducing readonly on indirect immutable freeze by-value function parameters. #103172

Merged
merged 1 commit into from
Oct 22, 2022

Conversation

pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Oct 18, 2022

Introduce deduced parameter attributes, and use them for deducing readonly on
indirect immutable freeze by-value function parameters.

Right now, rustc only examines function signatures and the platform ABI when
determining the LLVM attributes to apply to parameters. This results in missed
optimizations, because there are some attributes that can be determined via
analysis of the MIR making up the function body. In particular, readonly
could be applied to most indirectly-passed by-value function arguments
(specifically, those that are freeze and are observed not to be mutated), but
it currently is not.

This patch introduces the machinery that allows rustc to determine those
attributes. It consists of a query, deduced_param_attrs, that, when
evaluated, analyzes the MIR of the function to determine supplementary
attributes. The results of this query for each function are written into the
crate metadata so that the deduced parameter attributes can be applied to
cross-crate functions. In this patch, we simply check the parameter for
mutations to determine whether the readonly attribute should be applied to
parameters that are indirect immutable freeze by-value. More attributes could
conceivably be deduced in the future: nocapture and noalias come to mind.

Adding readonly to indirect function parameters where applicable enables some
potential optimizations in LLVM that are discussed in issue 103103 and PR
103070
around avoiding stack-to-stack memory copies that appear in functions
like core::fmt::Write::write_fmt and core::panicking::assert_failed. These
functions pass a large structure unchanged by value to a subfunction that also
doesn't mutate it. Since the structure in this case is passed as an indirect
parameter, it's a pointer from LLVM's perspective. As a result, the
intermediate copy of the structure that our codegen emits could be optimized
away by LLVM's MemCpyOptimizer if it knew that the pointer is readonly nocapture noalias in both the caller and callee. We already pass nocapture noalias, but we're missing readonly, as we can't determine whether a
by-value parameter is mutated by examining the signature in Rust. I didn't have
much success with having LLVM infer the readonly attribute, even with fat
LTO; it seems that deducing it at the MIR level is necessary.

No large benefits should be expected from this optimization now; LLVM needs
some changes (discussed in PR 103070) to more aggressively use the noalias nocapture readonly combination in its alias analysis. I have some LLVM patches
for these optimizations and have had them looked over. With all the patches
applied locally, I enabled LLVM to remove all the memcpys from the following
code:

fn main() {
    println!("Hello {}", 3);
}

which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place.

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 18, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 18, 2022

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rust-highfive
Copy link
Collaborator

r? @compiler-errors

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 18, 2022
@pcwalton pcwalton force-pushed the deduced-param-attrs branch 5 times, most recently from 9cdbc31 to 2d670be Compare October 18, 2022 03:20
@pcwalton pcwalton marked this pull request as draft October 18, 2022 03:28
@rust-log-analyzer

This comment has been minimized.

@pcwalton pcwalton force-pushed the deduced-param-attrs branch from 2d670be to 9603c77 Compare October 18, 2022 04:40
@pcwalton
Copy link
Contributor Author

The patch is updated to use the Visitor to detect mutations of parameters. I'll mark it as non-draft if there are no more comments once the tests pass locally.

@pcwalton pcwalton force-pushed the deduced-param-attrs branch from 9603c77 to 0e8a4e6 Compare October 18, 2022 04:50
@rust-log-analyzer

This comment has been minimized.

@pcwalton pcwalton marked this pull request as ready for review October 18, 2022 05:42
@pcwalton
Copy link
Contributor Author

pcwalton commented Oct 18, 2022

This seems ready.

Those two failures confuse me—I don't mutate the MIR at all, and these aren't codegen tests…

@pcwalton pcwalton force-pushed the deduced-param-attrs branch from 0e8a4e6 to bf18d56 Compare October 18, 2022 06:06
@rust-log-analyzer

This comment has been minimized.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 18, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 18, 2022
@bors
Copy link
Contributor

bors commented Oct 18, 2022

⌛ Trying commit bf18d564e3d54e08ba4372e1d4b20ef1b6e3afaa with merge 9077e397fbfc2a1a5945228575b6ce77985fd1f8...

compiler/rustc_mir_transform/src/deduce_param_attrs.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/deduce_param_attrs.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/deduce_param_attrs.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/deduce_param_attrs.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/deduce_param_attrs.rs Outdated Show resolved Hide resolved
compiler/rustc_ty_utils/src/abi.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/deduce_param_attrs.rs Outdated Show resolved Hide resolved
@pcwalton pcwalton force-pushed the deduced-param-attrs branch from bf18d56 to 11fc0a7 Compare October 18, 2022 09:00
@pcwalton
Copy link
Contributor Author

Updated the PR to address comments. I added a new test to ensure that we don't mark non-freeze types as readonly. I also added a comment explaining why I don't think that the fact that moves semantically store undef to the moved-from value invalidates the optimization.

@bors
Copy link
Contributor

bors commented Oct 21, 2022

☀️ Try build successful - checks-actions
Build commit: a5cf94e7f6c6d3272682f3eeeb831ec529decd2f (a5cf94e7f6c6d3272682f3eeeb831ec529decd2f)

@rust-timer
Copy link
Collaborator

Queued a5cf94e7f6c6d3272682f3eeeb831ec529decd2f with parent dcb3761, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a5cf94e7f6c6d3272682f3eeeb831ec529decd2f): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
0.3% [0.3%, 0.3%] 1
Regressions ❌
(secondary)
0.5% [0.1%, 1.4%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 3
All ❌✅ (primary) 0.3% [0.3%, 0.3%] 1

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 21, 2022
@oli-obk
Copy link
Contributor

oli-obk commented Oct 21, 2022

@bors delegate+

code and perf lgtm now. r=me with commits squashed

@bors
Copy link
Contributor

bors commented Oct 21, 2022

✌️ @pcwalton can now approve this pull request

…adonly` on

indirect immutable freeze by-value function parameters.

Right now, `rustc` only examines function signatures and the platform ABI when
determining the LLVM attributes to apply to parameters. This results in missed
optimizations, because there are some attributes that can be determined via
analysis of the MIR making up the function body. In particular, `readonly`
could be applied to most indirectly-passed by-value function arguments
(specifically, those that are freeze and are observed not to be mutated), but
it currently is not.

This patch introduces the machinery that allows `rustc` to determine those
attributes. It consists of a query, `deduced_param_attrs`, that, when
evaluated, analyzes the MIR of the function to determine supplementary
attributes. The results of this query for each function are written into the
crate metadata so that the deduced parameter attributes can be applied to
cross-crate functions. In this patch, we simply check the parameter for
mutations to determine whether the `readonly` attribute should be applied to
parameters that are indirect immutable freeze by-value.  More attributes could
conceivably be deduced in the future: `nocapture` and `noalias` come to mind.

Adding `readonly` to indirect function parameters where applicable enables some
potential optimizations in LLVM that are discussed in [issue 103103] and [PR
103070] around avoiding stack-to-stack memory copies that appear in functions
like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These
functions pass a large structure unchanged by value to a subfunction that also
doesn't mutate it. Since the structure in this case is passed as an indirect
parameter, it's a pointer from LLVM's perspective. As a result, the
intermediate copy of the structure that our codegen emits could be optimized
away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly
nocapture noalias` in both the caller and callee. We already pass `nocapture
noalias`, but we're missing `readonly`, as we can't determine whether a
by-value parameter is mutated by examining the signature in Rust. I didn't have
much success with having LLVM infer the `readonly` attribute, even with fat
LTO; it seems that deducing it at the MIR level is necessary.

No large benefits should be expected from this optimization *now*; LLVM needs
some changes (discussed in [PR 103070]) to more aggressively use the `noalias
nocapture readonly` combination in its alias analysis. I have some LLVM patches
for these optimizations and have had them looked over. With all the patches
applied locally, I enabled LLVM to remove all the `memcpy`s from the following
code:

```rust
fn main() {
    println!("Hello {}", 3);
}
```

which is a significant codegen improvement over the status quo. I expect that
if this optimization kicks in in multiple places even for such a simple
program, then it will apply to Rust code all over the place.

[issue 103103]: rust-lang#103103

[PR 103070]: rust-lang#103070
@pcwalton pcwalton force-pushed the deduced-param-attrs branch from e4e37f0 to da630ac Compare October 21, 2022 09:34
@pcwalton
Copy link
Contributor Author

@bors: r=oli-obk

@rustbot label: +perf-regression-triaged These are minor inconsistent performance changes in service of better optimization once LLVM alias analysis improves.

@rustbot
Copy link
Collaborator

rustbot commented Oct 21, 2022

Error: Label These can only be set by Rust team members

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@bors
Copy link
Contributor

bors commented Oct 21, 2022

📌 Commit da630ac has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 21, 2022
@bors
Copy link
Contributor

bors commented Oct 22, 2022

⌛ Testing commit da630ac with merge eecde58...

@Aaron1011
Copy link
Member

Aaron1011 commented Oct 22, 2022

cc @rust-lang/wg-unsafe-code-guidelines

From my understanding, this optimization won't change the behavior of any sound programs. If we create a pointer/reference to a function argument (e.g. fn foo(mut arg: SomeType) { let ptr = &mut arg }), then the argument will be considered 'mutable', and this optimization will be skipped. The only way for a program to modify such an argument would be to perform an out-of-bounds write on a different mutable pointer (which happens to have the same address as the argument storage). This is already undefined behavior.

However, I haven't seen any mention of unsafe in the PR discussion, so I thought it would to get confirmation about my reasoning from more knowledgeable people.

@bors
Copy link
Contributor

bors commented Oct 22, 2022

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing eecde58 to master...

1 similar comment
@bors
Copy link
Contributor

bors commented Oct 22, 2022

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing eecde58 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 22, 2022
@bors bors merged commit eecde58 into rust-lang:master Oct 22, 2022
@rustbot rustbot added this to the 1.66.0 milestone Oct 22, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (eecde58): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.3% [1.3%, 1.3%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 4
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.2% [4.2%, 4.2%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-4.6% [-5.0%, -4.3%] 2
All ❌✅ (primary) - - 0

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the perf-regression Performance regression. label Oct 22, 2022
@RalfJung
Copy link
Member

RalfJung commented Oct 22, 2022

@Aaron1011 is there a summary of what happens here for someone who doesn't live and breathe LLVM IR?^^ My question is basically the same as in #103103: What do requirements do we need to impose on the MIR level to justify this attribute?

"indirect immutable freeze by-value function parameter" is using a lot of terms that don't exist in MIR so I don't understand what this means. How can a parameter be both indirect and by-value?!?

PlaceContext::MutatingUse(..)
| PlaceContext::NonMutatingUse(NonMutatingUseContext::Move) => {
// This is a mutation, so mark it as such.
self.mutable_args.insert(local.index() - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NonMutatingUseContext::Move is a mutation? Looks like naming went wrong somewhere...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a mutation for borrowck purposes, since you don't need let mut to move out. The opsem might disagree with that, but we should decide the opsem first and then consider renaming something here

@RalfJung
Copy link
Member

If we create a pointer/reference to a function argument (e.g. fn foo(mut arg: SomeType) { let ptr = &mut arg }), then the argument will be considered 'mutable', and this optimization will be skipped.

What is &, addr_of, or addr_of_mut are being used?

@JakobDegen
Copy link
Contributor

@RalfJung as far as I can tell, this only affects byval parameters in LLVM, which are used in some cases but only ever for parameters that are passed by value in MIR. As such, I don't think this imposes any additional restrictions on MIR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.