
Add a new lint to check for defining of read-only arrays on stack #12854

Closed

Conversation

tesuji
Contributor

@tesuji tesuji commented May 26, 2024

@rustbot
Collaborator

rustbot commented May 26, 2024

r? @y21

rustbot has assigned @y21.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label May 26, 2024
@tesuji tesuji force-pushed the let-const-arr branch 3 times, most recently from 37b1518 to d62e1d8 Compare May 26, 2024 05:45
Member

@y21 y21 left a comment

Haven't looked at the actual lint impl yet, some initial questions

/// ```
#[clippy::version = "1.80.0"]
pub LET_ARR_CONST,
perf,
Member

I feel like this should be (at most) a pedantic lint, at least for such a small threshold of 16 bytes. It seems like a micro optimization in cases like this one where it's just two string slices.

It would also have a fair number of false positives when MaybeUninit is involved. E.g.

use std::mem::MaybeUninit;

struct Buffer([MaybeUninit<u8>; 40]);
impl Buffer {
    pub fn new() -> Self {
        let bytes = [MaybeUninit::uninit(); 40];
        Self(bytes)
    }
}

I don't think a static would really help here; there's no initialization code involved (the example is taken from the itoa crate).

Contributor Author

@tesuji tesuji May 27, 2024

Thanks for letting me know. I hadn't considered this case; I'll try to exclude it in the code.

As for the 16-byte threshold, I'll try to come up with a micro-benchmark.

Comment on lines 20 to 21
/// `let array` puts array on the stack which might make the generated binary file
/// bigger and slower.
Member

This should go into the "Why is this bad?" section. "Known problems" is for problems with the lint itself (i.e. false positives or false negatives).

But also, this should explain more concretely why this makes the binary slower or faster.

For example, what initially confused me when reading this was that *&<array literal> looked like it would still have to copy the array onto the stack, so you'd end up with the array on the stack either way (semantically, ignoring optimizations).

The way I understand it is that the address-of operator makes rustc static-promote the array literal, which means it can avoid emitting the individual movs or memsets that would otherwise be required to initialize the array. Dereferencing it can then emit a memcpy without having to mutate the copy (unlike before, where stores would be needed to initialize the array), which gives LLVM an easier time reasoning about it and optimizing that copy away if you, e.g., only index directly into it?
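A small, self-contained sketch of the two forms being compared (the function names and element values here are made up for illustration):

fn on_stack(i: usize) -> u8 {
    // The literal is written directly into the local, so (before optimization)
    // it is built with individual stores on the stack.
    let table: [u8; 8] = [1, 2, 4, 8, 16, 32, 64, 128];
    table[i % 8]
}

fn via_promotion(i: usize) -> u8 {
    // `&[...]` lets rustc constant-promote the literal to a `'static` value;
    // `*` then copies out of that promoted constant (typically one memcpy)
    // instead of building the array element by element.
    let table: [u8; 8] = *&[1, 2, 4, 8, 16, 32, 64, 128];
    table[i % 8]
}

fn main() {
    assert_eq!(on_stack(3), via_promotion(3));
}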


if should {
    let snippet = snippet_with_applicability(cx, init.span, "_", &mut applicability);
    let msg = format!("{generic_msg} or try: `*&{snippet}`");
Member

If it's as simple as just static promoting the array literal and then copying it, I feel like it would make more sense for this to just be a MIR optimization in the compiler instead.

Having to write *&<array literal> feels awkward and I imagine it's not at all obvious to someone skimming over code why that's needed.

Contributor Author

@tesuji tesuji May 27, 2024

> Having to write *& feels awkward

Yeah, those are my thoughts as well. Like you said, it would be better as a MIR opt.
But I don't have any experience writing one. Could you give me any pointers/documents?

Contributor Author

@tesuji tesuji left a comment

Thank you for your quick and thorough reviews.
All of your points are correct.
I'll try to address them all once I have a micro-benchmark with/without the memcpy of a local array.

Alternatively, I'd also like to write a MIR opt, but I don't have any guidance.
If you (or anyone) could give me any pointers, I'd be very thankful.

@tesuji
Contributor Author

tesuji commented May 27, 2024

So I wrote a micro-benchmark to measure the performance
difference between `let array: [u8; N] = [<expr>, ..];` and
`let array: [u8; N] = *&[<expr>, ..];`.
The benchmark simply exercises this template code,
with N in [32, 64, 128, 256] and x < N.

pub fn indexing(x: usize) -> u8 {
	let array: [u8; N] = [1, 3, ...];
	// or
	let array: [u8; N] = *&[1, 3, ...];
	array[x]
}
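For reference, here is a minimal sketch of how such a benchmark might be driven with the criterion crate; the function names, element values, and N = 32 are illustrative placeholders, not the exact inputs behind the attached report:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

const N: usize = 32;

// Non-repeat literal, initialized in place on the stack.
fn indexing_plain(x: usize) -> u8 {
    let array: [u8; N] = [
        1, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
        59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131,
    ];
    array[x]
}

// The same literal behind `*&`, so the literal can be static-promoted.
fn indexing_promoted(x: usize) -> u8 {
    let array: [u8; N] = *&[
        1, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
        59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131,
    ];
    array[x]
}

fn bench(c: &mut Criterion) {
    c.bench_function("indexing_plain", |b| b.iter(|| indexing_plain(black_box(7))));
    c.bench_function("indexing_promoted", |b| b.iter(|| indexing_promoted(black_box(7))));
}

criterion_group!(benches, bench);
criterion_main!(benches);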
machine details

% rustc -Vv
rustc 1.80.0-nightly (bdbbb6c6a 2024-05-26)
binary: rustc
commit-hash: bdbbb6c6a718d4d196131aa16bafb60e850311d9
commit-date: 2024-05-26
host: x86_64-unknown-linux-gnu
release: 1.80.0-nightly
LLVM version: 18.1.6
% uname -a
Linux cfarm420 6.9.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 17 May 2024 16:56:38 +0000 x86_64 GNU/Linux

Output

report.zip

[image: benchmark results]

With repeat expressions:

They have the same performance with or without *&; LLVM optimizes them the same way.

With non-repeat expressions:

All versions with *& have the same performance, as LLVM
optimizes them the same way: a single lea instruction on x86.
And they are significantly faster than the ergonomic versions without *&.

My thoughts

It seems that this lint could be useful for the non-repeat case,
at the cost of the non-ergonomic *& one has to type and read.

If this lint is not accepted as a warn-by-default lint, then
at the least the deref_addrof lint should not fire on such a let array.

@y21 y21 added the I-nominated Issue: Nominated to be discussed at the next Clippy meeting label May 27, 2024
@tesuji tesuji changed the title Add new lint that checks for defining of read-only arrays on stack. Add a new lint to check for defining of read-only arrays on stack May 28, 2024
@y21
Member

y21 commented May 28, 2024

We discussed this in today's Clippy meeting (sort of). See this thread (feel free to comment on that as well).

We (the Clippy team, those of us who were there at least) generally seem to agree that this would be better implemented as a MIR optimization rather than as a lint, since it's semantically equivalent and a very mechanical suggestion that could easily be done automatically.
One of the linked issues points out that Clang also has this as an optimization where they constant-promote arrays so LLVM can better see through them.

I don't know about the specifics of MIR optimizations, but I would recommend opening a new thread in the t-compiler/wg-mir-opt channel to get that started. They'd surely know better than me. :)
The rustc dev guide also has some documentation on MIR passes.

@y21 y21 removed the I-nominated Issue: Nominated to be discussed at the next Clippy meeting label May 28, 2024
/// Checks for defining of read-only arrays on stack.
///
/// ### Why is this bad?
/// `let array` puts array on the stack. As as intended, the compiler will
Contributor

Suggested change
/// `let array` puts array on the stack. As as intended, the compiler will
/// `let array` puts the array on the stack. As directed, the compiler will

@tesuji
Contributor Author

tesuji commented May 29, 2024

Thank you. I respect the team's decision. I split out the part that ignores arrays in the deref_addrof lint
into #12864.

Closing.

@tesuji tesuji closed this May 29, 2024
@tesuji tesuji deleted the let-const-arr branch May 29, 2024 02:29
bors added a commit that referenced this pull request May 30, 2024
ignore array from `deref_addrof` lint

Split from #12854

changelog: ignore array from `deref_addrof` lint

r? y21
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 14, 2024
gvn: Promote/propagate const local array

A rewrite of rust-lang#125916, which used the `PromoteTemps` pass.

This allows promoting constant local arrays as anonymous constants, so that in codegen for
a local array, rustc outputs an `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series
of `store`s on the stack (a.k.a. in-place initialization). This brings rustc on par with clang in this specific case.
See rust-lang#73825 or [zulip][opsem] for more info.
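As an illustration (a hypothetical snippet, not taken from this PR's tests), this is the kind of constant local array the pass targets:

pub fn lookup(i: usize) -> u16 {
    // Every element is a constant and the array is never written after
    // initialization, so it can be materialized from an anonymous constant
    // (a single memcpy) instead of per-element stores.
    let table: [u16; 8] = [2, 3, 5, 7, 11, 13, 17, 19];
    table[i & 7]
}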

[Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not.

[Prior discussions on zulip][opsem].

This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`.
![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1)

Fix rust-lang#73825

r? cjgillot

### Unresolved questions
- [ ] Should we ignore nested arrays?
    I think promoting nested arrays would bloat codegen.
- [ ] Should stack_threshold be at least 32 bytes, as the benchmark suggested?
    If yes, the test should be updated to make the arrays larger than 32 bytes.
- [x] ~Is it concerning that `call(move _1)` is now `call(const [array])`?~
  It was reverted back to `call(move _1)`.

[opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F
[bench]: rust-lang/rust-clippy#12854 (comment)
[perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size