-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new lint to check for defining of read-only arrays on stack #12854
Conversation
37b1518
to
d62e1d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at the actual lint impl yet, some initial questions
/// ``` | ||
#[clippy::version = "1.80.0"] | ||
pub LET_ARR_CONST, | ||
perf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be (at most) a pedantic lint, at least for such a small threshold of 16 bytes. It seems like a micro optimization in cases like this one where it's just two string slices.
It would also have a fair number of false positives when MaybeUninit
is involved. E.g.
struct Buffer([MaybeUninit<u8>; 40]);
impl Buffer {
pub fn new() -> Self {
let bytes = [MaybeUninit::uninit(); 40];
Self(bytes)
}
}
I don't think a static would really help here. There's no initialization code involved (taken from the itoa crate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for letting me know. I haven't considered this case. I would try to exclude this in code.
And for threshold of 16 bytes, I would try to come up with a micro benchmark.
clippy_lints/src/let_arr_const.rs
Outdated
/// `let array` puts array on the stack which might make the generated binary file | ||
/// bigger and slower. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go into the "Why is this bad?" section. "Known problems" is for problems with this lint (ie false positives or false negatives).
But also, this should explain more concretely why this makes the binary slower or faster.
For example, what initially confused me from reading this was that *&<array literal>
looked like it would still have to copy it onto the stack, so you'd have the array end up on the stack either way (semantically, ignoring opts).
The way I understand it is that the address-of operator makes rustc static-promote the array literal, which means that it can avoid emitting individual movs or memsets that would be required to initialize the array. Then dereferencing it can emit a memcpy without having to mutate the copy (unlike before where it would need store
s to initialize the array), which gives LLVM an easier time to reason about it and optimize that copy away if you only e.g. directly index into it?
clippy_lints/src/let_arr_const.rs
Outdated
|
||
if should { | ||
let snippet = snippet_with_applicability(cx, init.span, "_", &mut applicability); | ||
let msg = format!("{generic_msg} or try: `*&{snippet}`"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's as simple as just static promoting the array literal and then copying it, I feel like it would make more sense for this to just be a MIR optimization in the compiler instead.
Having to write *&<array literal>
feels awkward and I imagine it's not at all obvious to someone skimming over code why that's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to write *& feels awkward
Yeah, that's my thoughts as well. Like you said, it could better be a MIR opt.
But I don't have any experiences for writing one. Could you give me any pointers/documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your quick and thorough reviews.
All of your points is correct.
I would try to address all after I have a micro benchmarks with/without memcpy of an local array.
Alternative I also would like to write a MIR opt but I don't have any guidance.
It you (or anyone) could give me any pointers, I'm very thankful.
/// ``` | ||
#[clippy::version = "1.80.0"] | ||
pub LET_ARR_CONST, | ||
perf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for letting me know. I haven't considered this case. I would try to exclude this in code.
And for threshold of 16 bytes, I would try to come up with a micro benchmark.
clippy_lints/src/let_arr_const.rs
Outdated
|
||
if should { | ||
let snippet = snippet_with_applicability(cx, init.span, "_", &mut applicability); | ||
let msg = format!("{generic_msg} or try: `*&{snippet}`"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to write *& feels awkward
Yeah, that's my thoughts as well. Like you said, it could better be a MIR opt.
But I don't have any experiences for writing one. Could you give me any pointers/documents?
So I wrote a micro benchmark for measuring the performance pub fn indexing(x: usize) -> u8 {
let array: [u8; N] = [1, 3, ...];
// or
let array: [u8; N] = *&[1, 3, ...];
array[x]
} machine details
% rustc -Vv
rustc 1.80.0-nightly (bdbbb6c6a 2024-05-26)
binary: rustc
commit-hash: bdbbb6c6a718d4d196131aa16bafb60e850311d9
commit-date: 2024-05-26
host: x86_64-unknown-linux-gnu
release: 1.80.0-nightly
LLVM version: 18.1.6
% uname -a
Linux cfarm420 6.9.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 17 May 2024 16:56:38 +0000 x86_64 GNU/Linux OutputWith repeat expressions: They have the same performance with/out With non-repeat expressions: All versions of My thoughtsIt seems that this lint could be useful with non-repeat version. If this lint is not accepted as a warning-by-default lint, |
We discussed this in today's Clippy meeting (sort of). See this thread (feel free to comment on that as well). We (the Clippy team, those that were there at least) generally seem to agree that this would be better implemented as a MIR optimization and less as a lint, since it's still semantically equivalent and is a very mechanical suggestion that could easily be done automatically. I don't know about the specifics of MIR optimizations, but I would recommend opening a new thread in the t-compiler/wg-mir-opt channel to get that started. They'd surely know better than me. :) |
/// Checks for defining of read-only arrays on stack. | ||
/// | ||
/// ### Why is this bad? | ||
/// `let array` puts array on the stack. As as intended, the compiler will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `let array` puts array on the stack. As as intended, the compiler will | |
/// `let array` puts the array on the stack. As directed, the compiler will |
Thank you. I respect the team decisions. I split the part that ignoring array from deref_addressof lint Closing. |
ignore array from `deref_addrof` lint Split from #12854 changelog: ignore array from `deref_addrof` lint r? y21
gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used `PromoteTemps` pass. This allows promoting constant local arrays as anonymous constants. So that's in codegen for a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series of `store` on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case. See more in rust-lang#73825 or [zulip][opsem] for more info. [Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not. [Prior discussions on zulip][opsem]. This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`. ![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1) Fix rust-lang#73825 r? cjgillot ### Unresolved questions - [ ] Should we ignore nested arrays? I think that promoting nested arrays is bloating codegen. - [ ] Should stack_threshold be at least 32 bytes? Like the benchmark showed. If yes, the test should be updated to make arrays larger than 32 bytes. - [x] ~Is this concerning that `call(move _1)` is now `call(const [array])`?~ It reverted back to `call(move _1)` [opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [bench]: rust-lang/rust-clippy#12854 (comment) [perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used `PromoteTemps` pass. This allows promoting constant local arrays as anonymous constants. So that's in codegen for a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series of `store` on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case. See more in rust-lang#73825 or [zulip][opsem] for more info. [Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not. [Prior discussions on zulip][opsem]. This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`. ![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1) Fix rust-lang#73825 r? cjgillot ### Unresolved questions - [ ] Should we ignore nested arrays? I think that promoting nested arrays is bloating codegen. - [ ] Should stack_threshold be at least 32 bytes? Like the benchmark showed. If yes, the test should be updated to make arrays larger than 32 bytes. - [x] ~Is this concerning that `call(move _1)` is now `call(const [array])`?~ It reverted back to `call(move _1)` [opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [bench]: rust-lang/rust-clippy#12854 (comment) [perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
Motivations:
Benchmark: #12854 (comment)
changelog: [
let_arr_const
]: checks for defining of read-only arrays on stack.