Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use approx_ty_size for large_enum_variant #9400

Merged
merged 1 commit into from
Sep 3, 2022

Conversation

lukaslueg
Copy link
Contributor

@lukaslueg lukaslueg commented Aug 30, 2022

This builds upon #9373 to use the approximate size of each variant for large_enum_variant. This allows us to lint in situations where an enum contains generics but is still guaranteed to have a large variant on an at-least basis, e.g. with (T, [u8; 512]).

  • I've changed the wording from "is ... bytes" to "contains at least" because
    • the size is now an approximate lower bound (e.g. 512 in the example above). The actual size is larger due to T, including due to T's memory layout.
    • the discriminant is not taken into account in the message. This comes up with variants like A(T), which are "is at least 0 bytes" otherwise, which may be misleading.
  • If the second-largest variant has no fields, there is a special case "carries no data" instead of "is at least 0 bytes".
  • A variant like A(T) is "at least 0 bytes", which is technically true, yet we don't distinguish between "indeterminate" and truly "ZST".
  • The generics-tests that were there before now lint while they didn't lint before. AFAICS this is correct.

I guess the above is correct-ish. However, I use the SubstsRef that I got via cx.tcx.type_of(item.def_id) to solve for generics in the variants. Is this even applicable, since we start from an - [ ] ItemKind?

changelog: none

@rust-highfive
Copy link

r? @llogiq

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Aug 30, 2022
@llogiq
Copy link
Contributor

llogiq commented Aug 30, 2022

I'm just a bit worried that the lint may become really noisy with this change.

Anyone up for running lintcheck against this branch?

Otherwise this looks good to go.

@lukaslueg
Copy link
Contributor Author

cargo lintcheck comes up empty wrt large_enum_variant on this branch (as on master).

I'm a little worried about noise as well. Consider enum Foo<const U: usize> { foo: [u8; 200], bar: [u8; U] }, which would always lint because bar is considered to be at least 0 bytes, making it threshold-bytes smaller than foo. One can argue that this is not wrong, though, because as far as Foo is concerned there is a suspiciously out-of-whack-large variant.

If this is gtg (please review my question wrt SubstsRef above), I'll need to update the lint description before merging.

@Jarcho
Copy link
Contributor

Jarcho commented Aug 31, 2022

Running this on a few thousand crates. I'll post the results tonight.

@Jarcho
Copy link
Contributor

Jarcho commented Sep 1, 2022

Results from roughly the top 5000 crates: https://gist.github.com/Jarcho/57c0e4927769c044b0a56544d429027a

Don't know how many of these have generic types.

@lukaslueg
Copy link
Contributor Author

Only 42 lints, imho all valid or at least debatable. I'm surprised the number is so low.

This PR clearly helps with

warning: large size difference between variants
   --> src/iterator.rs:324:4
    |
324 |             Descend(Result<(OwnedNode<DBValue>, Option<O>), O, E>),
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this variant contains at least 464 bytes
    |
    = note: requested on the command line with `-W clippy::large-enum-variant`
note: and the second-largest variant carries no data at all
   --> src/iterator.rs:321:4
    |
321 |             YieldNode,
    |             ^^^^^^^^^

Unrelated to this PR:

memcache-0.16.0: 1 warnings

warning: large size difference between variants
  --> src/protocol/mod.rs:14:1
   |
14 | #[enum_dispatch]
   | ^^^^^^^^^^^^^^^^ this variant contains at least 2120 bytes
   |
   = note: requested on the command line with `-W clippy::large-enum-variant`
note: and the second-largest variant contains at least 64 bytes:
  --> src/protocol/mod.rs:14:1
   |
14 | #[enum_dispatch]
   | ^^^^^^^^^^^^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#large_enum_variant
   = note: this warning originates in the attribute macro `enum_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info

... which comes from

#[enum_dispatch]
pub enum Protocol {
    Ascii(AsciiProtocol<Stream>),
    Binary(BinaryProtocol),
}

@lukaslueg lukaslueg changed the title [WIP] Use approx_ty_size for large_enum_variant Use approx_ty_size for large_enum_variant Sep 1, 2022
@lukaslueg
Copy link
Contributor Author

Force-pushed a new version, the lint is now

error: large size difference between variants
  --> $DIR/large_enum_variant.rs:120:1
   |
LL | / enum SomeGenericPossiblyCopyEnum<T> {
LL | |     A(bool, std::marker::PhantomData<T>),
   | |     ------------------------------------ the second-largest variant contains at least 1 bytes
LL | |     B([u64; 4000]),
   | |     -------------- the largest variant contains at least 32000 bytes
LL | | }
   | |_^ the entire enum is at least 32008 bytes
   |
note: boxing a variant would require the type no longer be `Copy`
  --> $DIR/large_enum_variant.rs:120:6
   |
LL | enum SomeGenericPossiblyCopyEnum<T> {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: consider boxing the large fields to reduce the total size of the enum
  --> $DIR/large_enum_variant.rs:122:5
   |
LL |     B([u64; 4000]),
   |     ^^^^^^^^^^^^^^

which is better imho.

LGTM now, happy for comments.

I may get to the #[enum_dispatch]-problem in a separate PR. I may also add some tooling to approx_ty_size to discern between "at least {} bytes" and "exactly {} bytes", so one can special-case "indeterminate" from "at least 0 bytes".

@llogiq
Copy link
Contributor

llogiq commented Sep 1, 2022

Even if that number is low, for a warn by default lint we should have quite high standards. I'm not opposed to merging this, but I don't feel comfortable r+ing this alone without further discussion.

What do you folks think @rust-lang/clippy ?

@llogiq llogiq added the S-needs-discussion Status: Needs further discussion before merging or work can be started label Sep 1, 2022
@Manishearth
Copy link
Member

We already warn on non generic enums, right? I think this ought to be fine?

@Jarcho
Copy link
Contributor

Jarcho commented Sep 1, 2022

The old behaviour is to ignore variants with generics correct? That means the false positives are for:

  • People who don't care (not a new group)
  • The distribution of created variants is biased towards the large variant (not a new problem)
  • The generics actually substituted would even out the sizes (I think this is a new class of false positives)

I would think for anyone who cares about this lint, a comment explaining why a certain case isn't a problem is reasonable. For everyone else the lint would be an annoyance even before this PR.

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2022

I have suggested a workable explanation of the possible problems to the lint docs. r=me with this applied.

@lukaslueg
Copy link
Contributor Author

@llogiq Included the comment wrt false positives. Rebased to include ad72aee

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2022

Thank you!

@bors r+

@bors
Copy link
Contributor

bors commented Sep 2, 2022

📌 Commit 584000a has been approved by llogiq

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Sep 2, 2022

⌛ Testing commit 584000a with merge c8ceb58...

bors added a commit that referenced this pull request Sep 2, 2022
Use `approx_ty_size` for `large_enum_variant`

This builds upon #9373 to use the approximate size of each variant for `large_enum_variant`. This allows us to lint in situations where an `enum` contains generics but is still guaranteed to have a large variant on an at-least basis, e.g. with `(T, [u8; 512])`.

* I've changed the wording from "is ... bytes" to "contains at least" because
  * the size is now an approximate lower bound (e.g. `512` in the example above). The actual size is larger due to `T`, including due to `T`'s memory layout.
  * the discriminant is not taken into account in the message. This comes up with variants like `A(T)`, which are "is at least 0 bytes" otherwise, which may be misleading.
* If the second-largest variant has no fields, there is a special case "carries no data" instead of "is at least 0 bytes".
* A variant like `A(T)` is "at least 0 bytes", which is technically true, yet we don't distinguish between "indeterminate" and truly "ZST".
* The generics-tests that were there before now lint while they didn't lint before. AFAICS this is correct.

I guess the above is correct-ish. However, I use the `SubstsRef` that I got via `cx.tcx.type_of(item.def_id)` to solve for generics in the variants. Is this even applicable, since we start from an `ItemKind`?
@bors
Copy link
Contributor

bors commented Sep 2, 2022

💔 Test failed - checks-action_test

@llogiq
Copy link
Contributor

llogiq commented Sep 3, 2022

@bors retry

@bors
Copy link
Contributor

bors commented Sep 3, 2022

⌛ Testing commit 584000a with merge 99ab5fe...

@bors
Copy link
Contributor

bors commented Sep 3, 2022

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: llogiq
Pushing 99ab5fe to master...

@bors bors merged commit 99ab5fe into rust-lang:master Sep 3, 2022
@lukaslueg lukaslueg deleted the approx_large_enum branch September 3, 2022 09:06
@lukaslueg
Copy link
Contributor Author

@llogiq Thanks for the review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-needs-discussion Status: Needs further discussion before merging or work can be started S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants