Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lints: Adding new fields to union types #950

Open
2 tasks
Tracked by #5
obi1kenobi opened this issue Sep 26, 2024 · 6 comments
Open
2 tasks
Tracked by #5

New lints: Adding new fields to union types #950

obi1kenobi opened this issue Sep 26, 2024 · 6 comments
Labels
A-lint Area: new or existing lint E-help-wanted Call for participation: Help is requested to fix this issue. E-mentor Call for participation: Mentorship is available for this issue.

Comments

@obi1kenobi
Copy link
Owner

obi1kenobi commented Sep 26, 2024

Removing a pub field from a union type is clearly a major breaking change. But how about adding a new field? I chatted with @joshlf about this, and the current plan is like this:

Any public API 1 field of a public API union may be read or written by downstream callers. There are two risks here:

  • The main risk is that if all fields are public API, a user may plausibly observe that all fields are bit-compatible with some type and thus use unsafe to read a field of that type. They would justify that this is sound because no matter what field was previously written, it's bit-compatible. Adding a new field that is not bit-compatible breaks that logic, regardless of whether the field is pub or not.
    • We are currently not able to determine bit-compatibility in cargo-semver-checks, so we'd trigger the lint for any field additions. If/when bit-pattern data is available, we can make the lint more fine-grained.
  • If non-public-API fields exist, then technically the union has not per se promised any particular bit-compatibility (unless the union's docs promise it in prose). But Hyrum's Law says that users may still assume bit-compatibility even though they shouldn't. As this is a secondary risk, a warn-by-default lint for field additions in this case would be good to have.

Two lints, then:

  • public API repr(C) union with no non-public-API fields adds new field, regardless of its visibility (major, deny by default)
  • public API repr(C) union with at least one non-public-API field adds new field, regardless of its visibility (major, warn by default)

Notes to folks looking to contribute these lints:

Footnotes

  1. "Public API" in cargo-semver-checks is defined as: pub and (not #[doc(hidden)] or #[deprecated]). In other words, deprecated pub items are public API regardless of #[doc(hidden)], and #[doc(hidden)] pub items are not public API unless deprecated. More info here: https://predr.ag/blog/checking-semver-for-doc-hidden-items/

@obi1kenobi obi1kenobi added A-lint Area: new or existing lint E-help-wanted Call for participation: Help is requested to fix this issue. E-mentor Call for participation: Mentorship is available for this issue. labels Sep 26, 2024
@djkoloski
Copy link

Wouldn't this not be the case for repr(Rust) unions since the compiler makes no guarantees about how fields overlap? I think the inference can only apply to repr(C) unions.

@obi1kenobi
Copy link
Owner Author

I've been thinking about this too. I think it might be possible to trigger this in repr(Rust) if there's no wiggle room for the fields' placement, for example:

  pub union Example {
      pub x: i64,
      pub y: u64,
+     pub z: [bool; 8],
  }

I believe that this addition is not sensitive to repr(Rust) layout because the overlap between all fields is total. I believe this is still breaking. What do you think?

I don't know what the semantics or guarantees (if any) might be for a union of unevenly-sized types, like:

pub union Uneven {
    pub first: i64,
    pub second: bool,
}

If anyone has any idea where the bool is supposed to end up in the union under repr(Rust), I'd love to know. My current impression is that it might not be well-defined without repr(C) here, but I'm not confident about it.

@bjorn3
Copy link

bjorn3 commented Sep 27, 2024

With -Zrandomize-layout I did expect Example to have x or y start at a non-zero offset even when z doesn't exist.

@obi1kenobi
Copy link
Owner Author

Interesting! Is encountering a repr(Rust) union an immediate code smell then? Should there be a clippy lint for that?

I'm happy to make our lints also require that the union be repr(C) before they fire, but I'm not sure if that truly solves the problem you both have highlighted here.

@djkoloski
Copy link

repr(Rust) unions are still useful, but have a more restricted set of sound operations. For example, it's still perfectly sound to use external data to determine which field is set. So a union might check some globally-configured value to determine which of its fields should be accessed or modified. Maybe you want to choose between storing a value inline versus boxed.

The main thing that you can't do with repr(Rust) unions is to write one field and read another. That can only be sound if your union has a well-defined layout so you know the exact semantics of the transmute.

@obi1kenobi
Copy link
Owner Author

Nice, that addresses my questions. I updated the lint descriptions above to only fire on repr(C) unions.

Thank you both for walking me through this, I appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: new or existing lint E-help-wanted Call for participation: Help is requested to fix this issue. E-mentor Call for participation: Mentorship is available for this issue.
Projects
None yet
Development

No branches or pull requests

3 participants