Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transparent Unions and Enums #2645

Merged
merged 27 commits into from
Apr 30, 2019
Merged
Changes from 23 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
991ea90
Transparent Unions RFC
mjbshaw Feb 25, 2019
e08e4a2
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
f7c332e
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
cb3f671
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
381fcf6
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
eb37cce
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
45be61d
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
582fd15
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
d532738
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
4529a95
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
bb73505
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
f9b9d8e
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
1bce145
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
f751ad3
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
7250c02
Update text/0000-transparent-unions.md
Centril Feb 25, 2019
bea3b9f
Revise mentions of transparent enums
mjbshaw Feb 25, 2019
038829b
Remove unnecessary "do nothing" alternative
mjbshaw Feb 25, 2019
c532471
Clarify generic ZST transparent unions
mjbshaw Feb 25, 2019
541934e
Add missing "union" word
mjbshaw Feb 25, 2019
8c92dec
Add some initial text about transparent enums
mjbshaw Feb 26, 2019
18ecba9
s/method/function/
mjbshaw Feb 27, 2019
26d47b7
Fix indentation
mjbshaw Feb 27, 2019
4c88177
Document the behavior of nonnull-style optimization
mjbshaw Mar 9, 2019
ce8b569
Fix typo: union -> enum
Centril Mar 10, 2019
1488266
Mention that FFI can lead to UB if values are uninit
mjbshaw Mar 17, 2019
2afd342
Merge branch 'transparent_unions' of github.com:mjbshaw/rfcs into tra…
mjbshaw Mar 17, 2019
bf09395
RFC 2645
Centril Apr 30, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
229 changes: 229 additions & 0 deletions text/0000-transparent-unions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,229 @@
- Feature Name: transparent_enunions
- Start Date: 2019-02-13
- RFC PR:
- Rust Issue:

# Summary
[summary]: #summary

Allow `#[repr(transparent)]` on `union`s and univariant `enum`s that have exactly one non-zero-sized field (just like `struct`s).

# Motivation
[motivation]: #motivation

Some `union` types are thin newtype-style wrappers around another type, like `MaybeUninit<T>` (and [once upon a time](https://doc.rust-lang.org/1.28.0/src/core/mem.rs.html#955), `ManuallyDrop<T>`). This type is intended to be used in the same places as `T`, but without being `#[repr(transparent)]` the actual compatibility between it and `T` is left unspecified.

Likewise, some `enum` types only have a single variant, and are similarly thin wrappers around another type.

Making types like these `#[repr(transparent)]` would be useful in certain cases. For example, making the type `Wrapper<T>` (which is a `union` or univariant `enum` with a single field of type `T`) transparent:

- Clearly expresses the intent of the developer.
- Protects against accidental violations of that intent (e.g., adding a new variant or non-ZST field will result in a compiler error).
Copy link
Contributor

@gnzlbg gnzlbg Mar 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth mentioning that we already have a lint (improper ctypes) intended to catch similar issues. A compile-time test like:

extern "C" { fn _unused(u: MyUnion); }

might be enough to enforce that the union is safe to use in C FFI. This approach might not be as nice as repr(transparent) and it might also have other limitations. It would be nice if the RFC explored these limitations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's much to explore here since this is pretty different from repr(transparent). MyUnion needs to be repr(C) (if not repr(transparent)) to be FFI-safe, but the developer can add additional sized fields to MyUnion (something that repr(transparent) would have prevented) (and there's no good way to prevent additional sized fields from being added to a union).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Is there an easy way to write a compile-time test to check that a union is indeed transparent ? If not, that's an argument that might be worth mentioning (e.g. without explicit transparent there is no easy way to write a test to check it).

- Makes a clear API guarantee that a `Wrapper<T>` can be transmuted to a `T` or substituted for a `T` in an FFI function's signature.
mjbshaw marked this conversation as resolved.
Show resolved Hide resolved

Transparent `union`s and univariant `enum`s are a nice complement to transparent `struct`s, and this RFC rounds out the `#[repr(transparent)]` feature.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

A `union` may be `#[repr(transparent)]` in exactly the same conditions in which a `struct` may be `#[repr(transparent)]`. An `enum` may be `#[repr(transparent)]` if it has exactly one variant, and that variant matches the same conditions which `struct` requires for transparency. Some concrete illustrations follow.

A union may be `#[repr(transparent)]` if it has exactly one non-zero-sized field:

```rust
// This union has the same representation as `f32`.
#[repr(transparent)]
union SingleFieldUnion {
field: f32,
}

// This union has the same representation as `usize`.
#[repr(transparent)]
union MultiFieldUnion {
field: usize,
nothing: (),
}

// This enum has the same representation as `f32`.
#[repr(transparent)]
enum SingleFieldEnum {
Variant(f32)
}

// This enum has the same representation as `usize`.
#[repr(transparent)]
enum MultiFieldEnum {
Variant { field: usize, nothing: () },
}
```

For consistency with transparent `struct`s, `union`s and `enum`s must have exactly one non-zero-sized field. If all fields are zero-sized, the `union` or `enum` must not be `#[repr(transparent)]`:

```rust
// This (non-transparent) union is already valid in stable Rust:
pub union GoodUnion {
pub nothing: (),
}

// This (non-transparent) enum is already valid in stable Rust:
pub enum GoodEnum {
Nothing,
}

// Error: transparent union needs exactly one non-zero-sized field, but has 0
#[repr(transparent)]
pub union BadUnion {
pub nothing: (),
}

// Error: transparent enum needs exactly one non-zero-sized field, but has 0
#[repr(transparent)]
pub enum BadEnum {
Nothing(()),
}

// Error: transparent enum needs exactly one non-zero-sized field, but has 0
#[repr(transparent)]
pub enum BadEmptyEnum {
Nothing,
}
```

The one exception is if the `union` or `enum` is generic over `T` and has a field of type `T`, it may be `#[repr(transparent)]` even if `T` is a zero-sized type:

```rust
// This union has the same representation as `T`.
#[repr(transparent)]
pub union GenericUnion<T: Copy> { // Unions with non-`Copy` fields are unstable.
pub field: T,
pub nothing: (),
}

// This enum has the same representation as `T`.
#[repr(transparent)]
pub enum GenericEnum<T> {
Variant(T, ()),
}

// This is okay even though `()` is a zero-sized type.
mjbshaw marked this conversation as resolved.
Show resolved Hide resolved
pub const THIS_IS_OKAY: GenericUnion<()> = GenericUnion { field: () };
pub const THIS_IS_OKAY_TOO: GenericEnum<()> = GenericEnum::Variant((), ());
```

Transparent `enum`s have the addtional restriction that they require exactly one variant:

```rust
// Error: transparent enum needs exactly one variant, but has 0
#[repr(transparent)]
pub enum TooFewVariants {
}

// Error: transparent enum needs exactly one variant, but has 2
#[repr(transparent)]
pub enum TooManyVariants {
First(usize),
Second(usize),
}
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The logic controlling whether a `union` of type `U` may be `#[repr(transparent)]` should match the logic controlling whether a `struct` of type `S` may be `#[repr(transparent)]` (assuming `U` and `S` have the same generic parameters and fields). An `enum` of type `E` may be `#[repr(transparent)]` if it has exactly one variant, and that variant follows all the rules and logic controlling whether a `struct` of type `S` may be `#[repr(transparent)]` (assuming `E` and `S` have the same generic parameters, and `E`'s variant and `S` have the same and fields).

Like transarent `struct`s, a transparent `union` of type `U` and transparent `enum` of type `E` have the same layout, size, and ABI as their single non-ZST field. If they are generic over a type `T`, and all their fields are ZSTs except for exactly one field of type `T`, then they have the same layout and ABI as `T` (even if `T` is a ZST when monomorphized).

Like transparent `struct`s, transparent `union`s and `enum`s are FFI-safe if and only if their underlying representation type is also FFI-safe.

A `union` may not be eligible for the same nonnull-style optimizations that a `struct` or `union` (with the same fields) are eligible for. Adding `#[repr(transparent)]` to `union` does not change this. To give a more concrete example, it is unspecified whether `size_of::<T>()` is equal to `size_of::<Option<T>>()`, where `T` is a `union` (regardless of whether it is transparent). The Rust compiler is free to perform this optimization if possible, but is not required to, and different compiler versions may differ in their application of these optimizations.
mjbshaw marked this conversation as resolved.
Show resolved Hide resolved

# Drawbacks
[drawbacks]: #drawbacks

`#[repr(transparent)]` on a `union` or `enum` is of limited use. There are cases where it is useful, but they're not common and some users might unnecessarily apply `#[repr(transparent)]` to a type in a cargo-cult fashion.

# Rationale and alternatives
[alternatives]: #alternatives

It would be nice to make `MaybeUninit<T>` `#[repr(transparent)]`. This type is a `union`, and thus this RFC is required to allow making it transparent. One example in which a transparent representation would be useful is for unused parameters in an FFI-function:

```rust
#[repr(C)]
struct Context {
// Imagine there a few fields here, defined by an external C library.
}

extern "C" fn log_event(message: core::ptr::NonNull<libc::c_char>,
context: core::mem::MaybeUninit<Context>) {
// Log the message here, but ignore the context since we don't need it.
}

fn main() {
extern "C" {
fn set_log_handler(handler: extern "C" fn(core::ptr::NonNull<libc::c_char>,
Context));
}

// Set the log handler so the external C library can call log_event.
unsafe {
// Transmuting is safe since MaybeUninit<Context> and Context
// have the same ABI.
set_log_handler(core::mem::transmute(log_event as *const ()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that repr(transparent) is not necessary for making the transmute safe.

For example, Option is not repr(transparent) yet we do guarantee that the ABI of Option<fn(...)->...> and Option<&T> match that of C pointers.

In the current implementation the ABI of MaybeUninit matches that of its non-zero-sized variant, so this code does currently work correctly, even though we do not guarantee that it does. An alternative design here would be to guarantee that this code works correctly, just like we do for Option (which is neither repr(transparent) nor repr(C)).

Copy link
Contributor Author

@mjbshaw mjbshaw Mar 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That (guaranteeing/documenting MaybeUninit as being effectively transparent without actually making it repr(transparent)) is what rkruppe suggested, but I disagree with that idea because:

  1. repr(transaprent) is clear, explicit documentation that is also machine readable. Tools like RLS can provide this kind of useful documentation to IDEs. Without this, these tools have to hard-code which types are effectively transparent (or forego the useful information altogether).
  2. I think explicit is better than implicit. repr(transaprent) is an explicit representation and guarantee to users that exists in the code and is part of a type's documentation. Documenting a type as being effectively transparent is still explicit (from some perspectives), but not in the code. I personally prefer representations and guarantees to be explicit (especially when they're not overly verbose, and repr(transparent) doesn't suffer from any kind of problematic verbosity).

Copy link
Contributor

@gnzlbg gnzlbg Mar 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem that implicit transparent solves is removing some unnecessary undefined behavior from the Rust language. In this particular case:

  • there is very likely only one way in which any implementation can lay out these types (*),
  • users are going to exploit their knowledge of this whether we like it or not, and
  • there does not appear to be any value in leaving the layout of these types undefined, e.g., in terms of layout optimizations that doing so would allow.

In my opinion, there is just no good reason to leave this type of behavior as undefined, which is why I think it is going to end up being defined eventually.

That is why the arguments that you provide sound more in favor of explicit transparent than against implicit transparent to me. Implicit transparent solves a different problem that is not addressed by this RFC, it just so happens that solving that problem also tangentially improves on some of the issues identified by this RFC.

Also, it is not an either / or thing: we can have both implicit and explicit transparent here. That might be a bit redundant, but I believe that the problem implicit transparent solves is worth solving, and I also believe that explicit transparent helps on top of implicit transparent.

(*) edit: i'm talking here about unions with only one non-zero-sized field,

}

// We can call it too. And since we don't care about the context and
// we're using MaybeUninit, we don't have to pay any extra cost for
// initializing something that's unused.
log_event(core::ptr::NonNull::new(b"Hello, world!\x00".as_ptr() as *mut _).unwrap(),
core::mem::MaybeUninit::uninitialized());
}
```

It is also useful for consuming pointers to uninitialized memory:

```rust
#[repr(C)]
struct Cryptor {
// Imagine there a few fields here, defined by an external C library.
}

// This function may be called from C (or Rust!), and matches the C
// function signature: bool(Cryptor *cryptor)
pub extern "C" fn init_cryptor(cryptor: &mut core::mem::MaybeUninit<Cryptor>) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because MaybeUninit is behind a pointer, for this code to be correct it would suffice to guarantee that it is layout compatible with Cryptor, which the current implementation satisfies.

The only reason not to guarantee this would be to leave room for layout optimizations that lay the only non-ZST field at a non-zero offset. I don't know how important those layout optimizations are (or whether they make sense at all) cc @eddyb @joshtriplett @petrochenkov

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which the current implementation satisfies

That's an implementation detail that isn't explicitly guaranteed. Perhaps the guarantee was intended but never made explicitly. repr(transparent) will guarantee this, though.

Also, if MaybeUninit is ever made repr(C) (instead of repr(transparent)), then that guarantee no longer holds since C makes no guarantees here (though I don't think any sane implementation would add padding to a union with a single sized field).

Copy link
Contributor

@gnzlbg gnzlbg Mar 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an implementation detail that isn't explicitly guaranteed.

This was identified in the unsafe code guidelines repo as something that's probably worth guaranteeing. AFAICT nothing is blocking this - somebody would just need to write an RFC for this that works out the details.

Also, if MaybeUninit is ever made repr(C)

Note that MaybeUninit contains a ZST, and C does not support ZSTs, so to make MaybeUninit repr(C) we would need to specify what does it mean for a ZST to be repr(C). As you mention, there aren't many ways of doing that, which is a good thing. If there is only a single sane way of handling ZSTs in repr(C) or C FFI, then that's something that we could guarantee and IMO we should - we'd just need an RFC working out the details here.


My point is not that we don't need repr(transparent) for unions, but rather that there are some cross-cutting concerns between this RFC, and some other problems that we'd also like to solve (repr(C) ZSTs, union fields at offset 0 in some cases, etc.). Sadly, solutions to these other problems have not been explored in RFCs yet, so it is a bit hard to tell how all of these issues fit together.

// Initialize the cryptor and return whether we succeeded
}
```

# Prior art
[prior-art]: #prior-art

See [the discussion on RFC #1758](https://github.com/rust-lang/rfcs/pull/1758) (which introduced `#[repr(transparent)]`) for some discussion on applying the attribute to a `union` or `enum`. A summary of the discussion:

[nagisa_1]: https://github.com/rust-lang/rfcs/pull/1758#discussion_r80436621
> + **[nagisa][nagisa_1]:** "Why not univariant unions and enums?"
> + **nox:** "I tried to be conservative for now given I don't have a use case for univariant unions and enums in FFI context."

[eddyb_1]: https://github.com/rust-lang/rfcs/pull/1758#issuecomment-254872520
> + **[eddyb][eddyb_1]:** "I found another important usecase: for `ManuallyDrop<T>`, to be useful in arrays (i.e. small vector optimizations), it needs to have the same layout as `T` and AFAICT `#[repr(C)]` is not guaranteed to do the right thing"
> + **retep998:** "So we'd need to be able to specify `#[repr(transparent)]` on unions?"
> + **eddyb:** "That's the only way to be sure AFAICT, yes."

[joshtriplett_1]: https://github.com/rust-lang/rfcs/pull/1758#issuecomment-274670231
> + **[joshtriplett][joshtriplett_1]:** "In terms of interactions with other features, I think this needs to specify what happens if you apply it to a union with one field, a union with multiple fields, a struct (tuple or otherwise) with multiple fields, a single-variant enum with one field, an enum struct variant where the enum uses `repr(u32)` or similar. The answer to some of those might be "compile error", but some of them (e.g. the union case) may potentially make sense in some contexts."

[pnkfelix_1]: https://github.com/rust-lang/rfcs/pull/1758#issuecomment-290757356
> + **[pnkfelix][pnkfelix_1]:** "However, I personally do not think we need to expand the scope of the feature. So I am okay with leaving it solely defined on `struct`, and leave `union`/`enum` to a follow-on RFC later. (Much the same with a hypothetical `newtype` feature.)"

In summary, many of the questions regarding `#[repr(transparent)]` on a `union` or `enum` were the same as applying it to a multi-field `struct`. These questions have since been answered, so there should be no problems with applying those same answers to `union` univariant `enum`.

# Unresolved questions
[unresolved]: #unresolved-questions

The role of `#[repr(transparent)]` in nonnull-style optimizations is not entirely clear. Specifically, it is unclear whether the user can rely on these optimizations to be performed when they make a type transparent. [Transparent `union`s somewhat complicate the matter](https://github.com/rust-lang/rfcs/pull/2645#issuecomment-470699497). General concensus seems to be that the compiler is free to decide where and when to perform nonnull-style optimizations on `union`s (regardless of whether or not the `union` is transaprent), and no guarantees are made to the user about when and if those optimizations will be applied. It is still an open question exactly what guarantees (if any) Rust makes about transparent `struct`s (and `enum`s) and nonnull-style optimizations.

This RFC doesn't propose any changes to transparent `struct`s, and so does not strictly depend on this question being resolved. But since this RFC is attempting to round out the `#[repr(transparent)]` feature, it seems reasonable to dedicate some time to attempting to round out the guarantees about `#[repr(transparent)]` on `struct`s.

# Future possibilities
[future-possibilities]: #future-possibilities

If a `union` has multiple non-ZST fields, a future RFC could propose a way to choose the representation of that `union` ([example](https://internals.rust-lang.org/t/pre-rfc-transparent-unions/9441/6)).