Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repr(tag = ...) for type aliases #3659

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
111 changes: 111 additions & 0 deletions text/0000-repr-type-aliases.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
- Feature Name: `repr_type_aliases`
- Start Date: 2024-06-14
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Primitive representations on enums now accept type aliases, meaning that in addition to primitives like `#[repr(u32)]`, `#[repr(discriminant = core::ffi::c_int)]` and `#[repr(discriminant = my_type)]` are now accepted.

# Motivation
[motivation]: #motivation

For the same reasons why type aliases are useful, having type aliases in `repr` attributes would also be useful. A few examples:

* Types depend on an external API whose exact size may be uncertain. (e.g. `core::ffi::c_int`, `gl::types::GLsizei`)
* An internal API might want to be able to easily change a type later.
* The intent behind a type alias may be clearer than simply using the primitive directly.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Enums allow `#[repr(discriminant = type)]` attributes to offer an explicit discriminant type. (`type` can be any primitive integer type, like `u8`, `i32`, or `usize`, but not `char`.) If all variants of the enum are unit variants, this means that the enum will be easily castable to `type` using `as`. Otherwise, the discriminant will still be of the specified type, but unsafe code is required to actually access it. (Future RFCs like [#3607] might change this.)

[#3607]: https://github.com/rust-lang/rfcs/pull/3607

To ensure compatibility, the `#[repr(discriminant = type)]` form is required if the type is a type alias instead of a known primitive type. While this may cause additional issues compared to the native `#[repr(type)]` syntax, it is possible to work around these issues as explained below:

> Since `type` is resolved as a path instead of being a hard-coded value, it's possible to shadow existing primitive types and cause weird results. For example, if a scope contains `type u32 = u8`, then `#[repr(discriminant = u32)` means `#[repr(u8)]`, not `#[repr(u32)]`. You can always refer to these types directly, however, and `#[repr(discriminant = ::std::u32)]` will always mean `#[repr(u32)]`.
>
> It's also worth noting that this syntax allows using type aliases with names of other `repr` arguments, like `C`. For example, `type C = u8` in the scope means that `#[repr(discriminant = C)]` means `#[repr(u8)]`, not `#[repr(C)]`.

You can use any type alias in the `repr` attribute, but it *must* be an alias to an accepted primitive type like `u8` or `i32`, and cannot be a pointer, reference, struct, etc. See the [future possibilities] section for some potential alternatives that may be allowed in the future.

[future possibilities]: #Future_possibilities

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The `repr` attribute now accepts a `discriminant = ...` argument to indicate an arbitrary type instead of a well-known primitive type. It accepts both fully-resolved types (including all necessary generics) as well as macros that expand to types.

If the resulting type resolves to a type alias to a valid primitive type which can be used in the `repr` attribute, that will be used as the actual discriminant representation. Otherwise, the result is an error which states that only primitive types and type aliases to them are allowed.

An automatically-applicable lint should be added that warns a user if a `repr` argument is invalid but references in an-scope type alias without the `discriminant = ` prefix. (For example, `#[repr(MyType)]` becomes `#[repr(discriminant = MyType)]`, but `#[repr(C)]` does not warn even if an alias `C` is in scope.)

To aid macro authors, an allow-by-default lint should be added that warns a user if they use `discriminant = ...` with a relative path instead of an absolute one, since the resolved type depends on the existing scope. For example, `discriminant = u32` or `discriminant = my_type` could potentially be shadowed by another type in the scope, but `discriminant = ::std::u32` and `discriminant = $crate::my_type` will always work as expected. This could potentially be relegated to a `clippy::pedantic` lint instead of a `rustc` lint.

# Drawbacks
clarfonthey marked this conversation as resolved.
Show resolved Hide resolved
[drawbacks]: #drawbacks
clarfonthey marked this conversation as resolved.
Show resolved Hide resolved

The requirement for `discriminant =` is unfortunate, but it feels like the best way to ensure that adding new representations isn't a breaking change going forward. Even if we were to decide it weren't a "breaking change," it would still break things anyway, being de-facto breaking. Some people have expressed desire for a dedicated syntax instead, which would fix this problem, but this can still be added after the fact.

Some proc macros like [`zerocopy::FromBytes`] may also have to complicate their logic if they depend on checking the existing `#[repr]` attribute for validity. In particular, since they can no longer exactly know which primitive type is being used for the representation, they would have to instead depend on associated constants on the types like `BITS` to verify which type is being used. Instead of being able to emit an error at macro expansion time, this error will have to be triggered at constant evaluation time or runtime instead, which is unfortunate.

Complicating proc macros in this way is considered acceptable for the following reasons:

* Since errors can still be included in constants now that `const` panics are stable, errors can still be caught at compile time.
* Users of special `repr` attributes are more likely to have more knowledge about specifics of how these proc macros work, and are thus more equipped to deal with weird errors like these.
* Ultimately, the benefits of adding the feature outweigh this negative.

[`zerocopy::FromBytes`]: https://docs.rs/zerocopy/latest/zerocopy/derive.FromBytes.html

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

We could always not do this.

But more realistically, here are some alternative designs that were rejected.

## `type`

The second proposal for this RFC used `type = ...` instead of `discriminant = ...`. Initially, this decision was chosen because `discriminant` was long to type, but it was ultimately decided that `discriminant` is more clear and that the extra typing is worth it. Additionally, RFC [#3607] proposes using `discriminant` in its proposed syntax, so, this would be further in line with that as well.
Copy link
Member

@jswrenn jswrenn Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we intend for repr(discriminant = u8) to be equivalent to repr(u8), then I'd like to propose that we perhaps shouldn't use repr(discriminant = ...) since it might complicate a future possibility for enums.

That future possibility — alluded to in this RFC — is that we someday might want to expand the accepted set of discriminant types of enums (e.g., to include things implementing StructuralEq). And, if we had such functionality, it might also be nice to declare up front what the discriminant type is. A natural syntax for this could be #[repr(discriminant = ...)].

However, repr(discriminant = ...), as proposed by this RFC, does not only set the discriminant type of an enum, but also affects its memory layout. This is not desirable for all enums. If this RFC is adopted as-is, we'd need to find an alternative way to say "the discriminant type of this enum is X, but I don't care what the enum's in-memory representation is".

Given this, I'd like to suggest that this RFC consider repr(tag = ...). Doing so has three advantages:

  1. it makes it clearer that the annotation has an affect on the enum's in-memory representation, not just its logical representation
  2. it keeps repr(discriminant = ...) syntactically reserved for the aforementioned future possibility in which the user explicitly provides the type of the discriminant
  3. it's significantly shorter than "discriminant"

If not repr(tag = ...) than perhaps something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused how labeling this a tag makes it less confusing. It's a shorter word, but that's about it.

We've already established that we're changing the memory layout by using repr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that these are often called "tagged unions" in contexts where memory layout matters, I do think the name is clearer. It's also going to be consistent with other in-flight changes to Rust's documentation: rust-lang/reference#1454 (comment)

What are the advantages of discriminant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are often called this, but we also explicitly use the term discriminant in places where it has an API meaning. See: https://doc.rust-lang.org/nightly/std/mem/fn.discriminant.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the lang meeting today, we decided on tag. We felt that the existing mem::discriminant API was unfortunate, and unloved in enough other ways, that we didn't need to weigh that heavily as precedent. See also here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very fair. I think that it would be worth proposing to deprecate & rename that API if this is the route we're going, to ensure uniformity.

Copy link
Contributor Author

@clarfonthey clarfonthey Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, while the decision on the rename is resolved, I'm going to unresolve this until I update the RFC to rename the attribute, just so it's easier to track. Plus, it would be helpful to clarify all the finer details on how they should be written into the RFC before this gets marked as resolved.)


## `self::`

We could, instead of using `type =`, require that all types contain a double-colon to indicate they're a path, effectively preventing collisions with arguments that aren't paths. This would require using `self::` for types that are imported in the local scope, and was actually the first proposal of this RFC, but wasn't very well-received.
clarfonthey marked this conversation as resolved.
Show resolved Hide resolved

## Shadowing attributes

Until a future edition, the current set of valid representations could be solidified as taking precedence over any shadowed identifiers. For example, if someone defines `type transparent = u32`, then `repr(transparent)` still means `repr(transparent)` and not `repr(u32)`.

In future editions, we could either:

* Let type aliases shadow all valid representations. This isn't ideal since there is no way to override the shadowing besides nesting your code in a new module and then re-exporting it outside that module, which is very messy.
* Expand the list of unshadowable representations every edition where necessary.

# Prior art
[prior-art]: #prior-art

This was previously suggested in [#1605] and was rejected at the time. The primary reasoning at the time was due to the lack of advanced syntax in attributes which were able to support this feature, which is no longer the case; the compiler has improved a lot in 8 years!

[#1605]: https://github.com/rust-lang/rfcs/pull/1605

# Unresolved questions
[unresolved-questions]: #unresolved-questions

None currently.

# Future possibilities
[future-possibilities]: #future-possibilities

In the future, more potential discriminant types could be added which might be useful:

* `repr(transparent)` structs over valid types
* Floating-point primitives
* `char` or other restricted primitives like `NonZeroU*` and `NonZeroI*`
* Arbitrary `StructuralEq` types

Of course, all of these would require larger changes to the compiler than the ones proposed, and are currently left as future extensions.

[#3607]: https://github.com/rust-lang/rfcs/pull/3607