Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unions interacting with Enum layout optimization #36394

Open
Mark-Simulacrum opened this issue Sep 11, 2016 · 16 comments
Open

Unions interacting with Enum layout optimization #36394

Mark-Simulacrum opened this issue Sep 11, 2016 · 16 comments
Labels
A-codegen Area: Code generation A-enum 🤓 akshually it's an "ADT" or "algebraic data type" F-untagged_unions `#![feature(untagged_unions)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Sep 11, 2016

IRC (#rust) short discussion log, since I don't actually know much about this:

19:30 Amanieu: bluss: I had a look at nodrop-union, shouldn't you add another field to union with type () to inhibit the enum layout optimization?
19:31 Amanieu: admittedly the exact interactions between enums and unions is still somewhat poorly specified
19:31 bluss: Amanieu: good question. As it is now, that optimization is not used for unions
19:31 bluss: Amanieu: but I haven't checked the RFC discussions to see how it's going to be
19:32 Amanieu: I don't think this point was covered in the discussions
19:33 Amanieu: well, someone should probably open an issue about it
19:33 Amanieu: it's 2am here so I'll let someone else do that :P
19:34 bluss: it's 3am here, so I'm passing it on

cc @Amanieu @bluss

@Amanieu
Copy link
Member

Amanieu commented Sep 11, 2016

Some specific examples:

union A {
    x: Box<i32>,
    y: Box<i64>,
}

union B {
    x: Box<i32>,
    y: Box<i64>,
    z: (),
}

The question here is essentially, what is the size of Option<A>? Should the enum layout optimization apply, in which case it is just 1 pointer length? Or should it act like Option<B>, where the () inhibits this optimization and forces the use of a separate enum tag. The key point here is that the optimization requires all union variants to be NonZero, which is not the case for ().

@bluss
Copy link
Member

bluss commented Sep 11, 2016

Previous discussion (11 April - 28 May) rust-lang/rfcs#1444 (comment)

@bluss
Copy link
Member

bluss commented Sep 11, 2016

union B could use the layout optimization just as well as union A, since B's () member has padding bytes where the discriminant would want to be placed.

@nagisa
Copy link
Member

nagisa commented Sep 11, 2016

Layout of non-#[repr(C)] unions is unspecified, just like the layout of any other ADT. If you want any layout guarantees, I’d recommend sticking to #[repr(C)] for now.

@bluss
Copy link
Member

bluss commented Sep 11, 2016

This is another open question for the tracking issue: #32836

@bluss
Copy link
Member

bluss commented Oct 8, 2016

@nagisa While repr(C) fixes the representation of a type Foo, it does not say anything (that I know to be documented) about the representation of None::<Foo>, i.e types composed of Foo.

@joshtriplett
Copy link
Member

joshtriplett commented Jan 2, 2017

@Amanieu unions don't have a discriminant, though, so all three unions should have the size of a pointer.

Adding a () to a union should never add any size to the union, because it adds zero bits of information. You can always interpret a union as any type you want, and you need 0 bits to identify the 1 possible value of type ().

I don't think the enum layout optimization needs to apply here at all; the size of a union should always match the maximum size of any field type.

@nagisa
Copy link
Member

nagisa commented Feb 2, 2017

While repr(C) fixes the representation of a type Foo, it does not say anything (that I know to be documented) about the representation of None::<Foo>, i.e types composed of Foo.

Why do you care about representation of Option<T>? It is repr(Rust) and therefore unpecified.

@Mark-Simulacrum
Copy link
Member Author

I think that this discussion has come to the conclusion that unions interact "as all other things" with enum layout optimization, and I'm going to close.

@SimonSapin
Copy link
Contributor

I think this issue should be re-opened.

It is originally about the nodrop-union crate, which is pretty much the same as std::mem::ManuallyDrop. The problem is that both of these types:

  • Are generic. They may or may not contain something that implements Drop. They may or may not contain NonZero<_> (or something else that might qualify for future enum layout optimizations).
  • One of their use case is to be used with uninitialized data, for example in arrayvec.

They need not only a way to inhibit automatic drop glue (which union is already guaranteed to do) but also stop enum layout optimization “from the outside” from peeking “inside” them (which union happens to do in the current implementation, but maintaining that doesn’t seem agreed-upon).

Concretely, ManuallyDrop needs to be written in some way (whether that’s #[repr(C)], adding a () variant, or something else) such that this code is guaranteed not to read uninitialized memory:

Some(ManuallyDrop::new(uninitialized::<[String; 100]>())).is_some()

More generally, we need to have principles for what to do to use std::mem::uninitialized safely in a generic context. std::mem::ManuallyDrop is probably part of that story.

@Mark-Simulacrum
Copy link
Member Author

Hm, okay. I'll reopen; this was (or is? Was it merged?) an unresolved question in the RFC.

@SimonSapin
Copy link
Contributor

Could you point to a specific part of the RFC? I can’t find it in rust-lang/rfcs#1897.

@Mark-Simulacrum
Copy link
Member Author

I don't think the original union RFC mentioned it (RFC 1444), but there was some discussion in the thread itself: rust-lang/rfcs#1444 (comment) and rust-lang/rfcs#1444 (comment) (but also just search from "enum layout")

@SimonSapin
Copy link
Contributor

Ah, found rust-lang/rfcs#1444 (comment) which concludes that adding a () field/variant inhibits enum layout optimization. But then #36394 (comment) says the opposite.

@Mark-Simulacrum Mark-Simulacrum added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jun 23, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 26, 2017
@petertodd
Copy link
Contributor

Ah, found rust-lang/rfcs#1444 (comment) which concludes that adding a () field/variant inhibits enum layout optimization. But then #36394 (comment) says the opposite.

Just an update: at present MaybeUninit<T> depends on () inhibiting enum layout optimization to make constructs like the following safe:

assert!(Some(MaybeUninit::<&u8>::zeroed()).is_some());

@RalfJung
Copy link
Member

Also see rust-lang/unsafe-code-guidelines#73: discussion of the invariant that a union has to satisfy at all times. This directly informs which layout optimizations are possible.

Centril added a commit to Centril/rust that referenced this issue May 19, 2019
…r=nikomatsakis,Centril

Test interaction of unions with non-zero/niche-filling optimization

Notably this nails down part of the behavior that MaybeUninit assumes, e.g. that a Option<MaybeUninit<&u8>> does not take advantage of non-zero optimization, and thus is a safe construct.

It also verifies the status quo: that even unions that could theoretically take advantage of niches don't. (relevant: rust-lang#36394)
@Centril Centril added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. F-untagged_unions `#![feature(untagged_unions)]` and removed C-bug Category: This is a bug. labels Oct 21, 2019
@workingjubilee workingjubilee added the A-enum 🤓 akshually it's an "ADT" or "algebraic data type" label Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-enum 🤓 akshually it's an "ADT" or "algebraic data type" F-untagged_unions `#![feature(untagged_unions)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants