Validity of unions #73
Comments
My personal preference is to allow any bit pattern, mostly to keep things simple. Unions are already complex enough, and they only occur in unsafe code, so we should make this as simple for the programmer as we can. If we ever desperately need to layout-optimize |
I feel like @joshtriplett and @cramertj have both, in the past, expressed strong opinions about this. I think I agree regarding allowing any bit pattern. That said, I can definitely imagine wanting to be able to create a |
It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right? Personally, I sort of think we should just guarantee that the Rust compiler will do so. Particularly if we decide that unions are an opaque "bag of bits" from the perspective of the compiler, what is the motivation for the compiler to add extra padding into that bag? (The same applies to your second example.) |
Yes, I've been assuming that to be the base.
Yeah I remember that as well. Also @petrochenkov expressed the opposite opinion, namely that unions should have a non-trivial invariant.
I can imagine wanting to do many things. :) But I feel such needs are better served by an opt-in attribute, than enabled per default. |
Yep, I would prefer this for non- |
Could you spell out what you mean by "this"? |
I also think that we should guarantee this, but @joshtriplett mentioned some reasons why we might not want to do that in the discussion about the layout of unions (#13 (comment)). It's unclear to me whether that exchange reached consensus, but maybe we should open a separate issue to discuss whether we want to guarantee this particular thing? That would require amending the layout of unions in the repo. EDIT: for |
That question can't really be separated from what we're discussing here: these kinds of layout optimizations are only possible if unions have a non-trivial validity invariant relating to the validity invariants of the fields. If we achieve consensus here that unions are just bags of bits, then it should be uncontroversial that there's no reason to place union fields at nonzero offsets. Conversely, the desire for layout optimizations of unions is a reason to want some non-trivial validity invariant for unions. I believe that's why nothing was settled during the previous discussion of union layout. |
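To make the offset question concrete, here is a minimal sketch that assumes a `#[repr(C)]` union, where C layout rules place every field at offset 0. The names `Bag` and `field_offsets` are hypothetical and used only for illustration. For a default-repr union the same equality is exactly what this thread is debating, so the sketch does not assert it.

```rust
/// Hypothetical example type; `repr(C)` requests C union layout.
#[repr(C)]
union Bag {
    small: u8,
    wide: u32,
}

/// Returns the byte offsets of `small` and `wide` from the start of the union.
fn field_offsets() -> (usize, usize) {
    let bag = Bag { wide: 0 };
    let base = &bag as *const Bag as usize;
    // Borrowing a union field requires `unsafe`. All four bytes were
    // initialized through `wide`, so both borrows point at initialized data.
    let (small_addr, wide_addr) = unsafe {
        (
            &bag.small as *const u8 as usize,
            &bag.wide as *const u32 as usize,
        )
    };
    (small_addr - base, wide_addr - base)
}
```

Calling `field_offsets()` on this `repr(C)` type should report offset 0 for both fields.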
Yeah-- I'd like the opportunity to optimize based on known-invalid bitpatterns for |
@cramertj So what do you think about code like this, which violates the principle that at any time, some variant of the union is valid?
union Mix {
    f1: (bool, u8),
    f2: (u8, bool),
}
let mut m = Mix { f1: (false, 3) };
m.f2.0 = 3;
From what I recall, this is something we explicitly want to support. It is also rather hard to argue that this is UB because it never actually performs an operation that "sees" a bad value. Given that unions are basically sugar for transmutes, I am really worried about automatically assuming that any of their bit patterns are invalid. |
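A hedged sketch of how the resulting bytes could be observed. It departs from the snippet above in two ways, both assumptions made for illustration: the tuples are replaced by `#[repr(C)]` structs (tuple field order is unspecified), and a hypothetical `raw` field is added so the bytes can be read back without interpreting either byte as a `bool`. All names here (`BoolThenByte`, `ByteThenBool`, `mixed_bytes`) are made up.

```rust
#[derive(Clone, Copy)]
#[repr(C)]
struct BoolThenByte {
    flag: bool,
    byte: u8,
}

#[derive(Clone, Copy)]
#[repr(C)]
struct ByteThenBool {
    byte: u8,
    flag: bool,
}

/// Variant of `Mix` with an extra byte-array view (an addition for this sketch).
#[repr(C)]
union Mix {
    f1: BoolThenByte,
    f2: ByteThenBool,
    raw: [u8; 2],
}

/// Initializes through `f1`, overwrites the first byte through `f2`,
/// and returns the two bytes now stored in the union.
fn mixed_bytes() -> [u8; 2] {
    let mut m = Mix { f1: BoolThenByte { flag: false, byte: 3 } };
    unsafe {
        // Only the first byte is written; the second keeps its old value.
        m.f2.byte = 3;
        // Reading as plain bytes never materializes an invalid `bool`.
        m.raw
    }
}
```

Both bytes end up as 3, which is a valid `bool` in neither position, so the stored pattern is valid for neither `f1` nor `f2` taken as a whole.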
oh wow what a mess, good point! I'd still argue that the niches left over from overlapping all variants (e.g. |
I thought I had written down what the validity invariant could be to justify layout optimizations such as what @cramertj is asking for, but it seems not here... it would be something like: Bit/byte i of the union is allowed to have value v iff there is a variant of the union such that bit/byte i of the variant is allowed to have value v. We assume all variants to be "filled up" to the same size with padding, which may have any value. Whether it should be bits or bytes is unclear, as are a few other things. For example, this kind of implicitly assumes that validity talks only about the bits, but it might also talk about contents of the memory and then things become even more messy. Also this is very hard to check for in an implementation of our dynamic semantics. One thing that everyone seems to agree on though (including the above definition) is that if the union has a field of size 0 (such as is the case for |
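The per-byte rule sketched in this comment can be modelled in a few lines. This is a toy model under stated assumptions, not a proposed specification: it works on bytes rather than bits, ignores uninitialized memory and provenance, and knows only two kinds of byte constraint. `ByteRule` and `union_allows` are hypothetical names.

```rust
/// Constraint on a single byte position of one field (toy model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ByteRule {
    /// Any value is allowed (plain integers, padding).
    Any,
    /// Only 0 or 1 is allowed (a `bool`).
    Bool,
}

impl ByteRule {
    fn allows(self, value: u8) -> bool {
        match self {
            ByteRule::Any => true,
            ByteRule::Bool => value <= 1,
        }
    }
}

/// Byte `i` of the union may hold `v` iff some field allows `v` at byte `i`.
/// Fields shorter than `bytes` are treated as padded with `Any`.
fn union_allows(fields: &[&[ByteRule]], bytes: &[u8]) -> bool {
    bytes.iter().enumerate().all(|(i, &value)| {
        fields
            .iter()
            .any(|field| field.get(i).copied().unwrap_or(ByteRule::Any).allows(value))
    })
}
```

In this model a union of `[Bool, Any]` and `[Any, Bool]` accepts the bytes `3, 3`, a union of two single-`bool` fields rejects the byte `2`, and adding an empty (zero-sized) field makes every byte pattern acceptable because the empty field pads to `Any` everywhere.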
This property certainly is necessary for the current implementation of
// 100% safe code!
use std::mem::MaybeUninit;
use std::num::NonZeroU8;
let opt = Some(MaybeUninit::<NonZeroU8>::zeroed());
assert!(opt.is_some(),
    "Can be false if union with () is not a layout optimization barrier"); |
IIUC, something that follows/is required from/by @RalfJung's definition, is that for any layout-optimization to be possible:
That is, the
This appears to be forward-compatible with all other guarantees I can imagine. If all agree, maybe we can put this part already in wording, so that we can just focus on considering the validity of unions without zero-sized fields afterwards. |
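The layout-optimization barrier discussed here can be observed through type sizes. The sketch below reflects what current rustc does (the standard library documentation shows the same effect for `MaybeUninit<bool>`); it is an observation, not a guarantee that this thread has settled. The function names are hypothetical.

```rust
use std::mem::{size_of, MaybeUninit};
use std::num::NonZeroU8;

/// Sizes of `Option<NonZeroU8>` and `Option<MaybeUninit<NonZeroU8>>`.
fn option_sizes() -> (usize, usize) {
    (
        // The zero niche of `NonZeroU8` encodes `None`.
        size_of::<Option<NonZeroU8>>(),
        // Wrapping in the `MaybeUninit` union hides that niche,
        // so a separate tag byte is needed.
        size_of::<Option<MaybeUninit<NonZeroU8>>>(),
    )
}

/// A zeroed `MaybeUninit<NonZeroU8>` inside `Some` must still read as `Some`.
fn zeroed_is_some() -> bool {
    Some(MaybeUninit::<NonZeroU8>::zeroed()).is_some()
}
```

If the union exposed the `NonZeroU8` niche, the all-zero payload would collide with the encoding of `None`, which is why the second size has to be larger than the first.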
If we wanted to enable the layout optimizations mentioned by @cramertj , we could do that by extending this:
with:
This second point would break #73 (comment) , but without it, layout optimizations do not appear possible to me. |
That is equivalent to what I said.
Now I am confused. If we require this, the complicated definition I proposed is not necessary. But also this rules out the use case for |
But you said at least one field must be valid? Or is that not what you meant? Or did you just intend to state a consequence of that definition? Sure, just by virtue of |
@RalfJung My point is that if we require an Maybe there is an even more complex definition than yours that would allow optimizing I've the feeling that automatic layout optimizations for |
It seems like this is a question that is not that obvious. For instance, having
union Mix2 {
    f1: (NonNullU8, u8),
    f2: (u8, NonNullU8),
}
it would be nice if we could deduce that
Back to our "Schrödinger union pattern", the question is whether we want the above example to require that the union be defined with a zero-sized field for the code not to be UB, so that more layout optimizations are possible, or whether such a benefit does not outweigh the dangers of a potentially easy-to-miss UB scenario. |
Please read the first post of this thread. This is the same as my
No, that's impossible. If we want to accept the code in my OP, then that type cannot have a niche. |
Addressing that very first post was the whole point of my post: I was just pointing out that accepting |
Oh I see. So you are basically saying what I said in the OP but framing it differently. Fair :) I don't think the ship has sailed in the sense that we have an explicit RFC-based consensus, but I do feel that many people would consider this one footgun too much for already tricky union code. So far nobody objected when I argued we should support that code. Personally I think any layout optimizations around union are one footgun too much. ;) I think we should instead give the user the option to explicitly declare a niche on their type, if they want to get layout optimizations. Basically, I am arguing for explicit being better than implicit here. |
I agree. Having to opt-out (by adding a zero-sized field) of a layout optimization leading to "easy" UB doesn't seem like a great idea. Could there be a |
Coming back to this with fresh eyes, of my two "for completeness" options above, the first involves extra complexity and I can't see any benefits to it. The second does technically provide some benefits (niches) but is much more complex. I think I see three reasonable paths forward:
To pick between these, my opinions are:
So therefore I prefer the third option for now. But it's a relatively weak opinion. |
I don't think what you have described is necessarily a different That is, my preference is for |
This discussion seems to go that direction as well, but I noticed in rust-lang/rust#113344 (comment) that this would be a usecase for having unions not be unconditionally marked as "may be uninit" (either by some "no bytes are padding" check, or with a different repr). |
More like, sometimes having them marked as "never uninit". "may be uninit" is the natural default state of bytes in memory, work has to be done to get anything else. ;) I wouldn't say that is where the discussion seems to go? @CAD97 made a proposal above but that makes heavy use of
More realistically, we'd end up with a situation where the type of a union is described by a list of constraints on each byte, and we leave unspecified how exactly rustc computes that list of constraints.
enum Type {
...
Union {
fields: Fields,
bytes: List<UnionByte>,
}
}
enum UnionByte {
/// This byte may be anything, even uninit.
Any,
/// This byte must be initialized.
Init,
/// This byte is padding, and not preserved on typed copies.
Padding,
} |
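One possible reading of the `UnionByte` list above, as a hedged sketch: each byte of memory is modelled as `Option<u8>` (with `None` standing for uninitialized), validity is checked byte by byte, and a typed copy resets `Padding` bytes to uninitialized. The `AbstractByte` alias and the functions `is_valid` and `typed_copy` are assumptions made for illustration; they are not taken from MiniRust.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum UnionByte {
    /// This byte may be anything, even uninit.
    Any,
    /// This byte must be initialized.
    Init,
    /// This byte is padding, and not preserved on typed copies.
    Padding,
}

/// Toy model of a byte in memory: `None` means uninitialized.
type AbstractByte = Option<u8>;

/// Checks a byte sequence against the per-byte constraints.
fn is_valid(rules: &[UnionByte], bytes: &[AbstractByte]) -> bool {
    rules.len() == bytes.len()
        && rules.iter().zip(bytes).all(|(rule, byte)| match rule {
            UnionByte::Init => byte.is_some(),
            UnionByte::Any | UnionByte::Padding => true,
        })
}

/// Models a typed copy: returns `None` for invalid input, otherwise the
/// copied bytes with every `Padding` position reset to uninitialized.
fn typed_copy(rules: &[UnionByte], bytes: &[AbstractByte]) -> Option<Vec<AbstractByte>> {
    if !is_valid(rules, bytes) {
        return None;
    }
    Some(
        rules
            .iter()
            .zip(bytes)
            .map(|(rule, byte)| match rule {
                UnionByte::Padding => None,
                UnionByte::Any | UnionByte::Init => *byte,
            })
            .collect(),
    )
}
```

Under this reading, the difference between `Any` and `Padding` only shows up on copies: both accept every input, but only `Any` preserves what was stored.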
Naïvely, I've always liked the "validity for unions is that at least one field must be valid", since it has the straight-forward But I guess if |
That is not an option, since it can be violated in safe code. An example is literally in the OP of this thread. :) |
Is there even a meaningful rule aside from "Yes", tbh? |
Are you tracking (in minirust repo or elsewhere) the constraints that must be satisfied by a compiler-produced layout scheme? Ideally in executable style similar to minirust itself. Stuff like "field offsets must be nonoverlapping and contained in the type layout" would go there, in addition to "a field value type must ensure that all its bytes are consistent with the UnionBytes overlapping it" for this new scheme. |
"Yes" to what?
I don't think there is a way to make that executable. "Can the n-th byte of representations of this type be uninit" is not a question that can be answered operationally (except for exhaustive enumeration). Specifying the set of legal choices for layout will be part of the Rust-to-MiniRust lowering (similar to how the existing Rust-to-MiniRust translator computes the "chunks"). Field offsets can of course overlap for unions, the only constraint is that the field fits in the union size. |
I'm not sure this is true. Given a grammar of type-defining layouts, I would expect it to be fairly easy to recursively determine whether the nth byte can have uninit as a value. It's not like we have to worry about arbitrary types with arbitrary safety invariants here, only the things the compiler supports: enums, unions, and structs around primitive types with specified validity invariants. That is, any conforming compiler is required to build layouts out of some basic building blocks provided by the spec (like "struct with this layout" or "enum with this discrimination tree to read the discriminant"), and for all of those building blocks evaluating their properties should be straightforward. |
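The recursive check described in this comment might look roughly like the following, over a deliberately tiny layout grammar. Everything here is an assumption for illustration: the `Layout` enum is not MiniRust's type grammar, scalars are treated as fully initialized, struct padding as possibly uninitialized, and unions follow the "some field allows it" rule with shorter fields padded out.

```rust
/// Toy layout grammar (hypothetical, far simpler than a real one).
enum Layout {
    /// An integer-like value: every byte must be initialized.
    Scalar { size: usize },
    /// Fields at given offsets; bytes not covered by a field are padding.
    Struct { size: usize, fields: Vec<(usize, Layout)> },
    /// All fields start at offset 0.
    Union { size: usize, fields: Vec<Layout> },
}

impl Layout {
    fn size(&self) -> usize {
        match self {
            Layout::Scalar { size }
            | Layout::Struct { size, .. }
            | Layout::Union { size, .. } => *size,
        }
    }

    /// Can byte `n` (with `n < self.size()`) be uninitialized in some valid value?
    fn may_be_uninit(&self, n: usize) -> bool {
        match self {
            Layout::Scalar { .. } => false,
            Layout::Struct { fields, .. } => {
                for (offset, field) in fields {
                    let start = *offset;
                    if n >= start && n < start + field.size() {
                        return field.may_be_uninit(n - start);
                    }
                }
                // No field covers this byte, so it is padding.
                true
            }
            Layout::Union { fields, .. } => fields
                .iter()
                // A field that does not reach byte `n` is padded there.
                .any(|field| n >= field.size() || field.may_be_uninit(n)),
        }
    }
}
```

For example, in a union of a 4-byte scalar and a struct with padding at byte 1, this reports that byte 1 may be uninitialized while byte 0 may not.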
Sure, we can write such an algorithm as part of the Rust-to-MiniRust lowering. As I said, that's what we already do for constructing the "chunks". But in MiniRust, actually checking whether the type representation for a given type satisfies this property is not possible. I thought that's what you were asking. |
Let me put it this way: rustc picks some implementation-defined strategy, maybe including randomness or the phase of the moon, to determine for each type what its layout is. This layout is described by a grammar, which is already written down in minirust - this is We already have Is there something about |
We have Sure, we could add some more syntactic structure for this. I don't see a good reason to do that though since MiniRust doesn't care. This is entirely on the frontend. The Rust-to-MiniRust translation will specify the set of possible layouts to choose for any given union, and that is where we will need such an analysis -- but not inside MiniRust itself. |
Yes, the bytes are valid. Or, the validity predicate for a union type is
I can't see any rule that could assign a validity invariant that doesn't involve a massive decision tree. And even then, only something trivial like:
pub union Foo {
    x: NonZeroU32,
    y: (NonZeroU16, NonZeroU16),
}
would actually get assigned any invalid bit patterns. At the most, we could limit |
@RalfJung I don't have any concrete issue. It's just that it's a scalar pair, so today it's passed as |
Oh, wait, it's worse than I thought -- if I used a |
That led to me doing some more experimentation and discovering more ways For the specific case of (Apologies if this was asserted upthread; I don't recall it being directly stated, at least not so plainly.) |
My hope was always that we'd eventually get the ability to annotate types to add the attributes we want, instead of somehow having to preserve them through a union. (E.g. you'd probably also have situations where you would want to preserve the nonnull.)
A
Well, maybe it would, it's not like LLVM specifies this. Doesn't LLVM itself even sometimes turn ptr loads into i64 loads? The much-discussed "byte" type in LLVM would fix this. So we could also document this as a known issue (unlikely to cause trouble in practice) with the LLVM backend caused by an LLVM limitation, and hope LLVM can one day give us a clear answer for how to express "a type of a given fixed size that gets passed in registers and can carry provenance". I think currently they would say i64 is the type to use for that. |
I must have misunderstood your point here, because I can't make a two-field transparent union:
|
Ah, interesting. |
I don't think that would really help me in #113344, though. The union would be convenient -- and way less MIR -- if instead of
let $len = unsafe { &mut *ptr::addr_of_mut!($this.end_or_len).cast::<usize>() };
let $end = unsafe { &mut *ptr::addr_of_mut!($this.end_or_len).cast::<NonNull<T>>() };
I could just write
let $len = unsafe { &mut $this.end_or_len.len };
let $end = unsafe { &mut $this.end_or_len.ptr };
But sticking a pair into a transparent union doesn't make things any easier than just having the
I guess what this boils down to is that I want an "unsafe enum", not really a union. (Not that I know what to do with that observation.) |
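A minimal sketch of the union-based field access described above. Only the field names `end_or_len`, `len`, and `ptr` come from the comment; the types `EndOrLen` and `Cursor` and their methods are hypothetical, and this is not the code from rust-lang/rust#113344.

```rust
use std::ptr::NonNull;

/// Hypothetical union: either a remaining length or an end pointer.
#[allow(dead_code)]
union EndOrLen<T> {
    len: usize,
    ptr: NonNull<T>,
}

/// Hypothetical holder that, in this sketch, only ever stores the `len` field.
struct Cursor<T> {
    end_or_len: EndOrLen<T>,
}

impl<T> Cursor<T> {
    fn with_len(len: usize) -> Self {
        Cursor { end_or_len: EndOrLen { len } }
    }

    /// Direct field borrow instead of `addr_of_mut!(..).cast::<usize>()`.
    fn len_mut(&mut self) -> &mut usize {
        // Sound here only because this sketch never writes the `ptr` field.
        unsafe { &mut self.end_or_len.len }
    }
}
```

The borrow still needs `unsafe`, but the cast through a raw pointer disappears, which is the convenience the comment is after.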
Ah sorry, what I said makes no sense. I guess this is a layout computation thing... a union where all fields are scalar of the same size, could be passed around as a scalar. |
Discussing the validity invariant of unions.
One possible choice here is "none, any bit pattern is allowed no matter which types the fields have, and including uninitialized bits".
We could also decide that e.g. a
must start with the first byte being either the bit-pattern of false or the bit-pattern of true, because all fields agree on that invariant.
Notice that we cannot require the union to be valid for some field: for a union like
we want to allow a bit pattern like 0x3 0x3, which can occur from code like
There is no demonstrated benefit from disallowing such code, and this kind of code seems perfectly reasonable around unions.
Given that, any validity invariant that wants to restrict the set of allowed bit patterns will be rather complicated. However, such an invariant would enable us to e.g. layout-optimize Option<Foo>, whereas the "anything goes"-invariant would prohibit any kind of layout optimization around unions.