-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unions with unsized variants #3041
Comments
The problem is that the meta-data in the fat pointer won't always be valid, i.e. what should this do? union Foo<T: ?Sized> {
value: T,
uninit: (),
}
let foo = Foo::<i32> { uninit: () };
let foo_ref = &foo as &Foo<dyn Any>; // the fat pointer meta-data isn't valid for the data actually stored Also, if you ban accessing the data, it makes it harder to implement something like Custom DSTs. Custom DSTs are important because they would allow |
Yes, and? Afaik, the only parts of the metadata that MUST be valid are size and alignment so that the layout can be computed at runtime. The other parts (drop_in_place and trait methods) cannot be accessed without first accessing the unsized field. Accessing the wrong field is already undefined behavior so that the metadata being invalid does not add any additional unsafety. |
I'm not clear on why CStr can't already be a thin pointer. |
Any map I think previous discussions indicate that truly unsized types should panic whenever accessing their size or alignment, so If I understood, Rust could add some |
that would make |
Please let's stop this academic discussion around CStr. CStr will never be changed simply because it would
People who actually want a thin pointer to null terminated strings have written such a thing themselves years ago and have no need for a thin CStr. Feel free to talk about new forms of metadata in general. |
I don't know how to even implement this. Imagine union Example {
a: str,
b: String,
} Now, given Note that not even enums can have unsized fields currently, and that seems like a strictly simpler problem since for enums we know which variant they are in. So I think we should specify, implement, and gather experience with unsized enums before venturing into the land of unsized unions. |
How do you construct
(Rounded up to the correct alignment.) Automatic unsizing of
I think enums are significantly more complicated in this case. The |
That doesn't matter for my question. No matter how it is constructed, you have to be able compile code like fn example1(e: &Example) -> usize { mem::size_of_val(e) }
fn example2(e: Box<Example>) { drop(e) } in a uniform way, without knowing how the arguments got created. |
So you are suggesting that the metadata is |
We're talking about unions and not enums. The variant does not matter for unsizing and the compiler does not have to know which variant was last written to.
Which part is unclear? Given union X<T> {
t: ManuallyDrop<T>,
v: ManuallyDrop<String>,
}
Accessing the wrong field is simply undefined behavior. E.g. let x: X<u8> = X { v: ManuallyDrop::new(String::new()) };
let x: &X<dyn Clone> = &x;
unsafe {
x.t.clone(); // UB
} (The current rules for sized fields are a bit more relaxed but for unsized fields we can simply make up the new rule that accessing an "inactive" unsized field is unconditionally UB.) |
Btw, since you mentioned that earlier -- note that unsizing-by-
Oh, I see. That makes sense. Basically where This also works for When nesting
There is no such thing as an "active" or "inactive" union field, and I think adding that notion would be a mistake. (It would also be quite complicated.) It also seems unnecessary -- your code is equivalent to reading The one issue that remains is the same as what I mentioned in rust-lang/rust#80158: this proposal relies on determining the size of |
|
The code is careful to spell out that it is valid only when the "unsized tail" of |
The code I was thinking of is If I understand you correctly, you're saying that The only other unsized types I'm aware of are extern types and I'm not aware of any way to unsize any type to an extern type. Therefore this statement would be trivially true in the current language and also hold for unions. But even if there were special logic (maybe some marker trait?) that allowed |
That code is obviously wrong since you are using But I didn't know/remember that |
I should not have added that unsound unsafe block because now it's not clear if you meant that calling the safe function as_prt is wrong in itself or that dereferencing that pointer is wrong. The following gist without that block is sufficient to demonstrate the issue: https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=be21e33e1d74345e13d90947147617c0 |
Yep, that me. Though to be clear, this was just
This is... actually true, as you've demonstrated. I guess I need to reword the safety docs around that, and may be to blame for limitations on unsized values in the future 😅 The important part and what I argued at stabilization has to do with validity (as in validity invariant) of the pointer metadata. Even in the unsize coercion of a dangling This is the case even without my stabilization, though:
Here we drop a |
Great example. Of course, RcBox could in theory store the size and alignment of its data proactively which would make the metadata calls unnecessary. But then again so could Box. |
Ah, because
There is a potential fix here by requiring that for custom DSTs, determining the size must be possible even after the drop glue ran (but before the underlying memory got deallocated). However, that still sounds quite awful and would seriously restrict which custom DSTs are possible.
That is another interesting idea... custom DSTs could be required to provide a function that extracts all the data now such that, later, this data plus the ptr metadata is enough to reconstruct size and alignment. This extra data would be However... Wouldn't this mean |
To be a bit more concrete, the case I am thinking about is something like union X<T> {
t: ManuallyDrop<T>,
v: ManuallyDrop<String>,
}
type Problem = X<ThinDst>;
IOW, The discussion above, as I understand it, is about figuring out to what extend |
Yes.
I do not think that is possible. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f6532229a7706836246360fb1af18db3 Code like
I think there are two ideas here.
By that comment I meant that, if we are willing to pay a blanked 16 byte overhead for all
If size+alignment are always calculated by container constructors which work only with sized types by design, then there is no problem because the vtable is not needed for size calculations. If size+alignment calculations use the "access after drop/before deallocate" idea you suggested, then |
Note: that code as written is unsound because zeroing references/other nonnull pointers is undefined behavior. The fix is to use
Because the |
I disagree. Invalidating the state of ManuallyDrop is a supported operation as long as the ManuallyDrop is not accessed afterwards. See ManuallyDrop::drop. Overwriting it is no different since it's not accessed afterwards.
It's not about dropping the contents. It's about calculating the size and alignment of the contents. |
From the
It's unsound to zero Also see: rust-lang/unsafe-code-guidelines#245
Right, I misunderstood |
Is there some good reason for
It's clear both 1 and 2 are non-zero cost for almost all use cases, for what amounts to an extraordinarily niche feature. Instead metadata should be specified explicitly by pointer types, so either make traits specify their metadata perhaps via In this, one could not ever convert a As an aside, if one wants more complex metadata then really types like |
Yes it looks like I was wrong. I'll see if I can come up with another example. |
I understand. What I am trying to figure out is which protocol |
That particular statement was not intended to show that there is already a problem with DSTs. (I think that's what you mean by your question which I didn't understand the first time.) |
Noting this here before I forget: even without my additional pub fn main() {
let weak: Weak<[u8]> = Weak::<[u8; 32]>::new();
drop(weak); // <-- HERE
} I think this means that my stabilization didn't add any new requirements to the impl, as they already existed in EDIT: wait no dangling weak don't have to deallocate anything what was I thinking whoops |
See rust-lang/rust#80407 for follow-up on the dangling unsized |
Right, but in a world with your unsized unions and custom thin DST, |
I'm sorry but it looks like I still don't understand your question. Are you asking where the unsoundness with Box+thin Dst+unsized unions is? You've already spelled it out in this issue. Are you asking where an existing unsoundness with Box+thin Dst is? Like I said, this particular sentence had nothing to with showing that there is existing unsoundness. That sentence was about showing a way in which Box+thin Dst+unsized unions could be made sound. |
Yes.
I've spelled out what goes wrong, operationally. But I'd like to understand better what happens. Since something goes wrong, clearly a proof of soundness for this system will fail. What I haven't understood yet is where it will fail. It's one thing to say "having So here's my rambling thoughts on this, typing them out pretty much as they come to my mind:
Unsized unions essentially mean we have to remove So if we were to add unsized unions, then we'd have to accept that |
You make a compelling point. We want
What about stopping the problem at /// - `[T; N]` is `Unsize<[T]>`
/// - `T` is `Unsize<dyn Trait>` when `T: Trait`
/// - `Foo<..., T, ...>` is `Unsize<Foo<..., U, ...>>` if:
/// - `T: Unsize<U>`
/// - Foo is a struct
/// - Only the last field of `Foo` has a type involving `T`
/// - `T` is not part of the type of any other fields
/// - `Bar<T>: Unsize<Bar<U>>`, if the last field of `Foo` has type `Bar<T>` All implementations of
The last point being the difference to struct. This property holds for traits and slices and custom DSTs could opt into this property by implementing some unsafe trait. Note that we would allow union U<T: ?Sized> {
t: ManuallyDrop<T>,
u: (),
} to compile even though the behavior would be undefined if you could call |
So basically, "fat DST" would work in unions but thin DST would not, by controlling which unsizing coercions the compiler permits? Yeah, something like this could work. There might be semver concerns here since now it matters whether an unsized type parameter is used below a union or not -- but maybe that is acceptable. |
I believe it would be technically feasible (easy, even) to have unions with at most one unsized variant.
Unsizing should be safe:
T -> dyn U
: The vtable contains 2 values relevant for the unsized union:Note that
drop_in_place
is not relevant because unions do not drop their contents.[T; N] -> [T]
: The metadata consists of the array length which is statically known and does not require accessing the variant.@eddyb: In the last paragraph here you wrote that there is a problem with unsized unions but I haven't been able to figure it out. Maybe you were talking about unions with more than one unsized variant?
The text was updated successfully, but these errors were encountered: