From 7e595329c31ae66607cdf09fe0a2a030bab4b13c Mon Sep 17 00:00:00 2001 From: Daniel Henry-Mantilla Date: Wed, 25 May 2022 14:01:03 +0200 Subject: [PATCH] Update the now stale warning about `PhantomData` and dropck --- src/phantom-data.md | 177 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 160 insertions(+), 17 deletions(-) diff --git a/src/phantom-data.md b/src/phantom-data.md index 726f7ba0..f3218fd8 100644 --- a/src/phantom-data.md +++ b/src/phantom-data.md @@ -42,43 +42,186 @@ struct Iter<'a, T: 'a> { and that's it. The lifetime will be bounded, and your iterator will be covariant over `'a` and `T`. Everything Just Works. -Another important example is Vec, which is (approximately) defined as follows: +## Generic parameters and drop-checking + +In the past, there used to be another thing to take into consideration. + +This very documentation used to say: + +> Another important example is Vec, which is (approximately) defined as follows: +> +> ```rust +> struct Vec { +> data: *const T, // *const for variance! +> len: usize, +> cap: usize, +> } +> ``` +> +> Unlike the previous example, it *appears* that everything is exactly as we +> want. Every generic argument to Vec shows up in at least one field. +> Good to go! +> +> Nope. +> +> The drop checker will generously determine that `Vec` does not own any values +> of type T. This will in turn make it conclude that it doesn't need to worry +> about Vec dropping any T's in its destructor for determining drop check +> soundness. This will in turn allow people to create unsoundness using +> Vec's destructor. +> +> In order to tell the drop checker that we *do* own values of type T, and +> therefore may drop some T's when *we* drop, we must add an extra `PhantomData` +> saying exactly that: +> +> ```rust +> use std::marker; +> +> struct Vec { +> data: *const T, // *const for variance! +> len: usize, +> cap: usize, +> _owns_T: marker::PhantomData, +> } +> ``` + +But ever since [RFC 1238](https://rust-lang.github.io/rfcs/1238-nonparametric-dropck.html), +**this is no longer true nor necessary**. + +If you were to write: ```rust struct Vec { - data: *const T, // *const for variance! + data: *const T, // `*const` for variance! len: usize, cap: usize, } + +# #[cfg(any())] +impl Drop for Vec { /* … */ } ``` -Unlike the previous example, it *appears* that everything is exactly as we -want. Every generic argument to Vec shows up in at least one field. -Good to go! +then the existence of that `impl Drop for Vec` makes it so Rust will consider +that that `Vec` _owns_ values of type `T` (more precisely: may use values of type `T` +in its `Drop` implementation), and Rust will thus not allow them to _dangle_ should a +`Vec` be dropped. + +**Adding an extra `_owns_T: PhantomData` field is thus _superflous_ and accomplishes nothing**. + +___ + +But this situation can sometimes lead to overly restrictive code. That's why the +standard library uses an unstable and `unsafe` attribute to opt back into the old +"unchecked" drop-checking behavior, that this very documentation warned about: the +`#[may_dangle]` attribute. -Nope. +### An exception: the special case of the standard library and its unstable `#[may_dangle]` -The drop checker will generously determine that `Vec` does not own any values -of type T. This will in turn make it conclude that it doesn't need to worry -about Vec dropping any T's in its destructor for determining drop check -soundness. This will in turn allow people to create unsoundness using -Vec's destructor. +This section can be skipped if you are only writing your own library code; but if you are +curious about what the standard library does with the actual `Vec` definition, you'll notice +that it still needs to use a `_marker: PhantomData` field for soundness. -In order to tell the drop checker that we *do* own values of type T, and -therefore may drop some T's when *we* drop, we must add an extra `PhantomData` -saying exactly that: +
Click here to see why + +Consider the following example: ```rust -use std::marker; +fn main() { + let mut v: Vec<&str> = Vec::new(); + let s: String = "Short-lived".into(); + v.push(&s); + drop(s); +} // <- `v` is dropped here +``` + +with a classical `impl Drop for Vec {` definition, the above [is denied]. + +[is denied]: https://rust.godbolt.org/z/ans15Kqz3 + +Indeed, in this case we have a `Vec` vector of `'s`-lived references +to `str`ings, but in the case of `let s: String`, it is dropped before the `Vec` is, and +thus `'s` **is expired** by the time the `Vec` is dropped, and the +`impl<'s> Drop for Vec<&'s str> {` is used. + +This means that if such `Drop` were to be used, it would be dealing with an _expired_, or +_dangling_ lifetime `'s`. But this is contrary to Rust principles, where by default all +Rust references involved in a function signature are non-dangling and valid to dereference. + +Hence why Rust has to conservatively deny this snippet. + +And yet, in the case of the real `Vec`, the `Drop` impl does not care about `&'s str`, +_since it has no drop glue of its own_: it only wants to deallocate the backing buffer. + +In other words, it would be nice if the above snippet was somehow accepted, by special +casing `Vec`, or by relying on some special property of `Vec`: `Vec` could try to +_promise not to use the `&'s str`s it holds when being dropped_. + +This is the kind of `unsafe` promise that can be expressed with `#[may_dangle]`: + +```rust ,ignore +unsafe impl<#[may_dangle] 's> Drop for Vec<&'s str> { /* … */ } +``` + +or, more generally: + +```rust ,ignore +unsafe impl<#[may_dangle] T> Drop for Vec { /* … */ } +``` + +is the `unsafe` way to opt out of this conservative assumption that Rust's drop +checker makes about type parameters of a dropped instance not being allowed to dangle. + +And when this is done, such as in the standard library, we need to be careful in the +case where `T` has drop glue of its own. In this instance, imagine replacing the +`&'s str`s with a `struct PrintOnDrop<'s> /* = */ (&'s str);` which would have a +`Drop` impl wherein the inner `&'s str` would be dereferenced and printed to the screen. + +Indeed, `Drop for Vec {`, before deallocating the backing buffer, does have to transitively +drop each `T` item when it has drop glue; in the case of `PrintOnDrop<'s>`, it means that +`Drop for Vec>` has to transitively drop the `PrintOnDrop<'s>`s elements before +deallocating the backing buffer. + +So when we said that `'s` `#[may_dangle]`, it was an excessively loose statement. We'd rather want +to say: "`'s` may dangle provided it not be involved in some transitive drop glue". Or, more generally, +"`T` may dangle provided it not be involved in some transitive drop glue". This "exception to the +exception" is a pervasive situation whenever **we own a `T`**. That's why Rust's `#[may_dangle]` is +smart enough to know of this opt-out, and will thus be disabled _when the generic parameter is held +in an owned fashion_ by the fields of the struct. + +Hence why the standard library ends up with: + +```rust +# #[cfg(any())] +// we pinky-swear not to use `T` when dropping a `Vec`… +unsafe impl<#[may_dangle] T> Drop for Vec { + fn drop(&mut self) { + unsafe { + if mem::needs_drop::() { + /* … except here, that is, … */ + ptr::drop_in_place::<[T]>(/* … */); + } + // … + dealloc(/* … */) + // … + } + } +} struct Vec { - data: *const T, // *const for variance! + // … except for the fact that a `Vec` owns `T` items and + // may thus be dropping `T` items on drop! + _owns_T: core::marker::PhantomData, + + ptr: *const T, // `*const` for variance (but this does not express ownership of a `T` *per se*) len: usize, cap: usize, - _marker: marker::PhantomData, } ``` +
+ +___ + Raw pointers that own an allocation is such a pervasive pattern that the standard library made a utility for itself called `Unique` which: