Tracking Issue for pointer metadata APIs #81513

Open
1 of 4 tasks
KodrAus opened this issue Jan 29, 2021 · 171 comments
Labels
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. S-tracking-design-concerns Status: There are blocking ❌ design concerns. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@KodrAus
Contributor

KodrAus commented Jan 29, 2021

This is a tracking issue for the RFC 2580 "Pointer metadata & VTable" (rust-lang/rfcs#2580).
The feature gate for the issue is #![feature(ptr_metadata)].

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

Language-level:

  • Is it, or should it be UB (through validity or safety invariants) to have a raw trait object wide pointer with a dangling vtable pointer? A null vtable pointer? If not, DynMetadata methods like size may need to be unsafe fn. Or maybe something like *const () should be metadata of trait objects instead of DynMetadata.
    Right now, there is some inconsistency here: size_of_val_raw(ptr) is unsafe, but metadata(ptr).size_of() does the same thing and is safe.
    Update (2024-10-04): It is definitely the case that the safety invariant for raw trait objects requires a valid vtable. So metadata(ptr).size_of() being safe is fine. size_of_val_raw(ptr) must be unsafe because of slices, so there is no inconsistency here.
  • Should Metadata be required to be Freeze?

API level:

  • Is *const () appropriate for the data component of pointers? Or should it be *const u8? Or *const Opaque with some new Opaque type? (Respectively *mut () and NonNull<()>)
  • Should ptr::from_raw_parts and friends be unsafe fn?
  • Should Thin be added as a supertrait of Sized? Or could it ever make sense to have fat pointers to statically-sized types?
  • Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probably desirable anyway to keep code size small.
  • ACP: replace use of Pointee trait with a ptr::Metadata type libs-team#246
  • DynMetadata::size_of does not always return the same value as size_of_val since the former only reads the size from the vtable, but the latter computes the size of the entire type. That seems like a pretty bad footgun?

API bikesheds:

  • Name of new items: Pointee (vs. Referent?), Thin (ThinPointee?), DynMetadata (VTablePtr?), etc.
  • Location of new items in core::ptr. For example: should Thin be in core::marker instead?

Implementation history

Tracked APIs

Last updated for #81172.

pub trait Pointee {
    /// One of `()`, `usize`, or `DynMetadata<dyn SomeTrait>`
    type Metadata;
}

pub trait Thin = Pointee<Metadata = ()>;

pub const fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {}

pub const fn from_raw_parts<T: ?Sized>(*const (), <T as Pointee>::Metadata) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), <T as Pointee>::Metadata) -> *mut T {}

impl<T: ?Sized> NonNull<T> {
    pub const fn from_raw_parts(NonNull<()>, <T as Pointee>::Metadata) -> NonNull<T> {}

    /// Convenience for `(ptr.cast(), metadata(ptr))`
    pub const fn to_raw_parts(self) -> (NonNull<()>, <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), <T as Pointee>::Metadata) {}
}

/// `<dyn SomeTrait as Pointee>::Metadata == DynMetadata<dyn SomeTrait>`
pub struct DynMetadata<Dyn: ?Sized> {
    // Private pointer to vtable
}

impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn size_of(self) -> usize {}
    pub fn align_of(self) -> usize {}
    pub fn layout(self) -> crate::alloc::Layout {}
}

unsafe impl<Dyn: ?Sized> Send for DynMetadata<Dyn> {}
unsafe impl<Dyn: ?Sized> Sync for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Debug for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Unpin for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Copy for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Clone for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Eq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialEq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Ord for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialOrd for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Hash for DynMetadata<Dyn> {}
@KodrAus KodrAus added T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jan 29, 2021
@SimonSapin SimonSapin changed the title Tracking Issue for ptr-meta Tracking Issue for pointer metadata APIs Jan 29, 2021
@matthieu-m
Contributor

After experimenting with custom implementations of Box, I think there is a strong case for having strongly typed meta-data for all kinds of pointers.

The pre-allocator representation of Box is:

struct Box<T: ?Sized> { ptr: NonNull<T>, }

The post-allocator representation is very similar:

struct Box<T: ?Sized, A: Allocator = Global> {
    allocator: A,
    ptr: NonNull<T>,
}

Both automatically implement CoerceUnsized<Box<U>> where T: Unsize<U>, and all is well.
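
To make the coercion concrete, here is a minimal sketch on stable Rust of what CoerceUnsized gives the standard Box today. The helper names (`unsize_array`, `unsize_debug`, `pointer_widths`) are made up for illustration. The point is only that the coercion attaches metadata (a length or a vtable pointer) to an otherwise thin pointer.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// `Box<[u8; 3]>` coerces to `Box<[u8]>`: the length 3 becomes pointer metadata.
pub fn unsize_array(boxed: Box<[u8; 3]>) -> Box<[u8]> {
    boxed
}

/// `Box<u32>` coerces to `Box<dyn Debug>`: a vtable pointer becomes the metadata.
pub fn unsize_debug(boxed: Box<u32>) -> Box<dyn Debug> {
    boxed
}

/// Sizes of a thin box, a slice box, and a trait-object box, in bytes.
pub fn pointer_widths() -> (usize, usize, usize) {
    (
        size_of::<Box<u32>>(),
        size_of::<Box<[u8]>>(),
        size_of::<Box<dyn Debug>>(),
    )
}
```

With current compilers the two unsized boxes are twice as wide as the thin one, which is the extra word that a metadata-only handle would have to carry by itself.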

If one wants to make Box generic over its storage, then the representation becomes:

pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
    storage: S,
    handle: S::Handle<T>,
}

If S::Handle<T> == NonNull<T>, then Box is still coercible; however, in the case of inline storage, that is:

  • neither possible: when the Box is moved, so is the storage, and therefore any pointer into the storage is invalidated.
  • nor desirable: in the case of inline storage, the pointer is redundant, wasting 8 bytes.

Hence, in the case of inline storage, S::Handle<T> is best defined as <T as Pointee>::Metadata.

In order to have Box<T> : CoerceUnsized<Box<U>> where T: Unsize<U>:

  • We need: S::Handle<T>: CoerceUnsized<S::Handle<U>> where T: Unsize<U>,
  • Which means: <T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata> where T: Unsize<U>.

And of course, Box being coercible is very much desirable.


As a result, I believe a slight change of course is necessary:

  1. All metadata should be strongly typed -- be it Metadata<dyn Debug>, Metadata<[u8]> or Metadata<[u8; 3]> -- no more () or usize.
  2. The compiler should automatically implement Metadata<T>: CoerceUnsized<Metadata<U>> where T: Unsize<U>.

I would note that having a single Metadata<T> type rather than SizedMetadata<T>, SliceMetadata<[T]>, DynMetadata<dyn T> is not necessary, only the coercion is, and since the compiler is generating those, it's perfectly free to create them "cross type". I just used the same name as a short-cut.


Addendum: What's all that jazz about inline storage?

At a high-level, Box is not so much about where memory comes from, it's a container which allows:

  • Dynamically Sized Types.
  • And therefore Type Erasure.

Having the memory inline in the Box type preserves those 2 key properties whilst offering a self-contained type (not tied to any lifetime, nor any thread). It's allocation-less type-erasure.

A motivating example is therefore fn foo<T>() -> Box<dyn Future<T>, SomeInlineStorage>: it returns a stack-allocated container which contains any future type (fitting in the storage) which can evaluate to T.
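
A rough sketch of the idea on stable Rust, under heavy assumptions: `InlineFn`, its three-word buffer, and the two hand-written function pointers are all hypothetical and are not the storage-poc API. Without pointer metadata, the sketch has to carry its own tiny "vtable" and only supports `Fn() -> u32`; a storage-generic Box would instead keep `<dyn Trait as Pointee>::Metadata` as its handle.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

/// Fixed inline buffer: three machine words, word-aligned (arbitrary choice).
type Buffer = MaybeUninit<[usize; 3]>;

/// Hypothetical allocation-free, type-erased container for `Fn() -> u32`.
pub struct InlineFn {
    buffer: Buffer,
    // Hand-rolled stand-ins for the vtable entries we need.
    call: unsafe fn(*const Buffer) -> u32,
    drop_value: unsafe fn(*mut Buffer),
    // The erased closure may be neither `Send` nor `Sync`, so opt out of both.
    _not_send_or_sync: PhantomData<*mut ()>,
}

/// Caller must guarantee that `buffer` holds an initialized `F`.
unsafe fn call_erased<F: Fn() -> u32>(buffer: *const Buffer) -> u32 {
    let f: &F = unsafe { &*buffer.cast::<F>() };
    f()
}

/// Caller must guarantee that `buffer` holds an initialized `F`.
unsafe fn drop_erased<F>(buffer: *mut Buffer) {
    unsafe { ptr::drop_in_place(buffer.cast::<F>()) }
}

impl InlineFn {
    /// Returns `None` when `F` does not fit in the inline buffer.
    pub fn new<F: Fn() -> u32 + 'static>(f: F) -> Option<Self> {
        if size_of::<F>() > size_of::<Buffer>() || align_of::<F>() > align_of::<Buffer>() {
            return None;
        }
        let mut buffer = Buffer::uninit();
        // SAFETY: size and alignment of `F` were checked against the buffer.
        unsafe { buffer.as_mut_ptr().cast::<F>().write(f) };
        Some(InlineFn {
            buffer,
            call: call_erased::<F>,
            drop_value: drop_erased::<F>,
            _not_send_or_sync: PhantomData,
        })
    }

    pub fn call(&self) -> u32 {
        // SAFETY: `buffer` holds the `F` that `call` was instantiated for.
        unsafe { (self.call)(&self.buffer) }
    }
}

impl Drop for InlineFn {
    fn drop(&mut self) {
        // SAFETY: `buffer` holds the `F` that `drop_value` was instantiated for.
        unsafe { (self.drop_value)(&mut self.buffer) }
    }
}
```

`InlineFn` is `Sized` and has the same size whatever closure it holds, and it stays valid when moved because it stores no pointer into itself.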

@SimonSapin
Contributor

Box<dyn Future<T>, SomeInlineStorage> would have to be dynamically-sized itself, right? So in order to manipulate it without another lifetime or heap-allocated indirection you’d need the unsized locals language feature. And if you have that you can manipulate dyn Future<T> directly, so what’s the point of a box with inline storage?

IMO this is different from the case of Vec, which provides useful functionality on top of its storage so that ArrayVec (a.k.a. Vec with inline storage) makes sense. But Box pretty much is its storage.

@matthieu-m
Contributor

Box<dyn Future<T>, SomeInlineStorage> would have to be dynamically-sized itself, right?

No, that's the whole point of it actually.

In C++, you have std::string and std::function implementation typically using the "short string optimization", that is a sufficiently small payload is just embedded inside, and larger ones require a heap-allocation.

This is exactly the same principle:

  • libstdc++'s std::string can contain up to 15 non-NUL characters without heap allocation on 64-bits platform.
  • sizeof(std::string) == 24, regardless of whether it's empty, contains a single character, or contains 15.

So, here, SomeInlineStorage is, generally speaking, over-reserving. You settle on a fixed alignment and size, and then you may get mem::size_of::<Box<dyn Future, SomeInlineStorage>>() == 128 regardless of what's stored inside.

If you stored a single pointer (+v-table), well, you're paying a bit too much, but that's the price for flexibility. It's up to you to size it appropriately for the largest variant.

In any case, unsized locals is strictly unnecessary, as can be seen in the tests of storage-poc's RawBox.

@SimonSapin
Contributor

Oh I see, so this is more like SmallVec than ArrayVec and "inline" really means inline up to a certain size chosen at compile-time, and heap-allocated for values that turn out at run-time to be larger?

Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized discussion. But could you manage what you want if the handle for storage-generic Box<T> is not T::Metadata directly but another generic struct that contains that together with PhantomData<T>?

@matthieu-m
Contributor

Oh I see, so this is more like SmallVec than ArrayVec and "inline" really means inline up to a certain size chosen at compile-time, and heap-allocated for values that turn out at run-time to be larger?

It's up to you, you can have either a purely inline storage, or you can have "small" inline storage with heap fallback.

The main point is that the "inline" portion is always of fixed size and alignment (up to the storage) and therefore RawBox itself is always Sized.

(You can see an equivalent of ArrayVec instantiated in this test-suite: RawVec<T, inline::SingleRange<...>>)

Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized discussion. But could you manage what you want if the handle for storage-generic Box<T> is not T::Metadata directly but another generic struct that contains that together with PhantomData<T>?

I don't think so, given the language from the documentation of CoerceUnsized:

For custom types, the coercion here works by coercing Foo<T> to Foo<U> provided an impl of CoerceUnsized<Foo<U>> for Foo<T> exists.

Such an impl can only be written if Foo<T> has only a single non-phantomdata field involving T.

If the type of that field is Bar<T>, an implementation of CoerceUnsized<Bar<U>> for Bar<T> must exist. The coercion will work by coercing the Bar<T> field into Bar<U> and filling in the rest of the fields from Foo<T> to create a Foo<U>. This will effectively drill down to a pointer field and coerce that.

It appears that PhantomData fields are ignored for the purpose of coercion.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 18, 2021
Implement RFC 2580: Pointer metadata & VTable

RFC: rust-lang/rfcs#2580

~~Before merging this PR:~~

* [x] Wait for the end of the RFC’s [FCP to merge](rust-lang/rfcs#2580 (comment)).
* [x] Open a tracking issue: rust-lang#81513
* [x] Update `#[unstable]` attributes in the PR with the tracking issue number

----

This PR extends the language with a new lang item for the `Pointee` trait which is special-cased in trait resolution to implement it for all types. Even in generic contexts, parameters can be assumed to implement it without a corresponding bound.

For this I mostly imitated what the compiler was already doing for the `DiscriminantKind` trait. I’m very unfamiliar with compiler internals, so careful review is appreciated.

This PR also extends the standard library with new unstable APIs in `core::ptr` and `std::ptr`:

```rust
pub trait Pointee {
    /// One of `()`, `usize`, or `DynMetadata<dyn SomeTrait>`
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;
}

pub trait Thin = Pointee<Metadata = ()>;

pub const fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {}

pub const fn from_raw_parts<T: ?Sized>(*const (), <T as Pointee>::Metadata) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), <T as Pointee>::Metadata) -> *mut T {}

impl<T: ?Sized> NonNull<T> {
    pub const fn from_raw_parts(NonNull<()>, <T as Pointee>::Metadata) -> NonNull<T> {}

    /// Convenience for `(ptr.cast(), metadata(ptr))`
    pub const fn to_raw_parts(self) -> (NonNull<()>, <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), <T as Pointee>::Metadata) {}
}

/// `<dyn SomeTrait as Pointee>::Metadata == DynMetadata<dyn SomeTrait>`
pub struct DynMetadata<Dyn: ?Sized> {
    // Private pointer to vtable
}

impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn size_of(self) -> usize {}
    pub fn align_of(self) -> usize {}
    pub fn layout(self) -> crate::alloc::Layout {}
}

unsafe impl<Dyn: ?Sized> Send for DynMetadata<Dyn> {}
unsafe impl<Dyn: ?Sized> Sync for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Debug for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Unpin for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Copy for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Clone for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Eq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialEq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Ord for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialOrd for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Hash for DynMetadata<Dyn> {}
```

API differences from the RFC, in areas noted as unresolved questions in the RFC:

* Module-level functions instead of associated `from_raw_parts` functions on `*const T` and `*mut T`, following the precedent of `null`, `slice_from_raw_parts`, etc.
* Added `to_raw_parts`
@petertodd
Contributor

Note how a SliceLen<T> would be the ideal metadata for a [T] slice, as it could express the fact that the range of valid lengths for a slice reference depends on the size of T.

However, as there's quite a few ways of manipulating slice pointers without unsafe, eg via ptr::slice_from_raw_parts, I don't know if such types can actually enforce all that much at compile time.
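
As a small illustration of why the valid length range depends on T: a slice reference may span at most isize::MAX bytes, so the largest length shrinks as the element grows. `max_slice_len` below is a hypothetical helper on stable Rust, not a proposed API.

```rust
use std::mem::size_of;

/// Largest length a `&[T]` can have, given that a reference may cover at most
/// `isize::MAX` bytes. Zero-sized elements impose no byte limit at all.
pub fn max_slice_len<T>() -> usize {
    match size_of::<T>() {
        0 => usize::MAX,
        elem_size => isize::MAX as usize / elem_size,
    }
}
```

A SliceLen<T> type could encode this bound, but a raw pointer built with ptr::slice_from_raw_parts is free to carry any usize.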

@SimonSapin
Contributor

there's quite a few ways of manipulating slice pointers without unsafe, eg via ptr::slice_from_raw_parts

Yes that’s the idea behind this RFC: generalize slice_from_raw_parts to other kinds of DSTs
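
For reference, the slice-only version of that round trip already works on stable Rust. The sketch below uses only stable calls (`cast`, `len` on raw slice pointers, `ptr::slice_from_raw_parts`); the function names are invented for illustration, and the generalized `to_raw_parts` / `from_raw_parts` pair would play the same roles for any `T: ?Sized`.

```rust
use std::ptr;

/// Split a raw slice pointer into its data pointer and its metadata (the length).
pub fn split_slice_ptr<T>(wide: *const [T]) -> (*const T, usize) {
    (wide.cast::<T>(), wide.len())
}

/// Reassemble a raw slice pointer from a data pointer and a length.
pub fn join_slice_ptr<T>(data: *const T, len: usize) -> *const [T] {
    ptr::slice_from_raw_parts(data, len)
}

/// Round-trip a slice through its raw parts and sum the elements.
pub fn round_trip_sum(values: &[u32]) -> u32 {
    let (data, len) = split_slice_ptr(values as *const [u32]);
    let rebuilt = join_slice_ptr(data, len);
    // SAFETY: `rebuilt` has the same address and length as `values`,
    // which is borrowed for the duration of this call.
    unsafe { (*rebuilt).iter().sum() }
}
```

Both the split and the join are safe functions; only the final dereference needs `unsafe`.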

@Manishearth
Member

Manishearth commented Mar 5, 2021

So a thing that seems to be missing here is a stable layout for DynMetadata itself.

A really annoying thing is that currently you cannot opaquely pass trait objects across FFI without doing a second allocation, because Box<dyn Trait> has unknown layout. Are there plans to make this feasible? To me this has been the main use case for work on DST APIs
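
The "second allocation" workaround looks roughly like the sketch below on stable Rust. `Greeter`, `English`, `into_opaque`, and `from_opaque` are hypothetical names; the only thing that crosses the boundary is a thin `*mut c_void`, at the cost of boxing the box.

```rust
use std::ffi::c_void;

pub trait Greeter {
    fn greet(&self) -> String;
}

pub struct English;

impl Greeter for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

/// Hide a wide `Box<dyn Greeter>` behind a thin pointer by boxing it again.
pub fn into_opaque(object: Box<dyn Greeter>) -> *mut c_void {
    Box::into_raw(Box::new(object)).cast::<c_void>()
}

/// Recover the trait object.
///
/// # Safety
/// `raw` must come from `into_opaque` and must not be used again afterwards.
pub unsafe fn from_opaque(raw: *mut c_void) -> Box<dyn Greeter> {
    // SAFETY: per the contract above, `raw` points to a live `Box<dyn Greeter>`
    // that was allocated by `Box::new` in `into_opaque`.
    unsafe { *Box::from_raw(raw.cast::<Box<dyn Greeter>>()) }
}

/// Send a trait object through the opaque pointer and back.
pub fn round_trip() -> String {
    let raw = into_opaque(Box::new(English));
    // SAFETY: `raw` was just produced by `into_opaque` and is consumed once.
    let object = unsafe { from_opaque(raw) };
    object.greet()
}
```

If the vtable half of the wide pointer had a documented layout, the outer allocation could be dropped and the two words passed separately.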

@SimonSapin
Contributor

Would it make sense to add conversions between DynMetadata and some raw pointer type? Would that help the FFI use case?

@Manishearth
Member

Yes, it would. It would be annoying to use, but it would suffice.

@Manishearth
Member

Manishearth commented Mar 5, 2021

Doesn't even need to be a pointer type, just an opaque type with a well-defined layout. Though pointers make it easier for other tools to use, definitely.

@SimonSapin
Contributor

We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)

@Manishearth
Member

We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)

That would be nice, and it would be nice if it had explicit conversion functions to *const !

@SimonSapin
Contributor

*const (), sure, why not.

A *const ! pointer though would always be UB to dereference. So it… encodes in the type system that it is always dangling? That doesn’t seem a good fit for vtables.

@Manishearth
Member

That works too yeah

@RalfJung
Member

RalfJung commented Mar 13, 2021

It was brought to my attention that this feature has some very subtle interaction with unsafe code. Specifically, the following function is currently sound in the sense that safe code cannot cause any UB even when it calls this function:

pub fn make_weird_raw_ptr() -> *const dyn Send {
    unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}

This RFC is a breaking change in that it makes the above function unsound:

let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
let size = meta.size_of(); // *oops* UB

At the very least, this should be listed as an open concern to be resolved.

Maybe metadata should only be safe on references, not raw pointers?

@RalfJung
Member

Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probably desirable anyway to keep code size small.

Don't size and align already have to be in the same location? Certainly Miri assumes this in its implementations of size_of_val and align_of_val for trait objects -- and I don't see a way to implement this without that information being at a consistent location.

For drop, I don't understand why it is mentioned here as also having to be in the same location.

@petertodd
Contributor

petertodd commented Mar 13, 2021 via email

@SimonSapin
Contributor

@RalfJung Yes, this is an important point. It came up in RFC discussions but I forgot to incorporate it in unresolved questions then. I’ve done so in the issue description here.

My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?

Don't size and align already have to be in the same location?

I’m also skeptical that it could be any other way, this point is mostly copied from a comment on the RFC

Maybe metadata should only be safe on references, not raw pointers?

I think that would be a serious limitation. I don’t see a reason extracting the components of any raw pointer and putting them back together shouldn’t be sound.

However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetadata methods like size could be made unsafe fns.

@RalfJung
Member

RalfJung commented Mar 13, 2021

I’ve done so in the issue description here.

Thanks. Notice however that for this RFC to work, it is not required to make "valid vtable" part of the validity invariant. Making it part of the safety invariant would be sufficient, since this is a library-level concern.

My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?

Indeed, neither the validity invariant nor the safety invariant of raw pointers are pinned down exactly. I think it is safe to say though that people would expect these invariants to be as weak as is at all possible, i.e., to require as little as possible. Even the fact that the vtable pointer is non-NULL is tripping people up and we might want to change that.

size_of_val_raw and similar APIs are unsafe fn for this exact reason.

However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetadata methods like size could be made unsafe fns.

That would be another option, yes. (There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.)

@Rua
Contributor

Rua commented May 24, 2024

Is there a reason why these functions are not const?

@WaffleLapkin
Member

@Rua most functions described in the tracking issue are const. If you are talking about DynMetadata::{size_of,align_of,layout}, then no, I think those could be const and I don't see a reason why they are not yet other than "no one bothered to implement the intrinsics in CTFE".

@Rua
Contributor

Rua commented May 24, 2024

Hmm strange, they are shown as const here, but not in the Rust documentation:
https://doc.rust-lang.org/std/ptr/fn.from_raw_parts.html

@slanterns
Contributor

slanterns commented May 24, 2024

Maybe it's just because the UI is a bit... confusing.

And like https://doc.rust-lang.org/std/ptr/fn.copy.html, the function signature will not show const if it is const-unstable.

@dead-claudia

Created #125511 to track that rustdoc bug, so it doesn't pollute this issue any longer.

@dead-claudia

Just filed an API question related to this: #125517

@JohnDowson

Has there been any progress on stabilizing this feature over the last year, perhaps in the depths of zulip?

@nikomatsakis
Contributor

So...I'm dropping a line here because I've realized that I would like to leverage this trait as a better name for Unsized in a design like this one. Given that it's still unstable this seems achievable.

@Rua
Contributor

Rua commented Sep 10, 2024

I think #125517 needs to be looked at before anything is stabilised here. Either that, or stabilise only for dyn but leave slice metadata open?

@Kixunil
Contributor

Kixunil commented Sep 10, 2024

@nikomatsakis good point and I strongly suspect this also ought to have a way to implement custom DynSized because CStr claims it wants to become a thin pointer in the future (with extern types I suppose) but because it currently isn't its size can be dynamically calculated and people might be relying on it.

@traviscross
Contributor

Note that the naming of the APIs here would be affected by the ongoing proposed FCP in:

@RalfJung
Member

RalfJung commented Oct 4, 2024

Over here, @HeroicKatora suggested some functions that would belong in this tracking issue. In slightly adjusted form, those would be:

impl<T: ?Sized> *const T {
    pub fn metadata(self) -> <T as Pointee>::Metadata;
    pub fn with_metadata<U: ?Sized>(self, other: <U as Pointee>::Metadata) -> *const U;
}

impl<T: ?Sized> *mut T {
    pub fn metadata(self) -> <T as Pointee>::Metadata;
    pub fn with_metadata<U: ?Sized>(self, other: <U as Pointee>::Metadata) -> *mut U;
}

Also of note is rust-lang/libs-team#246 which would replace all <T as Pointee>::Metadata in all these functions with Metadata<T>, and thus would avoid exposing the Pointee trait.

@RalfJung
Member

RalfJung commented Oct 4, 2024

Also, this comment in the tracking issue is not quite accurate:

Right now, there is some inconsistency here: size_of_val_raw(ptr) is unsafe, but metadata(ptr).size_of() does the same thing and is safe.

size_of_val_raw must be unsafe because it can be used on slices where it is possible, in safe code, to create raw slices whose size computation overflows. metadata(ptr).size_of() only works for dyn Trait pointers, thus avoiding the problem. So I don't think this is inconsistent.

By now it is clear that the safety invariant of a dyn trait raw pointer requires vtable validity. So metadata(ptr).size_of() being safe is entirely justified.
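
The slice problem can be reproduced entirely in safe, stable Rust. In the sketch below, `raw_slice_with_len` and `checked_size_of_raw_slice` are hypothetical helpers: the first builds a raw slice pointer with an arbitrary length, the second shows that the byte size of such a pointer may not fit in a usize.

```rust
use std::mem::size_of;
use std::ptr::{self, NonNull};

/// Build a raw `*const [u64]` with any length, using only safe code.
/// The pointer is dangling and is never dereferenced.
pub fn raw_slice_with_len(len: usize) -> *const [u64] {
    ptr::slice_from_raw_parts(NonNull::<u64>::dangling().as_ptr(), len)
}

/// Byte size described by a raw slice pointer, or `None` if the
/// multiplication overflows `usize`.
pub fn checked_size_of_raw_slice<T>(wide: *const [T]) -> Option<usize> {
    wide.len().checked_mul(size_of::<T>())
}
```

An unchecked size computation has no sensible answer for the overflowing case, which is why size_of_val_raw needs its safety precondition for slices.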

@RalfJung
Member

In fact, metadata(ptr).size_of() is different in another way:

#![feature(ptr_metadata)]
struct S<T: ?Sized> {
    x: i32,
    y: T,
}

fn main() {
    let x = S { x: 0, y: 0 };
    let xref: &S<dyn Send> = &x;
    dbg!(std::ptr::metadata(xref).size_of()); // 4
    dbg!(std::mem::size_of_val(xref)); // 8
}

DynMetadata::size_of just returns the size of the dyn part, even when it is the metadata of a pointer that has a prefix before the dyn part!

At the very least, this needs to be more clearly documented. But I think ideally we'd not provide a size_of method with such confusing behavior.
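
The same discrepancy can be observed without the feature gate. In this stable-Rust sketch (the function name is made up), the size of the dyn tail field stands in for what the vtable records, and it differs from the size of the whole struct.

```rust
use std::mem::size_of_val;

pub struct S<T: ?Sized> {
    pub x: i32,
    pub y: T,
}

/// Returns (size of the whole `S<dyn Send>`, size of only its `dyn Send` tail).
pub fn whole_and_tail_sizes() -> (usize, usize) {
    let value = S { x: 0, y: 0i32 };
    let wide: &S<dyn Send> = &value;
    // The tail size is what the vtable stores; the whole size also
    // accounts for the `x` prefix.
    (size_of_val(wide), size_of_val(&wide.y))
}
```

Both pointers, `wide` and `&wide.y`, carry the same vtable, so a size read from the vtable alone cannot tell them apart.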

@joshlf
Contributor

joshlf commented Oct 15, 2024

TL;DR: Pointee is very useful and should be stabilized. It would be good if, for slice DSTs, it added the ability to extract the byte offset of the trailing slice field and the size of the trailing slice's element type.

I'd like to provide some (not-so-prior) prior art and use cases. I hope that this motivates that it'd be very useful to stabilize this feature, but also that some changes may be in order (at least to support the use cases described here).

I'm going to be basing these examples on zerocopy 0.8.5, which is at this Git commit.

In zerocopy, we have the KnownLayout trait, which is roughly a polyfill for Pointee. It permits us to make a number of APIs support both sized types and slice DSTs in a generic manner. As an example of this, see FromBytes::ref_from_bytes.

Just like Pointee, KnownLayout has an associated PointerMetadata type. We use it in some bounds to restrict types to those which are slice DSTs (which includes actual slices). Usually, these are APIs which accept an explicit element count for the type's trailing slice field - for example, FromBytes::ref_from_bytes_with_elems.

Here's the full source code of KnownLayout (edited for brevity):

pub unsafe trait KnownLayout {
    /// The type of metadata stored in a pointer to `Self`.
    ///
    /// This is `()` for sized types and `usize` for slice DSTs.
    type PointerMetadata: PointerMetadata;

    /// The layout of `Self`.
    #[doc(hidden)]
    const LAYOUT: DstLayout;

    #[doc(hidden)]
    fn raw_from_ptr_len(bytes: NonNull<u8>, meta: Self::PointerMetadata) -> NonNull<Self>;

    /// Extracts the metadata from a pointer to `Self`.
    fn pointer_to_metadata(ptr: *mut Self) -> Self::PointerMetadata;

    /// Computes the length of the byte range addressed by `ptr`.
    ///
    /// Returns `None` if the resulting length would not fit in an `usize`.
    #[doc(hidden)]
    fn size_of_val_raw(ptr: NonNull<Self>) -> Option<usize> {
        let meta = Self::pointer_to_metadata(ptr.as_ptr());
        // SAFETY: `size_for_metadata` promises to only return `None` if the
        // resulting size would not fit in a `usize`.
        meta.size_for_metadata(Self::LAYOUT)
    }
}

/// The metadata associated with a [`KnownLayout`] type.
#[doc(hidden)]
pub trait PointerMetadata: Copy + Eq + Debug {
    /// Constructs a `Self` from an element count.
    ///
    /// If `Self = ()`, this returns `()`. If `Self = usize`, this returns
    /// `elems`. No other types are currently supported.
    fn from_elem_count(elems: usize) -> Self;

    /// Computes the size of the object with the given layout and pointer
    /// metadata.
    ///
    /// # Safety
    ///
    /// `size_for_metadata` promises to only return `None` if the resulting size
    /// would not fit in a `usize`.
    fn size_for_metadata(&self, layout: DstLayout) -> Option<usize>;
}

Mostly, KnownLayout just polyfills items already exposed by this or other features:

However, there are some items which would be useful to add to Pointee or related APIs:

  • PointerMetadata::size_for_metadata is used e.g. here to compute what the size would be of a pointer with the given metadata. This is useful because it avoids needing to synthesize a raw pointer just to call (e.g.) KnownLayout::size_of_val_raw
  • PointerMetadata::from_elem_count is used here to construct a type from a usize metadata regardless of whether the type in question is actually a slice DST. This is probably a pretty niche use case.
  • KnownLayout::LAYOUT encodes more information about a type's layout than, to my knowledge, Pointee exposes

KnownLayout::LAYOUT is of type DstLayout. Here's the full source code (edited for brevity):

pub struct DstLayout {
    align: NonZeroUsize,
    size_info: SizeInfo,
}

enum SizeInfo {
    Sized { size: usize },
    SliceDst(TrailingSliceLayout),
}

struct TrailingSliceLayout {
    // The offset of the first byte of the trailing slice field. Note that this
    // is NOT the same as the minimum size of the type. For example, consider
    // the following type:
    //
    //   struct Foo {
    //       a: u16,
    //       b: u8,
    //       c: [u8],
    //   }
    //
    // In `Foo`, `c` is at byte offset 3. When `c.len() == 0`, `c` is followed
    // by a padding byte.
    offset: usize,
    // The size of the element type of the trailing slice field.
    elem_size: usize,
}

This contains two pieces of information that Pointee does not expose for slice DSTs:

  • The offset of the trailing slice field
  • The element size of the trailing slice field

These are needed in order to implement validate_cast_and_convert_metadata, which is what enables us to support unsized types in methods like FromBytes::ref_from_bytes. validate_cast_and_convert_metadata validates that a particular memory range could hold a particular T: ?Sized + KnownLayout (it satisfies T's alignment and is a valid size for T). On success, it computes the correct pointer metadata to describe a T at that memory range.

In order for zerocopy to replace KnownLayout with Pointee, we would need Pointee to provide these two pieces of information.
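
To show how the two values are used, here is a simplified sketch of the size computation for a slice DST. It is not zerocopy's implementation: `TrailingSlice` and `size_for_elems` are hypothetical, the alignment is passed alongside the offset and element size, and the isize::MAX limit on object sizes is ignored. The `Foo` type mirrors the example in the comment above, with `#[repr(C)]` added as an assumption so that the field offsets are fixed.

```rust
use std::mem::size_of_val;

/// Simplified layout facts for a type whose last field is a slice.
#[derive(Clone, Copy)]
pub struct TrailingSlice {
    pub offset: usize,    // byte offset of the trailing slice field
    pub elem_size: usize, // size of one slice element
    pub align: usize,     // alignment of the whole type (a power of two)
}

/// Size of the whole object for a given element count, or `None` on overflow.
pub fn size_for_elems(layout: TrailingSlice, elems: usize) -> Option<usize> {
    let unpadded = elems
        .checked_mul(layout.elem_size)?
        .checked_add(layout.offset)?;
    // Round up to the next multiple of `align` to account for trailing padding.
    let mask = layout.align - 1;
    Some(unpadded.checked_add(mask)? & !mask)
}

#[repr(C)]
pub struct Foo<T: ?Sized> {
    pub a: u16,
    pub b: u8,
    pub c: T,
}

/// Size the compiler reports for a `Foo<[u8]>` with two trailing elements.
pub fn actual_size_with_two_elems() -> usize {
    let sized = Foo { a: 0, b: 0, c: [0u8; 2] };
    let unsized_ref: &Foo<[u8]> = &sized;
    size_of_val(unsized_ref)
}
```

With offset 3, element size 1, and alignment 2, zero elements give a size of 4 (the padding byte mentioned above) and two elements give 6, matching what size_of_val reports.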


EDIT: It may also be useful to support slicing slice DSTs.
