Custom Dynamically Sized Types for Rust #1524

strega-nil · 2016-03-02T19:14:00Z

I believe this fixes #813, and is a nicer, and far more powerful, solution than #709.

dgrunwald · 2016-03-02T22:42:41Z

&CStr was originally supposed to be compatible with *mut c_char for FFI purposes. (with CStr being an unsized type)
It's only a fat pointer because of Rust's current limitations regarding DSTs.

strega-nil · 2016-03-02T23:59:21Z

@dgrunwald Ah, well, this RFC does support that.

struct CStr([libc::c_char]);
impl Dst for CStr {
    type Meta = ();
}
impl Sizeable for CStr {
    fn size_of_val(&self) -> usize {
        libc::strlen(self.as_ptr())
    }
}
impl CStr {
    pub unsafe fn from_ptr(p: *const libc::c_char) -> &CStr {
        // either
        std::mem::transmute::<*const libc::c_char, &CStr>(p)
        // or
        std::ptr::make_fat_ptr(p as *const (), ())
    }
}

Aatch · 2016-03-03T00:06:00Z

I'm not sure about this. My main concern is around how actually useful this is compared to the complexity. There is a lot of code that assumes fat pointers are two words and two words only. This would completely change that. It's a very large increase in complexity.

Another issue is how this interacts with generics and unsizing. Currently Foo<[T; N]> will coerce into Foo<[T]> and everything will work properly because we know how to do that. There's also the case of people trying to use a generic Meta type, it's unclear how that might work.

Right now, my thought is that it's a nice feature (like most that get proposed), but not worth the increased complexity.

strega-nil · 2016-03-03T00:21:45Z

@Aatch We need something like this. Look at the documentation for CStr:

Note: This operation is intended to be a 0-cost cast but it is currently implemented with an up-front calculation of the length of the string.

This is just in the standard library. Out there in real code, we have to deal with C interop, where stuff like

typedef struct _TOKEN_GROUPS {
  DWORD              GroupCount;
  SID_AND_ATTRIBUTES Groups[ANYSIZE_ARRAY];
} TOKEN_GROUPS, *PTOKEN_GROUPS;

has to be translated into

struct TokenGroups {
    group_count: u32,
    groups: [SID_AND_ATTRIBUTES; 0],
}

And traits like Index and IndexMut only return references, so what if you have a 3D world which wants to return 2D slices into the world? You can't. You just can't. You have to define getters which return a

struct 2DSlice<'a, T> {
    width: usize,
    height: usize,
    ptr: *const T,
    lifetime: PhantomData<&'a [T]>,
}
struct 2DSliceMut<'a, T> {
    width: usize,
    height: usize,
    ptr: *mut T,
    lifetime: PhantomData<&'a mut [T]>,
}

strega-nil · 2016-03-03T00:22:53Z

@Aatch Unsizing isn't touched in this RFC. We really need integer generics to start doing stuff with unsizing. However, once we do get integer generics, it'll be quite easy to add that to this and we can have unsizing for everybody :)

strega-nil · 2016-03-03T00:30:38Z

@Aatch A generic Meta type isn't a problem, because of this error: "the type parameter T is not constrained by the impl trait, self type, or predicates".

reem · 2016-03-03T00:41:14Z

@ubsan struct Whatever<T> { inner: Vec<T>> } impl<T> Whatever<T> Dst { type Meta = T; }?

strega-nil · 2016-03-03T00:46:17Z

@reem 1) That's not an unsized type, 2) that wouldn't actually be a problem because you know what Meta is (as long as Whatever<T: Copy>)

bluss · 2016-03-03T00:49:27Z

@japaric Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

thepowersgang · 2016-03-03T01:24:13Z

How would indexing the inner [T] of a custom DST be handled? It doesn't have the element count metadata from a normal slice. Would it be an unsafe operation? Or would the normal impl [T] methods be removed/hidden by the compiler when the slice is accessed via a custom DST pointer.

Idea: Add fn element_count(&self) -> usize to the Dst trait

strega-nil · 2016-03-03T01:29:10Z

@thepowersgang It's an unsafe operation. The inner [T] isn't "real"; it's only there to get the base pointer of the unsized data.

eddyb · 2016-03-03T01:45:22Z

I originally proposed using [T; 0] for anything slice-like because it has the right alignment and can serve as a base pointer.

@Aatch Fat pointers being hardcoded to one pointer-sized piece of metadata is a mistake. There are important extensions to slice and traits we need to support and that hardcoding has to go (which is easier with the MIR).
cc @nikomatsakis about his desire to have TraitA+TraitB work as a trait object with 2 vtable pointers.

strega-nil · 2016-03-03T01:47:21Z

I liked [T] because it seemed nicer; this should be up for bikeshedding.

japaric · 2016-03-03T01:52:14Z

Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

Yes, I was working on it last year. I don't have time right now to analyze/comment on the design proposed in this RFC. So I'm just going to drop a few links here:

Aatch · 2016-03-03T02:02:56Z

@thepowersgang actually brings up a good point, if you can get at the [T], then you'd expect to be able to work with it like a regular [T]. I'd prefer a way of indicating "this is unsized" that doesn't use [T] in that case.

Either use a [T;0] or something else that isn't already a DST. I don't think this really needs a marker for the "unsized portion" anyway, since a more trait-like DST would probably want to use (). The trait itself should be enough. It might be possible to treat an implementation of Dst as an implied impl !Sized, which would handle most of the issues I can think of.

strega-nil · 2016-03-03T02:12:16Z

Hmm, yeah, I do think [T; 0] is better now that it's been discussed.

strega-nil · 2016-03-03T02:26:45Z

Although, I'm thinking about how it would work with integer generics and Unsizing coercions... I think that [T] is the best way, because we would have some sort of way of saying "This PascalStr is a Sized PascalStr (which can be unsized)", or whatever, which would look something like

struct PascalStr {
    len: usize,
    buf: [u8],
}
// &PascalStr == ptr -> { len, [buf; len] }
// &PascalStr<N> == ptr -> { N, [buf; N] }
// PascalStr<N> == { N, [buf; N] }

Aatch · 2016-03-03T02:55:29Z

@ubsan eh, I'm not sure that's really worth it here. My main concern is that the compiler now has to be aware of a lot more context when handling field access. The main advantage of [T; 0] (or a semantically-similar marker type) is that you can't safely access any data through a pointer to it, but it has all the relevant properties of T. I'm not sure I like the idea of &foo.bar being invalid just because foo implements Dst.

Unsizing is probably better handed via separate types. This is already the case, [T; N] might look similar to [T], but they're different types completely. It's more obvious when you consider trait object unsizing. I think the same is going to be true here, if you want unsizing, you're gonna have to have a separate Sized type like everybody else.

strega-nil · 2016-03-03T02:58:57Z

@Aatch but they're not completely different types. You need them to be exactly the same type, in fact, behind the pointer, for unsizing coercions to work.

Aatch · 2016-03-03T03:00:03Z

@ubsan no, they're the same representation, not the same type.

eddyb · 2016-03-03T03:06:44Z

struct PascalStrBuffer<B> {
    len: usize,
    buf: B
}
type PascalStrArray<const N: usize> = PascalStrBuffer<[u8; N]>;
type PascalStr = PascalStrBuffer<[u8]>;

This might be an okay compromise of sorts.

I've also considered having a single optional integer parameter, which denotes the minimal contained size and defaults to 0 (which would mean [u8; 0], i.e. "any size").
The problem with that, though, is you lose the sized aspect for which you wanted the flexibility in the first place, so it's probably pointless.

strega-nil · 2016-03-03T03:09:17Z

@Aatch fine; you can argue whether they're the same type or not. They are the same representation, definitely, and if you had to write separate Sized versions of each type, imagine the really big types:

#[repr(C)]
struct CLAIM_SECURITY_ATTRIBUTE_RELATIVE_V1 {
    Name: DWORD,
    ValueType: WORD,
    Reserved: WORD,
    Flags: DWORD,
    ValueCount: DWORD,
    union Values {
        pInt64: [DWORD],
        pUint64: [DWORD],
        ppString: [DWORD],
        pFqbn: [DWORD],
        pOctetString: [DWORD],
    }
}

imagine writing two of these types, and now for each of the Windows flexible array structs. It gets ridiculous.

eddyb · 2016-03-03T03:11:29Z

It would be almost impossible to ensure the same layout for two different definitions though.
And I'm not sure how well #[repr(C)] would work with the DST stuff, although I reckon that it seems useful to have C interop defined for non-trait DSTs anyway.

Aatch · 2016-03-03T03:16:57Z

In that case how about we restrict this to slice-like DSTs and use @thepowersgang's suggestion of having an element_count method?

strega-nil · 2016-03-03T03:26:40Z

@Aatch that seems overly restrictive. For example, if you have a

struct 2dSlice<T>([T]);
impl<T> Dst for 2dSlice<T> {
    type Meta = (usize, usize);
    fn element_count(&self) -> usize {
         // (meta.0 * meta.1)
    }
}
impl<T> Sizeable for 2dSlice<T> {...}

fn main() { 
    let x = 2dSlice::<(), (std::usize::MAX, std::usize::MAX)>::new(); // not possible with `element_count`
}

Aatch · 2016-03-03T03:32:27Z

@ubsan that's not possible anyway! You can't have something that large. And it's not like the same reasoning doesn't apply to the size_of_val case.

Oh, and something I just thought of: destructors/dropping. It's all well and good having something that only contains Copy` types, but what if you have something non-Copy in your custom DST?

strega-nil · 2016-03-03T03:35:17Z

@Aatch yes you can :) look at the type inside!

You only have a pointer to the values, and aren't the owner. You could impl Drop to drop the inside values, and that would work for something like Box.

Aatch · 2016-03-03T03:49:21Z

@ubsan "not the owner"? Then who is? [T] owns the Ts it contains, sure, if you have a &[T] you don't own the [T] itself, but that's not the same.

Do I now have to explicitly implement Drop to get the correct behaviour for my custom DST? I mean, fine, if that's the case, that's the case, but it should be in the RFC.

So far everything has been focused on custom strings, N-dimensional arrays of Copy types and N-dimensional slices from... somewhere. I'd like to see this actually all worked out, you may have your use-cases, but you need to look at how this can be used, not just how you would use it.

strega-nil · 2017-02-06T10:06:30Z

I can no longer keep this open in good faith. Someone else can take it from me, but ... personal issues.

nikomatsakis · 2017-02-17T18:11:48Z

So, for some time now I've been wanting to write a followup to my previous comment sharing my thoughts about this work. Even though @ubsan has closed the RFC, I figured I should leave it anyway. Hopefully it is clear from my previous comments that I think the ideas in here are good in a technical sense, and I'd like to do some experimentation with them. However, it's probably also clear that I feel somewhat hestitant about this. My concerns here are not technical, but rather all about prioritization and motiviation.

Rust is an ambitious language. Making it successul is a big job and there is a lot of work ahead for us. When I ask myself "how would Custom DST help Rust grow over the next year", I don't feel like I hear a convincing answer. The main use case that I see is helping make matrix libraries more ergonomic. Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things, it's just a way to make them nicer to use. (I'd be interested to hear otherwise; are there any applications where the presence of custom DST would dramatically change the design in some way that it's not even worth doing it otherwise?)

Earlier I had thoughts, which I mentioned briefly in my comment, about the idea of trying to land this work in a very provisional way, to enable hacking. I am still interested in establishing a process for this. I think there are a number of things that might benefit from having a more "experimentation friendly" process (e.g., naked fns, inline asm, and interrupt ABIs come to mind).

But I haven't gone and publicly worked on defining such a process because I also have concerns: the fact is that any time there is significant churn in the codebase, it has the potential to take a lot of time for core developers who are trying to focus on other things. I think the work around ! has proven this point quite dramatically! By any measure, ! is not a core feature, but working out all the details on it and how it interacts with the unsafe code guidelines, match statements, and so forth has taken and will continue to take quite a lot of time and energy. (I think that much of the work around ! is work we would have to do anyway, as it happens, but perhaps with less urgency and more at a time of our choosing.) And even the most innocent and orthogonal of feature can cause ICEs and regressions which are a big distraction.

(On the other hand, there are some areas where we are adopting a looser process. We did land naked fns in a "provisional state", for example, and that has largely been a non-event. There is some experimentation around the embedded area, particularly with ABIs. But all of these are quite narrow and targeted compared to custom dst, which affects the code that handles every single reference.)

Before @ubsan closed this RFC, I was working up my courage to move to postpone (close) -- but I kept hesitating, because I think it's a good idea and I like it. Yet still I have the feeling that on balance it just isn't a good investment of resources. I regret that hesitating perhaps sends a more frustrating message than actually moving to close. So @ubsan, I'm sorry about that.

AtheMathmo · 2017-02-17T18:32:49Z

@nikomatsakis Thank you for your comment.

I agree with what you're saying - there are more pressing things to work on right now. I also wanted to comment on the following:

Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things

I think this is an accurate statement. There are a few things that I cannot do without custom DSTs, some of which don't even have ugly workarounds, but these are only mechanisms to tidy up the code and make things easier for users. Here are some example issues that we had to put off:

AtheMathmo/rulinalg#149 (I think custom DSTs help this)
AtheMathmo/rulinalg#14

It would be really nice to parallel the Vec<T>-&[T] relationship in the core library but Matrix-like things are still workable without this (to me).

nikomatsakis · 2017-02-21T17:29:04Z

@AtheMathmo interestingly, I was planning to come and post a comment here anyhow just to mention I myself ran into a use-case for this just yesterday when hacking on my NLL prototype, specifically the bitset code. Really it's just the "2-D matrix use case" in disguise, but in trying to play with it, I did appreciate that there are things that are hard to do "just with methods". Though maybe I didn't find the best answer.

In particular, I have a BitSet type that supports, for each graph node, a bit set of N bits. I wanted the ability to also create a BitBuf (one row of N bits) and a BitSlice (a view into either BitSet or BitBuf). This all worked ok but it seemed like I had to copy-and-paste various bits of logic and couldn't easily eliminate the redundancies. I remember thinking that if I could have Deref and DerefMut impls I would have wound up with less redundancy. Sorry not very specific. I have to revisit it now and try to re-examine.

In any case, I definitely think that this conversation -- to what extent is this "just sugar" vs unlocking some fundamental capability -- is the critical one to deciding how to prioritize this change.

nikomatsakis · 2017-02-21T17:29:57Z

Another thing that I was thinking about was this question: is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

AtheMathmo · 2017-02-21T19:04:50Z

is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

I cannot speak for the others who are looking to make use of this feature, but for me I would be making significant changes.

Right now we have the following types in rulinalg: Matrix, MatrixSlice, MatrixSliceMut. And we have BaseMatrix and BaseMatrixMut traits to let us be generic over these. If we had custom DSTs then we could use Deref and DerefMut and (probably) wouldn't want these traits any more - we just implement everything for the slice types and let Deref do the work.

Another particularly difficult area is with operator overloading. Right now we have an explosion of overloading implementations for all combinations of matrix types. I think that custom DSTs could help here too but I haven't explored the idea too much. (Hoping Deref lets us cover all &MatType op &OtherMatType etc. ops).

I'll try to spend some time thinking about whether there is anything else that I can add to this discussion. I've got quite used to working around the lack of custom DSTs and I don't think there are many features missing for the user. There are however quite a few ways we could make things nicer - not needing to import traits everywhere, being able to have &MatrixSlice return types in their code and with Index etc., and others.

nikomatsakis · 2017-02-22T15:16:08Z

@AtheMathmo it seems like it would be a fruitful exercise to try and use e.g. the design from my comment and sketch out how the types in your library would work (in contrast to now) and throw it in a gist. (I'd like to see how you envision the operator overloading working, too.) Apologies if you've already done it and I missed it.

(The design from the RFC seems mostly fine too, I don't recall there being any major differences.)

AtheMathmo · 2017-02-22T16:00:46Z

@nikomatsakis that sounds like a good idea!

I'm a little busy for the rest of this week but hopefully will be able to sketch something out early next week.

nikomatsakis · 2017-02-23T20:36:03Z

@AtheMathmo great, I'd love to see it. Since this RFC is closed, and it feels like we're still a bit in "design discussion" here, I'm going to open a thread on internals to carry on the conversation:

https://internals.rust-lang.org/t/custom-dst-discussion/4842

ghost · 2017-06-17T10:25:53Z

For some more motivation, here's what I need in Crossbeam:

Heap-allocated Array<T>. The head consists of an usize (length), and the tail is [T].
Heap-allocated SkiplistNode<K, V>. The head consists of (K, V, usize) (key, value, height), and the tail is [AtomicPtr<SkiplistNode<K, V>>].
Other similar situations with heap-allocated objects. The head is some arbitrary data and the length. The tail is an array of that length.

Pointers to such objects must be thin because atomically manipulating multiple words is a pain to do portably and performantly. Some support from the language for such use of DSTs would be great to have.

SimonSapin · 2017-08-07T11:37:59Z

Heap-allocating (HeaderIncludingLength, [Item]) (possibly with not all items initialized) is a common pattern that could use some more built-in support. But does it need to be language-level with DSTs, or could it be a library-level?

bergus · 2018-06-14T20:19:16Z

@SimonSapin Looking for exactly that as well. Have you found a suitable workaround yet?

SimonSapin · 2018-06-15T06:53:58Z

@bergus Today, the best you can do is manual memory layout computation, raw allocation, and unsafe initialization. See for example:

https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L714-L727
https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L664-L680

eddyb · 2018-06-15T14:23:43Z

There's also this pattern for a slice with a header, and !Sized thin pointers:
https://github.com/rust-lang/rust/blob/5205ae8bdc2991eecf3bfbb58ed8f56c0673e738/src/librustc/ty/mod.rs#L603-L706

bergus · 2018-06-15T16:15:01Z

@SimonSapin Thanks
@eddyb That's pretty much what I ended up with (needed a bit until I figured out #[repr(C)] was necessary…) though without the OpaqueSliceContents. I had seen extern type mentioned in #813, but the guide explanation from the RFC was not exactly helpful. Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

eddyb · 2018-06-15T18:01:05Z

Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

Yes, and if you need mutable references, it prevents someone from (mis-)using mem::swap.

velvia · 2019-01-06T06:53:05Z

Hi folks, I'm new to the Rust community, but was really hoping to implement a cross-platform library for high performance compressed vectors using this DST feature. These vectors would have, for compatibility reasons, a u32 size header and other header fields followed by something like data: [u8] (or u64, etc.). Kind of like what @stjepang and @SimonSapin are asking for. Having to work around this using unsafe thin pointer coercion and tricks like [u8; 0] as markers kind of is discouraging me a bit from porting it from the JVM (though there are certainly other reasons to port it, such as access to SIMD and things like that).

eddyb · 2019-01-30T09:51:21Z

@velvia This was superseded by #2594.

Custom Dynamically Sized Types for Rust

16ae377

Add a size_of_prelude intrinsic

ca35a3e

strega-nil closed this Feb 6, 2017

AtheMathmo mentioned this pull request Feb 11, 2017

Range-based indexing AtheMathmo/rulinalg#149

Closed

SSheldon mentioned this pull request Feb 21, 2017

RFC: truly unsized types #813

Closed

cramertj mentioned this pull request Mar 27, 2017

Tuple-Based Variadic Generics #1935

Closed

hanna-kruppe mentioned this pull request Apr 25, 2017

document what "thin pointers" are rust-lang/rust#41300

Open

mikeyhew mentioned this pull request Jun 2, 2017

Make mem::size_of_val and mem::align_of_val take raw pointers instead of reference #2017

Open

plietar mentioned this pull request Aug 10, 2017

Tracking issue for RFC 1861: Extern types rust-lang/rust#43467

Open

7 tasks

dtolnay mentioned this pull request Nov 1, 2017

Remove T: Sized on pointer as_ref() and as_mut() rust-lang/rust#44932

Merged

hanna-kruppe mentioned this pull request Nov 30, 2017

Consider adding Box::uninitialized function rust-lang/rust#46406

Closed

mikeyhew mentioned this pull request Dec 20, 2017

Add DynSized trait (rebase of #44469) rust-lang/rust#46108

Closed

gnzlbg mentioned this pull request Oct 11, 2018

Representation of Rust references (&T, &mut T) and raw pointers (*const T, *mut T`) rust-lang/unsafe-code-guidelines#16

Closed

Ixrec mentioned this pull request Nov 11, 2018

RFC: Custom DSTs #2594

Closed

Custom Dynamically Sized Types for Rust #1524

Custom Dynamically Sized Types for Rust #1524

Conversation

strega-nil commented Mar 2, 2016

dgrunwald commented Mar 2, 2016

strega-nil commented Mar 2, 2016

Aatch commented Mar 3, 2016

strega-nil commented Mar 3, 2016

strega-nil commented Mar 3, 2016

strega-nil commented Mar 3, 2016

reem commented Mar 3, 2016

strega-nil commented Mar 3, 2016

bluss commented Mar 3, 2016

thepowersgang commented Mar 3, 2016

strega-nil commented Mar 3, 2016

eddyb commented Mar 3, 2016

strega-nil commented Mar 3, 2016

japaric commented Mar 3, 2016

Aatch commented Mar 3, 2016

strega-nil commented Mar 3, 2016

strega-nil commented Mar 3, 2016

Aatch commented Mar 3, 2016

strega-nil commented Mar 3, 2016

Aatch commented Mar 3, 2016

eddyb commented Mar 3, 2016

strega-nil commented Mar 3, 2016

eddyb commented Mar 3, 2016

Aatch commented Mar 3, 2016

strega-nil commented Mar 3, 2016

Aatch commented Mar 3, 2016

strega-nil commented Mar 3, 2016

Aatch commented Mar 3, 2016

strega-nil commented Feb 6, 2017

nikomatsakis commented Feb 17, 2017

AtheMathmo commented Feb 17, 2017

nikomatsakis commented Feb 21, 2017

nikomatsakis commented Feb 21, 2017

AtheMathmo commented Feb 21, 2017

nikomatsakis commented Feb 22, 2017 • edited Loading

AtheMathmo commented Feb 22, 2017

nikomatsakis commented Feb 23, 2017

ghost commented Jun 17, 2017

SimonSapin commented Aug 7, 2017

bergus commented Jun 14, 2018

SimonSapin commented Jun 15, 2018

eddyb commented Jun 15, 2018

bergus commented Jun 15, 2018

eddyb commented Jun 15, 2018

velvia commented Jan 6, 2019

eddyb commented Jan 30, 2019

nikomatsakis commented Feb 22, 2017 •

edited

Loading