Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive and unsized enums #1151

Open
tomaka opened this issue Jun 5, 2015 · 21 comments
Open

Recursive and unsized enums #1151

tomaka opened this issue Jun 5, 2015 · 21 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@tomaka
Copy link

tomaka commented Jun 5, 2015

It is currently forbidden to have this enum, because it is recursive:

enum Sequence<T> {
    Element { value: T, next: Sequence<T> },
    End
}

However in theory we could accept it by making Sequence unsized.

@glaebhoerl
Copy link
Contributor

I think this is a really cool idea, if it can work (I don't momentarily see any reason why it couldn't, but I haven't thought very hard). Reminds me a bit of serialization formats where you have a flat stream of bytes with tags to indicate what comes next.

@ftxqxd
Copy link
Contributor

ftxqxd commented Jun 6, 2015

It could be a little tricky to define how to construct one of these. A similar instance of the same problem exists today: it’s valid to to define a struct whose last field is a non-type-parameter unsized type…

struct Dst {
    x: [u8],
}

…but it’s basically impossible to construct one of these, because there’s no Sized equivalent to coerce from. I believe this proposal suffers from the same problem. The obvious solution would be to somehow allow unsized types in struct initialisers, but I’m not sure about how that would work in practice.

@Kimundi
Copy link
Member

Kimundi commented Jun 7, 2015

@P1start: Afaik @eddyb has a few ideas how to make that work.

@eddyb
Copy link
Member

eddyb commented Jun 7, 2015

@P1start you would only allow it in an in-place context (&expr, box expr, in place { expr }) and then drag around enough context to save the extra information after (while?) writing the data.
Though panics might be tricky, it should still be possible.

@ticki
Copy link
Contributor

ticki commented Nov 16, 2015

I think this is really a cool idea, and is needed for the sake of consistency.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 29, 2016
@serprex
Copy link

serprex commented Nov 8, 2016

A useful extension of this is that an unsized enum which is non recursive could also be useful for taking up less space (consider enum BigOrSmall { A([u8; 2000]), B([u16; 10])}). Perhaps a #[repr(unsized)]

@tupshin
Copy link

tupshin commented Jan 7, 2017

I'll just weigh in in favor of this given my internals question where I unexpectedly ran into this same limitation.

@burdges
Copy link

burdges commented Jan 7, 2017

There are discussions elsewhere about doing match on "artificial" enums, so another idea might be a trait

trait Match {
    type MatchEnum;
    fn run_match(&self) -> MatchEnum;
    fn run_match_mut(&mut self) -> MatchEnum;
}

so that if impl Match for FakeEnum exists then match fake_enum { .. } becomes match run_match(&fake_enum) { .. } or the mut version depending upon the patterns.

Assuming the trait fields RFC, there is a version with nicer borrowing semantics

trait Match {
    type MatchEnum;
    type EnumField;
    target: EnumField;
    fn run_match(&EnumField) -> MatchEnum;
    fn run_match_mut(&mut EnumField) -> MatchEnum;
}

where match fake_enum { .. } becomes match run_match(&fake_enum.target) { .. }.

We can maybe do the recursive unwrapping these describe using this approach like

struct Sequence<T>([T]);

enum SequenceEnum<T> {
    Element(pub T, &Sequence<T>),
    End
}

impl<T> Match for Sequence<T> {
    type MatchEnum = SequenceEnum<T>;
    fn run_match(&EnumField) -> MatchEnum {
        if self.0.len() > 0 { Element(self.0[0],Sequence<T>(&self.0[1])) else { End }
    }
}

except with lifetimes and unsafety around the borrowing of self.0. And maybe even an explicit length field if [T] lacks that.

All this looks less ergonomic of course, but you could maybe fix that with macros somehow. And it's definitely something lower-level developers run into frequently.

I donno if this approach might place less nicely with future schemes for type-level numerics though.

@RReverser
Copy link

To bump this thread: now that we have enums with well-defined memory layout such as repr(u8), repr(C) (ping @eddyb), etc., could we allow at least such enums to contain unsized data as last item of enum payload?

Then it would be possible at least for an unsafe code to construct such enums using same techniques as for any other structs (which such enum is equivalent to), and for other safe code to use it.

@eddyb
Copy link
Member

eddyb commented Mar 25, 2020

@RReverser It helps with the optimization problem, but you still have the issue of the metadata not always being valid, and you also need to read from behind the data pointer to determine the active variant to get e.g. the alignment of the enum, which is needed for an operation like computing the true field offset in a type having that enum as a field (e.g. &pair.1 on (u8, Enum)).

I forget what we did for unions, did we just ban unsized unions? (cc @RalfJung)

@RustyYato
Copy link

@eddyb Unsized unions are in fact banned, playground.

@RReverser
Copy link

you also need to read from behind the data pointer to determine the active variant to get e.g. the alignment of the enum

Hmm. Isn't that the case for any unsized data structures anyway? E.g. you need to read from behind the data pointer to get alignment of a struct ending with a dyn Trait.

@eddyb
Copy link
Member

eddyb commented Mar 25, 2020

E.g. you need to read from behind the data pointer to get alignment of a struct ending with a dyn Trait.

No, the reference/pointer to the struct contains the data pointer and the vtable pointer, there's nothing behind the data pointer that's not in a version of the struct where the dyn Trait is replaced by the concrete type (this is how unsizing works after all).

@RReverser
Copy link

Ah, got it, makes sense... And, I suppose, same applies to actual variant vtables.

On a second thought, given that only one variant of the enum can be active and hold unsized data, couldn't &SomeEnum be encoded as a (data_ptr_for_the_entire_enum, vtable_just_for_active_variant)?

Then other parts of the type system could continue reading layout and vtable information as they normally do. The downside is that now you have to read from behind data pointer when taking a reference... but that's probably still fine, because creating a reference to invalid data is a declared UB anyway, and still better than the alternative.

@RustyYato
Copy link

RustyYato commented Mar 25, 2020

On a second thought, given that only one variant of the enum can be active and hold unsized data, couldn't &SomeEnum be encoded as a (data_ptr_for_the_entire_enum, vtable_just_for_active_variant)?

No, counterexample

enum Foo {
    N,
    C(Foo)
}

Some examples of this type Foo::C(Foo::C(Foo::N)), and Foo::C(Foo::N). Both of these have the same active variant for the outermost wrap, but the inner Foo have different variants. How would that be encoded? (hint, it can't in general). We would have to load from the data pointer. That said, that in itself wouldn't kill this. It would just make this the very first thin pointer DST with language support.


Just to make sure we understand why this isn't possible in general, because the enum above is solvable by just encoding the "length" of the enum, but it only get's more complex from here:

Consider this, (a simple length won't suffice, we also need to know what the last element is, but that's just a bit more metadata that we could store)

enum Foo {
    A,
    B,
    C(Foo)
}

Or even this, (where a length is insufficient)

enum Foo {
    A,
    B(Foo),
    C(Foo),
}

Here is a case that would truly require reading from the data pointer, no other way around it.

Then other parts of the type system could continue reading layout and vtable information as they normally do. The downside is that now you have to read from behind data pointer when taking a reference... but that's probably still fine, because creating a reference to invalid data is a declared UB anyway, and still better than the alternative.

I think this is fine, we should be able to do that.

@RReverser
Copy link

Sorry, before reading in depth, I'm confused:

No, counterexample: ...

I think this is fine, we should be able to do that.

So which one is it? 😀 (as the comment you referenced had only a single suggestion)

@RReverser
Copy link

RReverser commented Mar 25, 2020

Reading through the counter-examples.

Both of these have the same active variant for the outermost wrap, but the inner Foo have different variants. How would that be encoded? (hint, it can't in general).

I don't see why not. Such enum

enum Foo {
    N,
    C(Foo)
}

would be equivalent to already existing and working struct definition (not exactly equivalent code, just an example for demonstration purposes)

pub struct Foo_N;

pub struct Foo_C<V: ?Sized>(Foo<V>);

pub struct Foo<V: ?Sized> {
    tag: u8,
    variant_data: V,
}

pub fn dyn_foo() -> Box<Foo<dyn std::any::Any>> {
    let foo = Foo {
        tag: 1,
        variant_data: Foo_C(Foo {
            tag: 1,
            variant_data: Foo_C(Foo {
                tag: 0,
                variant_data: Foo_N
            })
        })
    };
    Box::new(foo)
}

Its construction is already well-defined and works on stable Rust. Godbolt link

All I'm suggesting is for unsized enums to be encoded in the same way.

@RustyYato
Copy link

Sorry, I may have misunderstood what you were proposing, but I thought you meant that

  1. It is possible to encode which variant the enum is in without reading from the data pointer (which isn't possible)
    • I thought you realized this wasn't possible after I had already written out that portion, so I didn't really want to cut it out

After that realization I may have misinterpreted the second part of your comment to mean

  1. We can allow reading from the data portion, and there wasn't anything wrong with that (this is fine, I think)

would be equivalent to already existing and working struct definition (not exactly equivalent code, just an example for demonstration purposes)

They sure do look the same, but they are very different in a single pivotal way. In the enum example you have just 1 type, but in the struct example you have 3 types FooN, FooC<FooN>, FooC<FooC<FooN>> each of these could have it's own v-table! This is relevant because we can't generate a v-table for every possible combination of the enum, but we can for the struct case. This is fundamental to how enums and traits work: traits offer a open polymorphism, and enum is closed polymorphism.

@RReverser
Copy link

This is relevant because we can't generate a v-table for every possible combination of the enum, but we can for the struct case.

Now that we're getting there... why not? Note that we wouldn't need to generate it for every possible combination, just for the ones actually constructed in code (and only for unsized enums), which are limited, just like in case with structs.

I believe there was even another long-term RFC going around about allowing to use separate enum variants as individual types, which seems like would intersect quite nicely with this proposal, as this one also suggests to treat enum variants as separate types.

@RustyYato
Copy link

So you are trying to blur the distinction between enums and traits? That sounds interesting. I could see this as a way of implementing a sort of sealed traits. One small problem, how do you construct the vtable (if we are going down that route)?. With traits we can never affect the bare trait object's implementation of the trait, so it is able to just call out using the v-table. But with enums, we are effectively writing that very implementation. What I'm trying to get at is this: We don't have Foo::C's implementation of some method, we only have Foo's implementation. So we can't generate a vtable.

@amosonn
Copy link

amosonn commented Jul 11, 2020

It might be possible to create a Foo::C "version" of Foo-s implementation, where in every destructuring (match, if let, etc.) only the Foo::C branch is taken - effectively encoding the discriminant into the implementation. This does indeed go further in the direction of making each variant a type of its own. But this also sounds like a lot of work in codegen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests