
RFC: Anonymous enum types called joins, as A | B #402

Closed
wants to merge 5 commits into from


reem

@reem reem commented Oct 16, 2014

Add support for defining anonymous, enum-like types using A | B.

@reem
Author

reem commented Oct 16, 2014

Note that this plays very well with #243 by allowing fn x() Y throws A | B and then allowing foo()?.bar()? where foo() throws A and bar() throws B without a lot of unnecessary cruft.

@netvl

netvl commented Oct 16, 2014

I would very much like to see union types (why do you call them join types BTW?) in Rust! Ceylon language is probably the nicest example in existence of their implementation.

@Ericson2314
Contributor

While I totally agree this is a useful feature for errors, it's a major change to the type system, and one that introduces a bunch of machine representation issues, especially if x : A coerces to x : A | B.

Separate from that worry, I'd recommend taking a look at OCaml's "polymorphic variants" for inspiration, which are a neat idea similar to this.

@reem
Author

reem commented Oct 16, 2014

@Ericson2314 By "coerces to" I meant that the compiler will implicitly insert the translation, not that they have the same representation. Notably, it is not safe to transmute from A | B to A if you know that the A | B is the A variant, since the size of A | B is the same as an enum with two variants, rather than the size of A.
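A minimal sketch of the size point in today's Rust, assuming `A | B` would lower to an ordinary two-variant enum (the name `Join` and the choice of `u64`/`u32` are illustrative only):

```rust
use std::mem::size_of;

/// Hypothetical lowering of `u64 | u32`: an ordinary two-variant enum.
#[allow(dead_code)]
enum Join {
    A(u64),
    B(u32),
}

fn main() {
    // The enum stores a discriminant next to the payload, so it is larger
    // than `u64` and cannot be reinterpreted in place as a bare `u64`.
    assert!(size_of::<Join>() > size_of::<u64>());
    // "Coercing" a `u64` into the join means building a new tagged value.
    let j = Join::A(7);
    if let Join::A(v) = j {
        assert_eq!(v, 7);
    }
}
```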

@netvl the above is related to why I did not call this union types, because I see that as more UnsafeUnion<A, B>, which has the correct size and alignment to hold either A or B, much like C's union rather than an enum and would be "safe" to transmute from if you were sure of the type. Regardless, the name is up for bikeshedding.
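Rust has since gained a built-in `union` item that behaves much like the `UnsafeUnion<A, B>` described here. The sketch below uses a concrete two-field union (reusing the name `UnsafeUnion` purely for illustration) to show that it is exactly as large as its largest member, carries no tag, and requires `unsafe` to read:

```rust
use std::mem::{align_of, size_of};

/// C-style untagged union: sized and aligned for either field,
/// but it records nothing about which field is live.
#[repr(C)]
union UnsafeUnion {
    a: u64,
    b: u32,
}

fn main() {
    // Same size and alignment as the largest member; no discriminant.
    assert_eq!(size_of::<UnsafeUnion>(), size_of::<u64>());
    assert_eq!(align_of::<UnsafeUnion>(), align_of::<u64>());

    let u = UnsafeUnion { a: 42 };
    // Reading a field is `unsafe`: the caller must know `a` is the live field.
    let a = unsafe { u.a };
    assert_eq!(a, 42);
}
```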

@reem
Author

reem commented Oct 16, 2014

@netvl Wow, Ceylon's union type is pretty cool! It is used in ways I wouldn't anticipate this being used in Rust though, i.e. I wouldn't necessarily expect let x = vec![1u, 3i, "hello"] to work or infer a type of Vec<uint | int | &'static str>, but that could be proposed as an addition to this RFC.

@Ericson2314
Contributor

"By "coerces to" I meant that the compiler will implicitly insert the translation". Ok, that's a step in the right direction IMO. If A | B can generate new enum (invocations would be memoized), vaguely analogous to the new closure sugar, something like this might be more practical.

Edit: Making A | B = B | A is still a challenge regarding discriminants. Otherwise, one could make Either2, Either3, Either4, ... and have a macro pick the right one.
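A rough sketch of the `EitherN` plus macro idea in today's Rust. `Either2`, `Either3`, and the `join!` macro are hypothetical names; the last assertion shows why `A | B = B | A` is hard with positional discriminants:

```rust
use std::any::TypeId;

/// Positional sum types of fixed arity.
#[allow(dead_code)]
enum Either2<A, B> {
    A(A),
    B(B),
}

#[allow(dead_code)]
enum Either3<A, B, C> {
    A(A),
    B(B),
    C(C),
}

/// Hypothetical `join!` macro: picks the `EitherN` matching the arity.
macro_rules! join {
    ($a:ty, $b:ty) => { Either2<$a, $b> };
    ($a:ty, $b:ty, $c:ty) => { Either3<$a, $b, $c> };
}

fn describe(x: &join!(u8, char)) -> String {
    match x {
        Either2::A(n) => format!("u8 {}", n),
        Either2::B(c) => format!("char {}", c),
    }
}

fn main() {
    let x: join!(u8, char) = Either2::B('x');
    assert_eq!(describe(&x), "char x");
    let _y: join!(u8, char, bool) = Either3::C(true);
    // Order matters: `u8 | char` and `char | u8` become different types.
    assert_ne!(TypeId::of::<join!(u8, char)>(), TypeId::of::<join!(char, u8)>());
}
```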

@netvl

netvl commented Oct 16, 2014

@reem, well, "union types" is an established term for such kind of types, AFAIK. It is a bikeshed, however, I agree, I just wanted to clarify it :)

Union types in Ceylon (well, in just about any language on the JVM, and not only union types: all types) rely heavily on subtyping. In Rust subtyping is almost nonexistent; introducing it would not only make the type system much more complicated, but there would also be no natural way to create subtypes in the absence of inheritance or something like it. So it makes sense to treat union types simply as anonymous enums, just as you suggest.

After all, closures already generate anonymous structures. I don't really know how the compiler operates, but I think this could be done in a similar way.

In the same vein, multiple occurrences of `A | B`, even in different crates,
are the same type.

As a result of this, no trait impls are allowed for join types.
Member

Why not? Trait impls are allowed for tuples, which are anonymous structs.

Author

Actually I had not thought this through. My main motivation in saying this was that you can't write an impl for a type or trait you didn't define, but with the new rules only one of the types in the union has to be yours. We could use exactly the same rules as with tuples.
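A sketch of what "the same rules as with tuples" could look like, assuming a hypothetical generic `Join<A, B>` in place of `A | B`. The impl follows the pattern std uses for tuples, where for example `PartialEq` is implemented for a tuple whenever each element type implements it:

```rust
use std::fmt;

/// Hypothetical generic stand-in for `A | B`.
enum Join<A, B> {
    First(A),
    Second(B),
}

// The join is `Display` whenever both member types are.
impl<A: fmt::Display, B: fmt::Display> fmt::Display for Join<A, B> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Join::First(a) => fmt::Display::fmt(a, f),
            Join::Second(b) => fmt::Display::fmt(b, f),
        }
    }
}

fn main() {
    let items: Vec<Join<i32, &str>> = vec![Join::First(1), Join::Second("two")];
    let rendered: Vec<String> = items.iter().map(|j| j.to_string()).collect();
    assert_eq!(rendered, ["1", "two"]);
}
```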

@huonw
Member

huonw commented Oct 16, 2014

Is T | T legal?

@flying-sheep

Is T | T legal?

I'd prefer that it not be. It would make no sense and would just be sad.

@mitsuhiko
Contributor

The motivation mentions error handling but this problem can also be resolved in a different way through #201.

@huonw
Member

huonw commented Oct 16, 2014

@flying-sheep why does it make no sense? enum Foo { X(T), Y(T) } makes sense.

@kennytm
Member

kennytm commented Oct 16, 2014

If a flat union P | (Q | R) == P | Q | R is needed, then T | T == T must be handled, otherwise (A | B) | (A | C) cannot be made sense of.
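kennytm's point can be illustrated with a toy model (not compiler code): represent a type expression as a named type or a nested union, and normalize it to the set of its member names. `TyExpr` and `normalize` are hypothetical names. Under set semantics, flattening and deduplication come together:

```rust
use std::collections::BTreeSet;

/// Toy type expression: a named type or a (possibly nested) union.
enum TyExpr {
    Named(&'static str),
    Union(Vec<TyExpr>),
}

/// Flattens nested unions and removes duplicates, so `P | (Q | R)` and
/// `P | Q | R` normalize to the same set, and `T | T` normalizes to `{T}`.
fn normalize(t: &TyExpr) -> BTreeSet<&'static str> {
    match t {
        TyExpr::Named(n) => BTreeSet::from([*n]),
        TyExpr::Union(members) => members.iter().flat_map(normalize).collect(),
    }
}

fn main() {
    use TyExpr::*;
    let ab = Union(vec![Named("A"), Named("B")]);
    let ac = Union(vec![Named("A"), Named("C")]);
    // (A | B) | (A | C) is only meaningful once the duplicate A collapses.
    let abc = Union(vec![ab, ac]);
    assert_eq!(normalize(&abc), BTreeSet::from(["A", "B", "C"]));
    // T | T collapses to T.
    let tt = Union(vec![Named("T"), Named("T")]);
    assert_eq!(normalize(&tt), normalize(&Named("T")));
}
```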

@reem
Author

reem commented Oct 16, 2014

@kennytm (A | B) | (A | C) would not be allowed because it would be flattened to A | B | A | C, which would not be allowed because of the duplication of A. I will clarify the illegality of duplicated types in joins in the RFC.

In the same vein, T | T is not allowed either.

My reasoning for this is that T | T is ambiguous to create or match against: in the case of struct T;:

let t = T as T | T; // ambiguous as to which anonymous variant is instantiated
match t {
    T => {} // ambiguous as to how this is matched
}

@kennytm
Member

kennytm commented Oct 16, 2014

@reem You cannot disallow that because:

type AB = A | B;
type AC = A | C;
...
type ABC = AB | AC; // `A` duplicated.

Also disallowing T | T makes it useless on generics:

fn perform_either<T, A, B>(a: || -> Result<T, A>, b: || -> Result<T, B>) -> Result<T, A | B> { ... }

perform_either(
        || rename_file(),
        || copy_and_delete_original_file(),
);
// Will get `IoError | IoError` here.
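For comparison, a positional two-way sum already handles the `IoError | IoError` case in today's Rust, because the variant records which operation failed rather than which type was produced. The helper below is hypothetical and sequences two steps instead of reproducing the elided body of `perform_either`:

```rust
/// Positional two-way sum, like `Result` but with neutral names.
#[derive(Debug, PartialEq)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

/// Hypothetical helper: run `first`, feed its output to `second`, and tag
/// any error with the step that produced it.
fn run_both<T, U, A, B>(
    first: impl FnOnce() -> Result<T, A>,
    second: impl FnOnce(T) -> Result<U, B>,
) -> Result<U, Either<A, B>> {
    let t = first().map_err(Either::Left)?;
    second(t).map_err(Either::Right)
}

fn main() {
    // Both error types are `String` (the `T | T` case); the positional
    // variant still records which step failed.
    let r: Result<u32, Either<String, String>> =
        run_both(|| Ok(2), |_: u32| Err("second failed".to_string()));
    assert_eq!(r, Err(Either::Right("second failed".to_string())));
}
```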

@huonw
Member

huonw commented Oct 16, 2014

A previous proposal for this supported T | T by being positional, like a tuple.

@reem
Author

reem commented Oct 16, 2014

I think it's possible to resolve unambiguously without positioning if we just make T | T act like T | Void in the sense that you can't ever match against or instantiate the second variant, or something along those lines. Positioning makes this just shorthand for (Option<A>, Option<B>) and would require explicit instantiation syntax, which is significantly worse than the proposed sugar, IMO.

@huonw
Member

huonw commented Oct 16, 2014

It's not shorthand for (Option<A>, Option<B>), since that allows for (None, None).

We could also have "do what I mean": let x: A = ...; let y: A | B = x; is fine since it's unambiguous which variant is meant; it is just things like let y: A | A = x; that require an explicit constructor.

(This is true even with generics that can be instantiated to be the same type, since the point-of-coercion just sees the two distinct generic types, it does not care that they could end up both being u8: the variant decision has already been made.)
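huonw's "do what I mean" rule resembles what `From`/`Into` can express for concrete types today: each member type gets its own conversion, so the source type selects the variant. A sketch with a hypothetical `Join` standing in for `u8 | char`. Expressing the rule generically through traits runs into trouble: the pair `impl<A, B> From<A> for Join<A, B>` and `impl<A, B> From<B> for Join<A, B>` is rejected by Rust's coherence check because `A` and `B` could be the same type, which is the `A | A` ambiguity again.

```rust
/// Hypothetical lowering of `u8 | char` for two distinct concrete types.
#[derive(Debug, PartialEq)]
enum Join {
    Byte(u8),
    Char(char),
}

// One conversion per member type: the source type selects the variant.
impl From<u8> for Join {
    fn from(x: u8) -> Self {
        Join::Byte(x)
    }
}

impl From<char> for Join {
    fn from(x: char) -> Self {
        Join::Char(x)
    }
}

fn main() {
    let x: u8 = 5;
    let y: Join = x.into(); // unambiguous: `u8` maps to `Join::Byte`
    assert_eq!(y, Join::Byte(5));
    let z: Join = 'q'.into();
    assert_eq!(z, Join::Char('q'));
}
```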

@reem
Author

reem commented Oct 16, 2014

I can't really think of a situation where I'd care about which variant was chosen out of A | A, so we could just have the choice be undefined.

@arthurprs

I kinda prefer this over #201

@Tobba

Tobba commented Oct 16, 2014

There are a lot of uses of union types outside of error handling. In my opinion,

type A = T | T;

should be equal to

type A = T;

You could possibly allow coercing a union type to another union type during match.

@glaebhoerl
Contributor

As before I think we should clearly distinguish between anonymous sum types and union types, at least to avoid confusing ourselves. Both are potentially useful, they are vaguely similar but actually quite different, and the main thing they have in common is that both want the same syntax.

Anonymous sum types would be nothing more than special built-in syntax for enum types of any arity, just like tuples are for structs. Apart from the nicer syntax, it would be no more magical than the existing Result type. No automatic coercions, etc. You would have to explicitly construct and pattern match on them the same way as existing enums. (T|T) would make just as much sense as Result<T, T> (which is isomorphic to (bool, T)).

Union types would be a kind of type-based union which essentially has one variant per type, rather than "positional" variants. This may also allow automatic conversions from T to T|U, T|U to U|T|W, etc. Notably this is not really a sum type, because for sum types you would expect that the number of its inhabitants is equal to the sum of the number of its members' inhabitants, e.g. inhabitants(Result<T, U>) = inhabitants(T) + inhabitants(U) for all T and U, but this fails to hold for union types when T = U. It's kind of like the difference between HashSet and Vec. Different algebraic properties. A variant of union types is what is being proposed here.
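The algebraic distinction can be checked directly on `Result<T, T>`. The sketch below writes out the isomorphism with `(bool, T)` mentioned above and counts the inhabitants of `Result<bool, bool>`; `to_pair` and `from_pair` are illustrative names:

```rust
use std::collections::HashSet;

/// `Result<T, T>` carries one bit (which side) plus a `T`.
fn to_pair<T>(r: Result<T, T>) -> (bool, T) {
    match r {
        Ok(t) => (true, t),
        Err(t) => (false, t),
    }
}

fn from_pair<T>((ok, t): (bool, T)) -> Result<T, T> {
    if ok { Ok(t) } else { Err(t) }
}

fn main() {
    // Every value of `Result<bool, bool>`: 2 + 2 = 4 inhabitants,
    // whereas a type-based union `bool | bool` would have only 2.
    let all: Vec<Result<bool, bool>> = vec![Ok(false), Ok(true), Err(false), Err(true)];
    for r in &all {
        // Round-tripping through `(bool, T)` loses nothing.
        assert_eq!(from_pair(to_pair(*r)), *r);
    }
    let distinct: HashSet<_> = all.iter().collect();
    assert_eq!(distinct.len(), 4);
}
```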

(Maybe more later, only had time for this right now.)

@ghost

ghost commented Oct 17, 2014

This is an interesting idea but there are a number of subtleties involved that I'm not sure are fully addressed by the current proposal. Some care should be taken as there are non-obvious issues with an ad-hoc approach that can lead to unsoundness and other problems.

It would be useful to understand this proposal in relation to similar ideas elsewhere, like extensible/polymorphic variants in OCaml, extensible exceptions (exn) in SML, extensible variants and open data types in Haskell, union types (also here) (as opposed to intersection types), etc. There's quite a lot of literature on this (and the dual: extensible records) and the problems have already been spelled out in great detail along with several solutions and implementations.

I'm also concerned that interaction with generics will lead to serious usability issues without something like row-polymorphism and/or subtyping (like OCaml). I don't know enough about Rust's internals to have a good idea whether either of those would be feasible here but neither are trivial additions in most cases.

@Tobba

Tobba commented Oct 17, 2014

Let me leave in an argument for T | T becoming T:

type Lib1Error = IoError | DatabaseError;
type Lib2Error = IoError | FooError;
type Lib3Error = Lib1Error | Lib2Error;

@reem
Author

reem commented Oct 17, 2014

I've thought a bit about the duplicate type problem, and I want to articulate my new thoughts:

We must support duplicate types in joins, at minimum when instantiated through generics. Let's take an example:

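fn try_both<A, B>(a: || -> Option<A>, b: || -> Option<B>) -> Option<A | B> {
    // Each `Some(..)` relies on the proposed implicit coercion into `A | B`.
    match a() { Some(x) => Some(x), None => b().and_then(|y| Some(y)) }
}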

This function must work when instantiated with A and B as the same type, but I am not convinced that it must be unambiguous which variant was used, even if they are the same type. I think that is acceptable if we present this feature as inappropriate for cases where you care about being able to disambiguate T | T.

If we assert that in the case where A | B is instantiated with the same types, it is optimized to T | T, which can be represented as T, then I think the above can work in a consistent way. We know the above optimization - that duplicate types are always compressed - will always be applicable because all types are statically known at compile time and all functions are monomorphized.

EDIT:

Effectively, if you instantiate the above method with the same type parameter, the type of the above function would be fn<T>(|| -> Option<T>, || -> Option<T>) -> Option<T>.
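Under this collapsing rule, `try_both` instantiated at a single type `T` would behave like the ordinary generic function below, written in today's Rust with `impl FnOnce` in place of the old `||` closure types. This is a sketch of the collapsed signature only, not of how general joins would be compiled:

```rust
/// `try_both` specialised to `A = B = T`: the join collapses to `T`.
fn try_both<T>(a: impl FnOnce() -> Option<T>, b: impl FnOnce() -> Option<T>) -> Option<T> {
    // Call `b` only if `a` produced nothing.
    a().or_else(b)
}

fn main() {
    assert_eq!(try_both(|| None, || Some(2)), Some(2));
    assert_eq!(try_both(|| Some(1), || Some(2)), Some(1));
    let none: Option<i32> = try_both(|| None, || None);
    assert_eq!(none, None);
}
```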

@pythonesque
Contributor

I do not like this idea. I am seeing comments like "I don't care about this case" in response to issues with generics, and that makes me think this isn't well-thought out. What does this actually do that the existing type system cannot already do? Is this going to be a weird wart on Rust's type system?

I think any solution like this must be motivated by existing problems with more than syntax. For example, if there were a way that this could lead to collapsed enum representations for repeatedly nested types, that would be a win in my view. But it is not clear to me that this proposal attempts this, which again leads me to feel that it isn't that well thought-out.

@zwarich

zwarich commented Oct 18, 2014

If you think of T1 | ... | Tn construct as defining an anonymous sum type with constructors labeled by T1, ..., Tn that are inserted in a type-directed way, then it doesn't seem to make sense unless T1, ..., Tn are pairwise incompatible.

Also, if you want (A | B) | C to be identical to A | B | C with separate compilation, then you will have to give up using the same memory representation as normal enums, similar to the various implementations of row polymorphism that exist.

@reem
Author

reem commented Oct 18, 2014

@zwarich You've pointed out an important issue. I've given this some more thought and I'm now actually unsure if (A | B) | C should be identical to A | B | C or if (A | B) | C should create something akin to AorB | C, where AorB is just any other type.

@pythonesque This is a large and important feature, so I certainly think further consideration is warranted, but I also think that this, in some form, is a feature that we will likely want as it vastly simplifies many common cases.

@zwarich

zwarich commented Oct 24, 2014

In the triage meeting today we decided to close this RFC and not merge it, as it is lacking some critical details with no single obvious solution. We did agree that this space of ideas has some compelling use cases, so I opened an issue in the RFCs repo to track looking into it further: #409.

@zwarich zwarich closed this Oct 24, 2014
@glaebhoerl
Contributor

As others have noted this seems like a small and obvious feature to improve convenience and ergonomics at first, but it's actually a pretty big one with potentially profound implications for the type system. At least in its most general form.

With respect to representation, my feeling is that the correct one would likely be (TypeId, UnsafeUnion<T1...Tn>), i.e. similar to an enum, but not the same: instead of a small integer based on the number of variants, the discriminant would be a TypeId. Given that it is supposed to be a type-based union, this makes sense. It seems like it would be difficult to support all of the various equivalences and coercions between different orderings and numbers of occurrences of the member types any other way, although I haven't thought it all the way through. (Maybe even the fact that the UnsafeUnion part would have a different size/alignment when extended with more types would end up being problematic.)
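A rough, safe approximation of a `TypeId`-tagged union, assuming heap allocation is acceptable for illustration: `Box<dyn Any>` stands in for the untagged `UnsafeUnion` storage, so this is not the layout proposed above. It only shows how a `TypeId` discriminant makes member order irrelevant. `TypeUnion` is a hypothetical name:

```rust
use std::any::{Any, TypeId};

/// Type-tagged value: the tag is the member's `TypeId`.
/// `Box<dyn Any>` replaces the untagged storage for safety.
struct TypeUnion {
    tag: TypeId,
    value: Box<dyn Any>,
}

impl TypeUnion {
    fn new<T: Any>(value: T) -> Self {
        TypeUnion { tag: TypeId::of::<T>(), value: Box::new(value) }
    }

    /// "Match" on a member type by comparing tags.
    fn get<T: Any>(&self) -> Option<&T> {
        if self.tag == TypeId::of::<T>() {
            self.value.downcast_ref::<T>()
        } else {
            None
        }
    }
}

fn main() {
    // The tag depends only on the type, not on whether the union is
    // written `i32 | char` or `char | i32`.
    let u = TypeUnion::new(9i32);
    assert_eq!(u.get::<i32>(), Some(&9));
    assert_eq!(u.get::<char>(), None);
}
```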

Here are the tricky questions I can think of. Most of them have already been posed (and often answered in a way) in the RFC and comments, but just to gather them in one place. I'm going to use a variadic Union<...> syntax instead of | for greater clarity. Ts... stands for zero or more types.

  • Do we have:

    • A = Union<A>?
    • Union<A, Ts...> = Union<A, Ts..., A>?
    • Union<A, B, Ts...> = Union<B, A, Ts...>?
    • Union<Union<Ts...>, Us...> = Union<Ts..., Us...>?

    I think that apart from the first, these are probably things we would like to have. (I.e. so that two unions are equal iff the set of their member types is equal, irrespective of ordering or duplication, and they are "transparent" when nested.)

  • Regarding coercions and/or subtyping, do we have:

    • A -> Union<A, Ts...>?
    • Union<A, B> -> Union<A, B, Ts...>?

    Again, these seem desirable to have (or more like the whole point).

  • Suppose we write fn foo() -> Union<Box<ToString>, Box<int>> { box 100i }. Which type is the result?

  • What about fn bar() -> Union<Box<ToString>, Box<Hash>> { box 100i }?

  • What about fn baz() -> Union<Option<int>, Option<char>> { None }?

  • What about fn quux() -> Union<int, char> { FromStr::from_str("this will fail at runtime, but that's not the point").unwrap() }? Which is the type of from_str(): int, char, or Union<int, char>? Relatedly:

  • Can we have impl FromStr for Union<int, char>?

  • Do we infer Union<int, char> as the type of if cond { 9i } else { 'x' }?

In the short term, a pragmatic approach might be to just forbid most of the things, e.g. to disallow type variables (and possibly trait objects as well) from appearing in unions, which would avoid many, but not all, of the difficulties (but also be less useful). In the longer term it's probably best to study the literature, such as linked by @darinmorrison.

But all in all I'm not sure that all this machinery would be worthwhile for what is in the end a "constant-factor" improvement in convenience, rather than an increase in expressiveness or abstraction power.

@reem:

but I also think that this, in some form, is a feature that we will likely want as it vastly simplifies many common cases.

This comment appears to be at odds with my previous sentence. My impression was that the main benefit of this feature would be being able to avoid a bunch of map_err() calls in error handling code, i.e. just a convenience thing, even if a significant one. But it doesn't seem to me that it would rise to the level of a "vast simplification". Do you disagree? (Honest question - my impression may be mistaken.)
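For context, this is the kind of `map_err()` plumbing being referred to: without joins, a function with two failure modes typically defines its own error enum and converts each source explicitly. `ParseOrRange` and `parse_small` are hypothetical:

```rust
use std::num::ParseIntError;

/// Hand-written error enum that a join such as `ParseIntError | u32`
/// would make unnecessary.
#[derive(Debug, PartialEq)]
enum ParseOrRange {
    Parse(ParseIntError),
    Range(u32),
}

fn parse_small(s: &str) -> Result<u32, ParseOrRange> {
    // Each error source needs an explicit conversion into the enum.
    let n: u32 = s.trim().parse().map_err(ParseOrRange::Parse)?;
    if n > 100 {
        return Err(ParseOrRange::Range(n));
    }
    Ok(n)
}

fn main() {
    assert_eq!(parse_small(" 42 "), Ok(42));
    assert_eq!(parse_small("500"), Err(ParseOrRange::Range(500)));
    assert!(matches!(parse_small("abc"), Err(ParseOrRange::Parse(_))));
}
```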

@scialex

scialex commented Dec 9, 2014

@reem @glaebhoerl @zwarich

I have tried my hand at making a new version of this RFC that tries to address some of the concerns with it.

Do you think it's worth submitting a pull request?

@zwarich

zwarich commented Dec 9, 2014

@scialex I haven't read the entire thing yet, but does it propose a compilation strategy?

@scialex

scialex commented Dec 10, 2014

@zwarich Not explicitly. I do not think it really needs a complex one. The compiler figures out all the overlaps that could occur given the type information it has and creates an enum with one variant for each possible combination of types. Matches and all impls are simply desugared to use this enum. Since types are monomorphized, it should be guaranteed to work (I think; it's been a while since I studied compilers, and I haven't taken a very deep look at rustc internals).

This does rely on being willing to have the in-memory layout of, e.g., (A | B) | C be different from A | B | C and A | (B | C), even though the compiler will insert the code to convert between them automatically and they are semantically the same. Further, it is possible that we might sometimes convert an A to an A | A with a different memory layout for some returns/arguments where we cannot rewrite it.
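The automatic conversion described here can be pictured as a compiler-generated function between the two layouts. A hand-written sketch, assuming hypothetical `Join2`/`Join3` enums as the lowered forms of `A | B` and `A | B | C`:

```rust
/// Hypothetical lowered forms of `A | B` and `A | B | C`.
#[derive(Debug, PartialEq)]
enum Join2<A, B> {
    First(A),
    Second(B),
}

#[derive(Debug, PartialEq)]
enum Join3<A, B, C> {
    First(A),
    Second(B),
    Third(C),
}

/// Converts the nested layout `(A | B) | C` into the flat layout `A | B | C`.
/// Both hold the same values but use different discriminant schemes.
fn flatten<A, B, C>(x: Join2<Join2<A, B>, C>) -> Join3<A, B, C> {
    match x {
        Join2::First(Join2::First(a)) => Join3::First(a),
        Join2::First(Join2::Second(b)) => Join3::Second(b),
        Join2::Second(c) => Join3::Third(c),
    }
}

fn main() {
    let nested: Join2<Join2<u8, char>, bool> = Join2::First(Join2::Second('k'));
    assert_eq!(flatten(nested), Join3::Second('k'));
    let nested_c: Join2<Join2<u8, char>, bool> = Join2::Second(true);
    assert_eq!(flatten(nested_c), Join3::Third(true));
}
```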

Also, this could cause performance problems with large numbers of very generic types, but that doesn't seem to be the main use case.

For example, this would take a lot of time and memory:

fn stuff<A, B, C, D, E>(a: &A, b: &B, c: &C, d: &D, e: &E) -> &(A|B|C|D|E) { ... }

since it would be really returning a 120-variant enum.

@flying-sheep

your new RFC is great!

I’m desperately missing something like join types in my attempt to create a type-safe document tree (the children of one node type can be any of several other types, e.g. a Document can have Vec<Box<Section|Header>> children).
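Without join types, the usual workaround is a named enum per set of allowed children. A sketch with hypothetical `Section`, `Header`, and `Document` types; the `Box` from the signature above is dropped because an enum of sized payloads is already sized:

```rust
/// Hypothetical document-tree node types.
struct Section {
    title: String,
}

struct Header {
    level: u8,
}

/// Named stand-in for the anonymous `Section | Header`.
enum DocChild {
    Section(Section),
    Header(Header),
}

struct Document {
    children: Vec<DocChild>,
}

fn main() {
    let doc = Document {
        children: vec![
            DocChild::Header(Header { level: 1 }),
            DocChild::Section(Section { title: "Intro".to_string() }),
        ],
    };
    for child in &doc.children {
        match child {
            DocChild::Section(s) => println!("section: {}", s.title),
            DocChild::Header(h) => println!("header level {}", h.level),
        }
    }
    assert_eq!(doc.children.len(), 2);
}
```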

@reem
Author

reem commented Dec 10, 2014

@scialex I like some of the changes in the RFC, but I think the treatment of duplicate types could use a lot of work, since it makes using joins of more than 2 types extremely unergonomic: A | B | C could end up as A + B, B + C, A + C, or A + B + C in addition to A, B, or C. Forget matching on 4.

As a result, I don't think that RFC should be used as-is.

This has sort of slipped my mind in the past weeks, but I think I may take another stab at significantly clarifying many of the edge cases and important ideas brought up here.

@kennytm
Member

kennytm commented Dec 10, 2014

Now that we have the FromError trait, the first example is no longer a good one.

// implement FromError for LibError.
enum LibError { ... }
impl FromError<ErrorX> for LibError { ... }
impl FromError<ErrorY> for LibError { ... }

// then just use `try!` as is from now on.
pub fn some_other_operation() -> Result<(), LibError> {
    let x = try!(produce_error_x());
    let y = try!(produce_error_y());
    Ok(())
}
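`FromError` and `try!` were later superseded by the `From` trait and the `?` operator. A rough modern equivalent of the sketch above; `ErrorX`, `ErrorY`, and the producer functions (with an added `fail` flag) are hypothetical stand-ins so that the example runs:

```rust
#[derive(Debug, PartialEq)]
struct ErrorX;

#[derive(Debug, PartialEq)]
struct ErrorY;

/// Library error enum; the `From` impls let `?` convert automatically.
#[derive(Debug, PartialEq)]
enum LibError {
    X(ErrorX),
    Y(ErrorY),
}

impl From<ErrorX> for LibError {
    fn from(e: ErrorX) -> Self {
        LibError::X(e)
    }
}

impl From<ErrorY> for LibError {
    fn from(e: ErrorY) -> Self {
        LibError::Y(e)
    }
}

fn produce_error_x(fail: bool) -> Result<u32, ErrorX> {
    if fail { Err(ErrorX) } else { Ok(1) }
}

fn produce_error_y(fail: bool) -> Result<u32, ErrorY> {
    if fail { Err(ErrorY) } else { Ok(2) }
}

pub fn some_other_operation(fail_y: bool) -> Result<u32, LibError> {
    let x = produce_error_x(false)?; // on error, `?` calls `LibError::from`
    let y = produce_error_y(fail_y)?;
    Ok(x + y)
}

fn main() {
    assert_eq!(some_other_operation(false), Ok(3));
    assert_eq!(some_other_operation(true), Err(LibError::Y(ErrorY)));
}
```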

@scialex

scialex commented Dec 10, 2014

@flying-sheep: Cool, that's actually almost exactly the reason I wanted to revive it. I am making a file system; stuff can return File|Directory|...

@kennytm: Added note about FromError.

@reem: Updated the RFC to address that concern, making it easier to match on large unions. I have improved it some and am going to submit it.

Submitted pull request #514

@huonw huonw mentioned this pull request Jul 16, 2015
@Centril Centril added A-syntax Syntax related proposals & ideas A-structural-typing Proposals relating to structural typing. A-data-types RFCs about data-types A-typesystem Type system related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. A-sum-types Sum types related proposals. A-expressions Term language related proposals & ideas labels Nov 27, 2018
@tema3210

tema3210 commented Dec 4, 2019

We can define anonymous enums as follows:
(T|K|O|P|...) - any number of any types;

We can match on them as follows:

match a {
   (a,_,_,_,...)=>{...},
   (_,b,_,_,...)=>{...},
   ...
};

As for any other types we can implement traits for them.
Auto traits should be implemented according to the most limited variant, i.e. only when every variant implements them.

@burdges

burdges commented Dec 4, 2019

The auto_enums crate provides this now.
