RFC: Anonymous enum types called joins, as `A | B` #402

reem · 2014-10-16T05:23:21Z

Add support for defining anonymous, enum-like types using `A | B`.

reem · 2014-10-16T05:27:40Z

Note that this plays very well with #243 by allowing fn x() Y throws A | B and then allowing foo()?.bar()? where foo() throws A and bar() throws B without a lot of unnecessary cruft.

netvl · 2014-10-16T05:58:00Z

I would very much like to see union types (why do you call them join types BTW?) in Rust! The Ceylon language is probably the nicest existing example of their implementation.

Ericson2314 · 2014-10-16T06:02:04Z

While I totally agree this is a useful feature for errors, it's a major change to the type system, and one that introduces a bunch of machine representation issues, especially if x : A coerces to x : A | B.

Separate from that worry, I'd recommend taking a look at OCaml's "polymorphic variants" for inspiration, which is a neat idea similar to this.

reem · 2014-10-16T06:04:56Z

@Ericson2314 By "coerces to" I meant that the compiler will implicitly insert the translation, not that they have the same representation. Notably, it is not safe to transmute from A | B to A if you know that the A | B is the A variant, since the size of A | B is the same as an enum with two variants, rather than the size of A.

@netvl the above is related to why I did not call this union types, because I see that as more UnsafeUnion<A, B>, which has the correct size and alignment to hold either A or B, much like C's union rather than an enum and would be "safe" to transmute from if you were sure of the type. Regardless, the name is up for bikeshedding.
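
The size difference described above can be sketched in today's Rust. This is only an illustration under stated assumptions: `Join` and `Untagged` are hypothetical hand-written stand-ins for the proposed `A | B` and for `UnsafeUnion<A, B>`, neither of which exists in the language.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the proposed `u32 | f32`: an ordinary tagged enum.
#[allow(dead_code)]
enum Join {
    A(u32),
    B(f32),
}

// Hypothetical stand-in for `UnsafeUnion<u32, f32>`: an untagged, C-style union
// that is just large enough to hold either member.
#[allow(dead_code)]
#[repr(C)]
union Untagged {
    a: u32,
    b: f32,
}

/// Returns (size of the tagged enum, size of the untagged union) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<Join>(), size_of::<Untagged>())
}
```

The enum has to store a discriminant next to its payload, so it is strictly larger than either member, which is why transmuting from the join to `A` would not be sound.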

reem · 2014-10-16T06:09:17Z

@netvl Wow, Ceylon's union type is pretty cool! It is used in ways I wouldn't anticipate this being used in Rust though, i.e. I wouldn't necessarily expect let x = vec![1u, 3i, "hello"] to work or infer a type of Vec<uint | int | &'static str>, but that could be proposed as an addition to this RFC.

Ericson2314 · 2014-10-16T07:17:38Z

"By "coerces to" I meant that the compiler will implicitly insert the translation". Ok, that's a step in the right direction IMO. If A | B can generate a new enum (invocations would be memoized), vaguely analogous to the new closure sugar, something like this might be more practical.

Edit: Making A | B = B | A is still a challenge regarding discriminants. Otherwise one could make Either2, Either3, Either4... and have a macro pick the right one.

netvl · 2014-10-16T07:27:56Z

@reem, well, "union types" is an established term for this kind of type, AFAIK. It is a bikeshed, however, I agree, I just wanted to clarify it :)

Union types in Ceylon (well, in just about any language on JVM, and not only union types - all types) rely much on subtyping. In Rust subtyping is almost nonexistent, and introducing it not only makes the type system much more complicated, there would also be no natural way to create subtypes in absence of inheritance or something like it. So it makes sense to treat union types just as anonymous enums, just as you suggest.

After all, closures already are generating anonymous structures. I don't really know how the compiler operates, but I think this can be made in a similar way.

huonw · 2014-10-16T07:28:11Z

text/0000-anonymous-enums.md

+In the same vein, multiple occurrences of `A | B`, even in different crates,
+are the same type.
+
+As a result of this, no trait impls are allowed for join types.


Why not? Trait impls are allowed for tuples, which are anonymous structs.

Actually I had not thought this through. My main motivation in saying this was that you can't write an impl for a type or trait you didn't define, but with the new rules only one of the types in the union has to be yours. We could use exactly the same rules as with tuples.

huonw · 2014-10-16T07:31:37Z

Is T | T legal?

flying-sheep · 2014-10-16T07:39:42Z

Is T | T legal?

I'd prefer if not. That would make no sense and be just sad

mitsuhiko · 2014-10-16T07:39:59Z

The motivation mentions error handling but this problem can also be resolved in a different way through #201.

huonw · 2014-10-16T07:40:41Z

@flying-sheep why does it make no sense? enum Foo { X(T), Y(T) } makes sense.

kennytm · 2014-10-16T07:45:57Z

If a flat union P | (Q | R) == P | Q | R is needed, then T | T == T must be handled, otherwise (A | B) | (A | C) cannot be made sense of.

reem · 2014-10-16T08:08:55Z

@kennytm (A | B) | (A | C) would not be allowed because it would be flattened to A | B | A | C, which would not be allowed because of the duplication of A. I will clarify the illegality of duplicated types in joins in the RFC.

In the same vein, T | T is not allowed either.

My reasoning for this is that T | T is ambiguous to create or match against: in the case of struct T;:

let t = T as T | T; // ambiguous as to which anonymous variant is instantiated
match t {
    T => {} // ambiguous as to how this is matched
}

kennytm · 2014-10-16T08:35:16Z

@reem You cannot disallow that because:

type AB = A | B;
type AC = A | C;
...
type ABC = AB | AC; // `A` duplicated.

Also disallowing T | T makes it useless on generics:

fn perform_either<T, A, B>(a: || -> Result<T, A>, b: || -> Result<T, B>) -> Result<T, A | B> { ... }

perform_either(
        || rename_file(),
        || copy_and_delete_original_file(),
);
// Will get `IoError | IoError` here.

huonw · 2014-10-16T08:45:16Z

A previous proposal for this supported T | T by being positional, like a tuple.

reem · 2014-10-16T09:06:58Z

I think it's possible to resolve unambiguously without positioning if we just make T | T act like T | Void in the sense that you can't ever match against or instantiate the second variant, or something along those lines. Positioning makes this just shorthand for (Option<A>, Option<B>) and would require explicit instantiation syntax, which is significantly worse than the proposed sugar, IMO.

huonw · 2014-10-16T09:13:22Z

It's not shorthand for (Option<A>, Option<B>), since that allows for (None, None).

We could also have "do what I mean": let x: A = ...; let y: A | B = x; is fine since it's unambiguous which variant is meant; it is just things like let y: A | A = x; that require an explicit constructor.

(This is true even with generics that can be instantiated to be the same type, since the point-of-coercion just sees the two distinct generic types, it does not care that they could end up both being u8: the variant decision has already been made.)
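
A minimal sketch of that last point, assuming a hypothetical named `Either<A, B>` as a stand-in for the proposed `A | B` (the helper names `inject_left` and `inject_right` are invented for illustration). Inside each generic function the two parameters are distinct, so the variant is chosen there and stays chosen even when a caller instantiates both parameters with the same type.

```rust
// Hypothetical stand-in for the proposed `A | B`, with explicit constructors.
#[derive(Debug, PartialEq)]
pub enum Either<A, B> {
    Left(A),
    Right(B),
}

// `A` and `B` are distinct type parameters here, so the variant is fixed
// at this point, before any instantiation is known.
pub fn inject_left<A, B>(a: A) -> Either<A, B> {
    Either::Left(a)
}

// Same idea for the second member type.
pub fn inject_right<A, B>(b: B) -> Either<A, B> {
    Either::Right(b)
}
```

Calling `inject_left` with `A = B = u8` still produces the `Left` variant, and `inject_right` still produces `Right`, so the two results remain distinguishable.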

reem · 2014-10-16T09:37:19Z

I can't really think of a situation where I'd care about which variant was chosen out of A | A, so we could just have the choice be undefined.

arthurprs · 2014-10-16T13:31:50Z

I kinda prefer this over #201

Tobba · 2014-10-16T13:57:25Z

There are a lot of uses of union types outside of error handling.
In my opinion,

type A = T | T;

should be equal to

type A = T;

You could possibly allow coercing a union type to another union type during match.

glaebhoerl · 2014-10-16T15:42:56Z

As before I think we should clearly distinguish between anonymous sum types and union types, at least to avoid confusing ourselves. Both are potentially useful, they are vaguely similar but actually quite different, and the main thing they have in common is that both want the same syntax.

Anonymous sum types would be nothing more than special built-in syntax for enum types of any arity, just like tuples are for structs. Apart from the nicer syntax, it would be no more magical than the existing Result type. No automatic coercions, etc. You would have to explicitly construct and pattern match on them the same way as existing enums. (T|T) would make just as much sense as Result<T, T> (which is isomorphic to (bool, T)).

Union types would be a kind of type-based union which essentially has one variant per type, rather than "positional" variants. This may also allow automatic conversions from T to T|U, T|U to U|T|W, etc. Notably this is not really a sum type, because for sum types you would expect that the number of its inhabitants is equal to the sum of the number of its members' inhabitants, e.g. inhabitants(Result<T, U>) = inhabitants(T) + inhabitants(U) for all T and U, but this fails to hold for union types when T = U. It's kind of like the difference between HashSet and Vec. Different algebraic properties. A variant of union types is what is being proposed here.
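
The inhabitant-counting argument can be checked on a small case. The sketch below is an illustration only: it uses `Result<bool, bool>` as the sum type and models the type-based union of `bool` with `bool` as a `HashSet<bool>`, which is an assumption made to mirror the HashSet/Vec analogy, not a claim about how such a union would be implemented.

```rust
use std::collections::HashSet;

/// All values of the sum type `Result<bool, bool>`: each side keeps its own tag.
pub fn sum_inhabitants() -> Vec<Result<bool, bool>> {
    vec![Ok(false), Ok(true), Err(false), Err(true)]
}

/// Values of a type-based union of `bool` with `bool`, modelled as a set:
/// once the tag is forgotten, duplicates collapse.
pub fn union_inhabitants() -> HashSet<bool> {
    sum_inhabitants()
        .into_iter()
        .map(|r| match r {
            Ok(b) | Err(b) => b,
        })
        .collect()
}
```

The sum type has 2 + 2 = 4 inhabitants, while the modelled union has only 2, which is the failure of the sum law when T = U.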

(Maybe more later, only had time for this right now.)

ghost · 2014-10-17T05:06:59Z

This is an interesting idea but there are a number of subtleties involved that I'm not sure are fully addressed by the current proposal. Some care should be taken as there are non-obvious issues with an ad-hoc approach that can lead to unsoundness and other problems.

It would be useful to understand this proposal in relation to similar ideas elsewhere, like extensible/polymorphic variants in OCaml, extensible exceptions (exn) in SML, extensible variants and open data types in Haskell, union types (also here) (as opposed to intersection types), etc. There's quite a lot of literature on this (and the dual: extensible records) and the problems have already been spelled out in great detail along with several solutions and implementations.

I'm also concerned that interaction with generics will lead to serious usability issues without something like row-polymorphism and/or subtyping (like OCaml). I don't know enough about Rust's internals to have a good idea whether either of those would be feasible here but neither are trivial additions in most cases.

Tobba · 2014-10-17T05:45:42Z

Let me leave in an argument for T | T becoming T:

type Lib1Error = IoError | DatabaseError;
type Lib2Error = IoError | FooError;
type Lib3Error = Lib1Error | Lib2Error;

reem · 2014-10-17T07:49:03Z

I've thought a bit about the duplicate type problem, and I want to articulate my new thoughts:

We must support duplicate types in joins, at minimum when instantiated through generics. Let's take an example:

fn try_both<A, B>(a: || -> Option<A>, b: || -> Option<B>) -> Option<A | B> {
    a().or_else(|| b())
}

This function must work when instantiated with A and B as the same type, but I am not convinced that it must be unambiguous which variant was used, even if they are the same type. I think we can sell this feature as inappropriate for the case where you care about being able to disambiguate T | T, such as in the above case.

If we assert that in the case where A | B is instantiated with the same types, it is optimized to T | T, which can be represented as T, then I think the above can work in a consistent way. We know the above optimization - that duplicate types are always compressed - will always be applicable because all types are statically known at compile time and all functions are monomorphized.

EDIT:

Effectively, if you instantiate the above method with the same type parameter, the type of the above function would be fn<T>(|| -> Option<T>, || -> Option<T>) -> Option<T>.
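
For reference, the collapsed signature can be written directly in today's Rust. This is a sketch under assumptions: it uses `impl FnOnce` in place of the old `||` closure types and covers only the case where both closures return the same type, so no join type is involved.

```rust
/// The collapsed signature: with `A = B = T`, `Option<T | T>` becomes `Option<T>`.
pub fn try_both<T>(
    a: impl FnOnce() -> Option<T>,
    b: impl FnOnce() -> Option<T>,
) -> Option<T> {
    // Run `a`; only if it yields `None`, fall back to `b`.
    a().or_else(b)
}
```

In this form the caller cannot tell which closure produced the value, which matches the position that the choice of variant in `T | T` need not be observable.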

pythonesque · 2014-10-18T06:10:05Z

I do not like this idea. I am seeing comments like "I don't care about this case" in response to issues with generics, and that makes me think this isn't well thought out. What does this actually do that the existing type system cannot already do? Is this going to be a weird wart on Rust's type system?

I think any solution like this must be motivated by existing problems with more than syntax. For example, if there were a way that this could lead to collapsed enum representations for repeatedly nested types, that would be a win in my view. But it is not clear to me that this proposal attempts this, which again leads me to feel that it isn't that well thought out.

zwarich · 2014-10-18T06:20:51Z

If you think of the T1 | ... | Tn construct as defining an anonymous sum type with constructors labeled by T1, ..., Tn that are inserted in a type-directed way, then it doesn't seem to make sense unless T1, ..., Tn are pairwise incompatible.

Also, if you want (A | B) | C to be identical to A | B | C with separate compilation, then you will have to give up using the same memory representation as normal enums, similar to the various implementations of row polymorphism that exist.

reem · 2014-10-18T11:48:12Z

@pythonesque This is a large and important feature, so I certainly think further consideration is warranted, but I also think that this, in some form, is a feature that we will likely want as it vastly simplifies many common cases.

zwarich · 2014-10-24T10:37:51Z

In the triage meeting today we decided to close this RFC and not merge it, as it is lacking some critical details with no single obvious solution. We did agree that this space of ideas has some compelling use cases, so I opened an issue in the RFCs repo to track looking into it further: #409.

glaebhoerl · 2014-10-24T13:02:49Z

As others have noted this seems like a small and obvious feature to improve convenience and ergonomics at first, but it's actually a pretty big one with potentially profound implications for the type system. At least in its most general form.

With respect to representation, my feeling is that the correct one would likely be (TypeId, UnsafeUnion<T1...Tn>). I.e. similar to an enum, but not the same: instead of a small integer based on the number of variants, the discriminant would be a TypeId. Given that it is supposed to be a type-based union, this makes sense. It seems like it would be difficult to support all of the various equivalences and coercions between different orderings and number of occurrences of the member types any other way, although I haven't thought it through all of the way. (Maybe even the fact that the UnsafeUnion part of it would cause it to have different size/alignment when extended with more types would end up being problematic.)
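
A hand-written sketch of the (TypeId, UnsafeUnion<...>) layout for one fixed pair of member types. Everything here is hypothetical illustration: `IntOrChar`, `Payload`, and the accessor names are invented, and a real implementation would have to be generated by the compiler for arbitrary member types.

```rust
use std::any::TypeId;

// Untagged storage large enough for either member type.
union Payload {
    int: i64,
    ch: char,
}

/// Hypothetical model of `Union<i64, char>`: the discriminant is the member's
/// `TypeId`, not its position in the list of member types.
pub struct IntOrChar {
    tag: TypeId,
    payload: Payload,
}

impl IntOrChar {
    pub fn from_int(v: i64) -> Self {
        IntOrChar { tag: TypeId::of::<i64>(), payload: Payload { int: v } }
    }

    pub fn from_char(c: char) -> Self {
        IntOrChar { tag: TypeId::of::<char>(), payload: Payload { ch: c } }
    }

    pub fn as_int(&self) -> Option<i64> {
        if self.tag == TypeId::of::<i64>() {
            // SAFETY: the tag records that `int` was the field written.
            Some(unsafe { self.payload.int })
        } else {
            None
        }
    }

    pub fn as_char(&self) -> Option<char> {
        if self.tag == TypeId::of::<char>() {
            // SAFETY: the tag records that `ch` was the field written.
            Some(unsafe { self.payload.ch })
        } else {
            None
        }
    }
}
```

Because the tag depends only on the member type, it would be the same whether the union were written as `Union<i64, char>` or `Union<char, i64>`; the payload size, however, still depends on the full set of members.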

Here are the tricky questions I can think of. Most of them have already been posed (and often answered in a way) in the RFC and comments, but just to gather them in one place. I'm going to use a variadic Union<...> syntax instead of | for greater clarity. Ts... stands for zero or more types.

Do we have:
- A = Union<A>?
- Union<A, Ts...> = Union<A, Ts..., A>?
- Union<A, B, Ts...> = Union<B, A, Ts...>?
- Union<Union<Ts...>, Us...> = Union<Ts..., Us...>?

I think that apart from the first, these are probably things we would like to have. (I.e. so that two unions are equal iff the set of their member types is equal, irrespective of ordering or duplication, and they are "transparent" when nested.)

Regarding coercions and/or subtyping, do we have:
- A -> Union<A, Ts...>?
- Union<A, B> -> Union<A, B, Ts...>?

Again, these seem desirable to have (or more like the whole point).

Suppose we write fn foo() -> Union<Box<ToString>, Box<int>> { box 100i }. Which type is the result?

What about fn bar() -> Union<Box<ToString>, Box<Hash>> { box 100i }?

What about fn baz() -> Union<Option<int>, Option<char>> { None }?

What about fn quux() -> Union<int, char> { FromStr::from_str("this will fail at runtime, but that's not the point").unwrap() }? Which is the type of from_str(): int, char, or Union<int, char>? Relatedly:

Can we have impl FromStr for Union<int, char>?

Do we infer Union<int, char> as the type of if cond { 9i } else { 'x' }?

In the short term, a pragmatic approach might be to just forbid most of the things, e.g. to disallow type variables (and possibly trait objects as well) from appearing in unions, which would avoid many, but not all, of the difficulties (but also be less useful). In the longer term it's probably best to study the literature, such as linked by @darinmorrison.

But all in all I'm not sure that all this machinery would be worthwhile for what is in the end a "constant-factor" improvement in convenience, rather than an increase in expressiveness or abstraction power.

@reem:

but I also think that this, in some form, is a feature that we will likely want as it vastly simplifies many common cases.

This comment appears to be at odds with my previous sentence. My impression was that the main benefit of this feature would be being able to avoid a bunch of map_err() calls in error handling code, i.e. just a convenience thing, even if a significant one. But it doesn't seem to me that it would rise to the level of a "vast simplification". Do you disagree? (Honest question - my impression may be mistaken.)

scialex · 2014-12-09T23:14:29Z

@reem @glaebhoerl @zwarich

I have tried my hand at making a new version of this RFC that tries to address some of the concerns with it.

Do you think it's worth submitting a pull request?

zwarich · 2014-12-09T23:38:06Z

@scialex I haven't read the entire thing yet, but does it propose a compilation strategy?

scialex · 2014-12-10T00:02:13Z

@zwarich not explicitly. I do not think it really needs a huge one. The compiler figures out all the overlaps that could occur given the type information it has, and creates an enum with one variant for each of the given combinations, each containing the corresponding type. Matches and all impls are just desugared to use this. We have type monomorphization so it should be guaranteed to work (I think; it's been a while since compilers and I haven't taken a very deep look at rustc internals).

This does rely on being willing to have the in-memory layout of, e.g., (A | B) | C be different from A | B | C and A | (B | C), even though the compiler will put in the code to convert between them automatically and they are semantically the same. Further, it is possible that we might sometimes convert an A to an A | A with a different memory layout for some returns/arguments where we cannot rewrite it.

Also this could cause perf problems with large numbers of very generic types but that doesn't seem to be the main use case.

E.g. this would take a lot of time/memory:

fn stuff<A, B, C, D, E>(a: &A, b: &B, c: &C, d: &D, e: &E) -> &(A|B|C|D|E) { ... }

since it would be really returning a 120-variant enum.

flying-sheep · 2014-12-10T09:59:20Z

your new RFC is great!

i’m desperately missing something like join types in my attempt to create a typesafe document tree (the children of one type can be of one of several other types, e.g. Document can have Vec<Box<Section|Header>> children)
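
For comparison, this is roughly what the document tree looks like with a named enum in today's Rust. It is a sketch with assumed details: the fields on `Section` and `Header`, the `Child` enum, and `count_sections` are all hypothetical, and the `Box` is omitted for brevity.

```rust
// Hypothetical node types for the document tree.
pub struct Header {
    pub text: String,
}

pub struct Section {
    pub title: String,
}

// Named stand-in for the anonymous `Section | Header`.
pub enum Child {
    Section(Section),
    Header(Header),
}

pub struct Document {
    pub children: Vec<Child>,
}

/// Counts the children that are sections.
pub fn count_sections(doc: &Document) -> usize {
    doc.children
        .iter()
        .filter(|c| matches!(c, Child::Section(_)))
        .count()
}
```

The cost that a join type would remove is the need to declare and name one such wrapper enum for every distinct combination of child types.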

reem · 2014-12-10T10:12:51Z

@scialex I like some of the changes in the RFC, but I think the treatment of duplicate types could use a lot of work, since it makes using variants of more than 2 extremely unergonomic - A | B | C could end up as A + B, B + C, A + C, or A + B + C in addition to A, B, or C. Forget matching on 4.
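
The growth described above is the number of non-empty subsets of the member types, 2^n - 1. A small sketch that enumerates them (the function name `overlap_names` is hypothetical, and the member types are represented only by their names as strings):

```rust
/// Lists every non-empty combination of the given member names, e.g. "A + B".
/// For `n` members this yields 2^n - 1 entries.
pub fn overlap_names(members: &[&str]) -> Vec<String> {
    let n = members.len();
    // Each bitmask from 1 to 2^n - 1 selects one non-empty subset.
    (1..(1usize << n))
        .map(|mask| {
            members
                .iter()
                .enumerate()
                .filter(|&(i, _)| (mask & (1usize << i)) != 0)
                .map(|(_, m)| *m)
                .collect::<Vec<_>>()
                .join(" + ")
        })
        .collect()
}
```

For three members this gives the seven cases listed above; for four it gives fifteen.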

As a result, I don't think that RFC should be used as-is.

This has sort of slipped my mind in the past weeks, but I think I may take another stab at significantly clarifying many of the edge cases and important ideas brought up here.

kennytm · 2014-12-10T13:03:54Z

Now that we have the FromError trait, the first example is no longer a good one.

// implement FromError for LibError.
enum LibError { ... }
impl FromError<ErrorX> for LibError { ... }
impl FromError<ErrorY> for LibError { ... }

// then just use `try!` as is from now on.
pub fn some_other_operation() -> Result<(), LibError> {
    let x = try!(produce_error_x());
    let y = try!(produce_error_y());
    Ok(())
}
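
The same pattern in current Rust, where `FromError` was replaced by the standard `From` trait and `try!` by the `?` operator. This is a sketch: the unit structs `ErrorX` and `ErrorY`, the `fail` flags, and the returned numbers are assumptions added so the example is self-contained.

```rust
#[derive(Debug, PartialEq)]
pub struct ErrorX;

#[derive(Debug, PartialEq)]
pub struct ErrorY;

// A named error enum covering both failure types.
#[derive(Debug, PartialEq)]
pub enum LibError {
    X(ErrorX),
    Y(ErrorY),
}

impl From<ErrorX> for LibError {
    fn from(e: ErrorX) -> Self {
        LibError::X(e)
    }
}

impl From<ErrorY> for LibError {
    fn from(e: ErrorY) -> Self {
        LibError::Y(e)
    }
}

fn produce_error_x(fail: bool) -> Result<u32, ErrorX> {
    if fail { Err(ErrorX) } else { Ok(1) }
}

fn produce_error_y(fail: bool) -> Result<u32, ErrorY> {
    if fail { Err(ErrorY) } else { Ok(2) }
}

// `?` converts each error into `LibError` through the `From` impls above.
pub fn some_other_operation(fail_x: bool, fail_y: bool) -> Result<u32, LibError> {
    let x = produce_error_x(fail_x)?;
    let y = produce_error_y(fail_y)?;
    Ok(x + y)
}
```

The conversions are still written by hand once per error type, which is the boilerplate a join type was meant to avoid, but call sites need no `map_err()`.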

scialex · 2014-12-10T20:51:30Z

@flying-sheep: Cool that's actually almost the exact same reason why I wanted to revive it. I am making a file-system. stuff can return File|Directory|...

@kennytm: Added note about FromError.

@reem: Updated the RFC to address that concern, making it easier to match on large unions. I have improved it some and am going to submit it.

Submitted pull request #514

tema3210 · 2019-12-04T23:31:12Z

We can define anonymous enums as follows:
(T|K|O|P|...) - any number of any types;

We can match on them as follows:

match a {
   (a,_,_,_,...)=>{...},
   (_,b,_,_,...)=>{...},
   ...
};

As for any other types we can implement traits for them.
Auto-implemented traits should be implemented according to the most limited variant.

burdges · 2019-12-04T23:55:38Z

auto_enums crate provides this now.

RFC: Anonymous enum types called joins, as A | B

989a61f

Add support for defining anonymous, enum-like types using `A | B`.

reem mentioned this pull request Oct 16, 2014

First-class error handling with ? and catch #243

Merged

Added a new unresolved question about the interaction with type inference.

5f045d9

huonw reviewed Oct 16, 2014

reem added 3 commits October 16, 2014 01:13

Clarify the rules for impls and duplicate types in joins.

d0915a9

Resolve a parser ambiguity in the as syntax for creating join values.

c683a70

Add an unresolved question relating duplicate types in joins.

c8e4fd2

zwarich closed this Oct 24, 2014

scialex mentioned this pull request Dec 10, 2014

Anonymous enum types (A|B) take 2 #514

Closed

reem mentioned this pull request Dec 15, 2014

Consider some form of extensible enums #409

Open

Ericson2314 mentioned this pull request Jan 26, 2015

Amend RFC 517: Revisions to reader/writer, core::io and std::io #576

Merged

flying-sheep mentioned this pull request Jun 9, 2015

Disjoins (anonymous enums) #1154

Closed

huonw mentioned this pull request Jul 16, 2015

Anonymous sum types #294

Open
