Anonymous sum types #294
What's the state of this? |
Compared to tuples, anonymous enums would become increasingly tedious to use since a match statement would have
The syntax would be compatible with a future extension that allows enums to be declared with named choices:
|
I think the feature would be more useful without allowing matching, just doing trait dispatch. I guess it's a different feature, where |
@eddyb I've been putting some thoughts into a feature like that |
I'd think an |
Passing by, but if you are curious about syntax, OCaml has anonymous sum types called Polymorphic Variants. Basically they are just a name, like `Blah, which can have optional values. An example of the syntax:
# let three = `Int 3;;
val three : [> `Int of int ] = `Int 3
# let four = `Float 4.;;
val four : [> `Float of float ] = `Float 4.
# let nan = `Not_a_number;;
val nan : [> `Not_a_number ] = `Not_a_number
# let list = [three; four; nan];;
val list : [> `Float of float | `Int of int | `Not_a_number ] list
The
In the back-end at assembly time the names are given a globally unique integer (in the current implementation it is via hashing, a chance of collision but overall the chance is extremely low as well as warnings can be put in place to catch them), however I've seen talk of making a global registry so they just get incremented on first access efficiently.
A plain Polymorphic Variant with no data is represented internally as an integer:
`Blah
Becomes the integer
`Blah (42, 6.28)
Gets encoded internally as an array of two fields in assembly, the first is the above number as before, the second is the pointer to the data of the tuple (although in most cases these all get inlined into the same memory in OCaml due to inlining and optimization passes).
In the typing system the above would be
However, about Polymorphic variants is that they can be opened or closed. Any system can pass any of them that they want, including passing through if you want.
For example, a simple way to handle something like a generic event in OCaml would be like:
let f next x = match x with
| `Blah x -> do_something_with x
| `Foobar -> do_something_else ()
| unhandled -> next unhandled
Which is entirely type safe, dependent on what each function handles down the chain and all. The big thing on the typing system is that things can be open or close typed, I.E. they either accept any amount of Polymorphic Variants or a closed set of Polymorphic Variants. If something like anonymous sum type here were to be accepted then that concept would be exceedingly useful while being very easy and very fast to statically type. |
Anonymous sum types might interact with
You could make this make sense with an anonymous sum type of the form One could do it in
so the above code becomes
I could imagine some |
this might sound like a weird hack, but how about just making A|B sugar for 'Either'? i suppose it might get even weirder to start composing A|B|C as Either<A,Either<B,C>> or have that mapping to something. What if there was some sort of general purpose 'operator overloading' in the 'type syntax', allowing people's code to experiment with various possibilities - see what gains traction |
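For illustration, the nesting idea can already be written by hand in today's Rust. This is only a sketch: `Either`, `OneOf3`, and `describe_one_of3` are hypothetical names chosen here, not std types or anything proposed in this thread.

```rust
// Hypothetical stand-in for the `Either` type mentioned above (not in std).
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

// What `A | B | C` might expand to under the nesting idea.
type OneOf3<A, B, C> = Either<A, Either<B, C>>;

// Matching has to follow the nesting, which is where it gets awkward.
fn describe_one_of3(v: &OneOf3<i32, String, bool>) -> String {
    match v {
        Either::Left(n) => format!("int {n}"),
        Either::Right(Either::Left(s)) => format!("string {s}"),
        Either::Right(Either::Right(b)) => format!("bool {b}"),
    }
}
```

Every extra alternative adds one more level of `Right(...)` to each pattern, which is the weirdness the comment anticipates.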
@dobkeratops I'd rather just have a |
I wrote some code that could potentially fit into a library now that type macros are stable: https://gist.github.com/Sgeo/ecee21895815fb2066e3 Would people be interested in this as a crate? |
I've just come upon this issue, while looking for a way to avoid having some gross code that simply doesn't want to go away (actually it's slowly increasing, started at 8 variants and passed by 9 before reaching 12): use tokio::prelude::*;
pub enum FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
F1: Future<Item = T, Error = E>, // ...
{
Fut1(F1), // ...
}
impl<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12> Future
for FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
F1: Future<Item = T, Error = E>, // ...
{
type Item = T;
type Error = E;
fn poll(&mut self) -> Result<Async<Self::Item>, Self::Error> {
use FutIn12::*;
match *self {
Fut1(ref mut f) => f.poll(), // ...
}
}
} I was thus thinking that it'd be great to have anonymous sum types that automatically derived the traits shared by all their variants, so that I could get rid of this code and just have my |
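To make the boilerplate concrete without pulling in tokio, here is a minimal two-variant sketch of the same pattern using `Iterator` in place of `Future`. The names `IterIn2` and `evens_or_all` are made up for this example; the shape mirrors `FutIn12` above.

```rust
// Hypothetical two-variant wrapper in the style of `FutIn12`.
enum IterIn2<I1, I2> {
    Iter1(I1),
    Iter2(I2),
}

// The trait impl only forwards to whichever variant is present.
impl<T, I1, I2> Iterator for IterIn2<I1, I2>
where
    I1: Iterator<Item = T>,
    I2: Iterator<Item = T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self {
            IterIn2::Iter1(i) => i.next(),
            IterIn2::Iter2(i) => i.next(),
        }
    }
}

// Each branch builds a different concrete iterator type; the wrapper unifies them.
fn evens_or_all(only_evens: bool, upto: u32) -> impl Iterator<Item = u32> {
    if only_evens {
        IterIn2::Iter1((0..upto).filter(|n| n % 2 == 0))
    } else {
        IterIn2::Iter2(0..upto)
    }
}
```

An anonymous sum type that derived shared traits automatically would generate the `impl` block above, which is the part that grows with every added variant.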
As I wrote here I think this use case would be better addressed by something modelled after how closures work. |
I don’t think it would be wise to make anonymous sum types nominally typed, as you seem to suggest. Structural typing, as with tuples, is far more useful and less surprising to the programmer. |
@alexreg What they're saying is that the specific use-case of wanting to return Therefore, anonymous sum types are separate (and mostly unrelated) from that use case. |
@Pauan Oh, well I agree with that. As long as we consider these things two separate features, fine. Thanks for clarifying. |
Oh indeed good point, thanks! Just opened #2414 to track this separate feature, as I wasn't able to find any other open issue|RFC for it :) |
I'm planning to get out a pull request for this proposed RFC. Most of you following this thread probably know that a number of proposals like this were rejected for being too complex, so its focus is minimalism and implementation simplicity rather than ergonomics and features. Any words before I get it out? (I've asked this question in multiple other areas to try to collect as much feedback before getting the proposed RFC out, fyi) https://internals.rust-lang.org/t/pre-rfc-anonymous-variant-types/8707/76 |
My two cents; I think the motivation for anonymous enums is broadly:
For these use cases, I think it's important that anonymous enums are easy to refactor. In the case of anonymous structs, the transformation is pretty painless. You just introduce a new tuple struct, and prefix each tuple with the struct's name: fn split(text: &'static str, at: usize) -> (&'static str, &'static str) {
(&text[..at], &text[at..])
}
assert_eq!(split("testing", 4), ("test", "ing")); + #[derive(Debug, PartialEq)]
+ struct Split(&'static str, &'static str);
! fn split(text: &'static str, at: usize) -> Split {
! Split(&text[..at], &text[at..])
}
! assert_eq!(split("testing", 4), Split("test", "ing")); On the other hand, I don't think structurally anonymous enums would be as helpful for prototyping since there isn't an equivalent in explicit enums. The transformation would likely involve tedious renaming and rearranging for every pattern: fn add_one(value: String | i64) -> String | i64 {
match value {
mut x: String => {
x.push_str("1");
x
}
y: i64 => {
y + 1
}
}
}
fn something(value: String | i64) {
match value {
x: String => println!("String: {x}"),
y: i64 => println!("i64: {y}"),
}
} + #[derive(Debug, PartialEq)]
+ enum AddOne {
+ String(String),
+ i64(i64),
+ }
! fn add_one(value: AddOne) -> AddOne {
match value {
! AddOne::String(mut x) => {
x.push_str("1");
! AddOne::String(x)
}
! AddOne::i64(y) => {
! AddOne::i64(y + 1)
}
}
}
! fn something(value: AddOne) {
match value {
! AddOne::String(x) => println!("String: {x}"),
! AddOne::i64(y) => println!("i64: {y}"),
}
} Not to mention that the names would likely be changed from I also think that anonymous enums should be able to represent more common stateful enum types similar to use std::cmp::Ordering;
fn max(a: i64, b: i64) -> (i64, Ordering) {
match a.cmp(&b) {
Ordering::Less => (b, Ordering::Less),
Ordering::Equal => (a, Ordering::Equal),
Ordering::Greater => (a, Ordering::Greater),
}
}
assert_eq!(max(4, 7), (7, Ordering::Less)); use std::cmp::Ordering;
+ enum SomeOrdering {
+ Less(i64),
+ Equal(i64),
+ Greater(i64),
+ }
! fn max(a: i64, b: i64) -> SomeOrdering {
match a.cmp(&b) {
! Ordering::Less => SomeOrdering::Less(b),
! Ordering::Equal => SomeOrdering::Equal(a),
! Ordering::Greater => SomeOrdering::Greater(a),
}
}
! assert!(matches!(max(4, 7), SomeOrdering::Less(7))); I would prefer a more general syntax where the variants are explicitly named and referenced using something like return type notation. If the anonymous enum is converted into an explicit one in the future, any code referencing its variants could still function with zero refactoring required. use std::cmp::Ordering;
fn max(a: i64, b: i64) -> enum {
Less(i64),
Equal(i64),
Greater(i64),
} {
match a.cmp(&b) {
Ordering::Less => max()::Less(b),
Ordering::Equal => max()::Equal(a),
Ordering::Greater => max()::Greater(a),
}
}
assert!(matches!(max(4, 7), max()::Less(7))); use std::cmp::Ordering;
+ enum SomeOrdering {
+ Less(i64),
+ Equal(i64),
+ Greater(i64),
+ }
! fn max(a: i64, b: i64) -> SomeOrdering {
match a.cmp(&b) {
Ordering::Less => max()::Less(b),
Ordering::Equal => max()::Equal(a),
Ordering::Greater => max()::Greater(a),
}
}
assert!(matches!(max(4, 7), max()::Less(7))); Alternatively, maybe a general syntax for explicit enums with anonymous variants could be defined, although it seems niche and awkward to me. Your friend, |
I found this issue while trying to do a struct where I wanted to contain either The syntax I imagined was let mut value: <u8, u8, u8> = ::0(1);
value = ::1(2);
value = ::2(3);
match value {
::0(a) => println!("first: {a}"),
::1(b) | ::2(b) => println!("second or third: {b}"),
} Using number indexes, inspired by how tuples work. |
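As a point of comparison, the positional idea can be approximated today with a named generic enum. This is a sketch under assumptions: `Sum3`, its variants `V0`/`V1`/`V2` (standing in for `::0`, `::1`, `::2`), and `describe_sum3` are hypothetical names.

```rust
// Hypothetical named stand-in for the proposed `<u8, u8, u8>` type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Sum3<A, B, C> {
    V0(A),
    V1(B),
    V2(C),
}

fn describe_sum3(value: Sum3<u8, u8, u8>) -> String {
    match value {
        Sum3::V0(a) => format!("first: {a}"),
        // An or-pattern works because both slots hold the same type.
        Sum3::V1(b) | Sum3::V2(b) => format!("second or third: {b}"),
    }
}
```

Because the slot is part of the value, `Sum3::V1(3)` and `Sum3::V2(3)` remain different values even though all three slots hold `u8`.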
i dont like the order being relevant... addition is commutative, so different permutations of the same sum type should be equivalent (like reordering fields in a struct). i dont want to have to convert the output of a function to pass it to another if they disagree on the order i think what most of us want is a sum type like for a tuple, though your suggestion is interesting because it's a nice parallel with how tuples work, i don't think this is what we're looking for |
I like having order matter since it fixes a hard problem: |
Doesn't that address the issue? |
no, because you should be able to write |
this is because the compiler erases all lifetime information before it generates the de-genericified code where all generics are substituted by actual types, which means that it can't tell the difference between |
both are valid use cases but they really are different you're describing C++'s i think it wouldn't be too difficult to implement but TypeScript-like unions, for which the order isn't relevant, would probably require compiler-level support (so that i wanna be able to write a function that returns some |
For the record, this particular issue is explicitly about anonymous sum types, not union types. I.e. with position/tag/discriminant-based matching (like Rust |
@glaebhoerl I don't know if Wikipedia is wrong here, but:
Maybe that's why the confusion. I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886 So I think we're actually asking for the same thing, i.e. tagged union, a.k.a. sum type. |
The thread you quoted seems to be about Haskell-like languages, and I guess the untagged union in that context differs from what you imagine (maybe the A union type is like a set union. It differs from sum type (corresponds to Rust's Some people in this issue have proposed the union type in this sense in addition to the sum type. |
Yeah, that Haskell-related discussion doesn't really make sense here. We already have tagged unions in Rust, they're called enums. Each tag is the variant's name. Our goal here is having untagged (or anonymous) ones so you can define |
@Keavon you seem to be confusing tagged/untagged with named/anonymous, they are very different things. Rust already has untagged and tagged named unions, what it's missing are anonymous tagged unions. "tag" refers to the enum discriminant. Untagged unions do not have a discriminant, and so are unsafe to access. See https://en.wikipedia.org/wiki/Tagged_union The names of enum variants are not called tags - they have associated tags, for example: enum Foo {
A, // Might have tag 0
B, // Might have tag 1
C // Might have tag 2
} Tags also exist for anonymous enums, since the compiler still needs to differentiate which variant is selected, for example: type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */ |
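The distinction between variant names and tags can be observed in current Rust. In this sketch, `Payload` and `same_variant` are hypothetical names; `std::mem::discriminant` is the standard way to compare tags without looking at payloads. The numeric values shown are the logical discriminants of a field-less enum, not a statement about memory layout.

```rust
use std::mem::discriminant;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Foo {
    A, // implicit discriminant 0
    B, // implicit discriminant 1
    C, // implicit discriminant 2
}

// An enum with payloads still carries a tag, separate from the payload.
enum Payload {
    Int(i32),
    Text(String),
}

// Compares only the tags; the payload values are ignored.
fn same_variant(x: &Payload, y: &Payload) -> bool {
    discriminant(x) == discriminant(y)
}
```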
I see, thanks for pointing out my terminology error. If I'm reading what you explained correctly, I think you're responding to this part of my second sentence above:
(is that right?) Rephrasing what I wrote above, then, I think that I was describing a goal of having the compiler's type system figure out the tags behind the scenes, allowing you to write code with anonymous variant names as well as anonymous enum types. So behind the scenes, it would be tagged (using your illustrations of
The result should be an equivalent to TypeScript's approach, however with the ability for the compiler to discriminate between the types at compile time so the code doesn't have to match based on something kind a |
The key distinction lies in this scenario:
type Foo<A, B> = A | B;
type Bar = Foo<i32, i32>; // !!
fn bar_consumer(bar: Bar) {
match bar {
// ??
}
}
Option 1: Union type (like TypeScript)
In this case,
type Foo<A, B> = A | B;
type Bar = Foo<i32, i32>;
fn bar_consumer(bar: Bar) {
match bar {
i: i32 => ...,
// No other options are possible.
}
}
Option 2: Sum type (like standard Rust enums)
With a sum type, each choice in the anonymous enum must be assigned a locally unique name. Here, I chose to use the type parameter itself as the name. There are other alternatives of course, like giving them integer names analogous to tuples (
type Foo<A, B> = A | B;
type Bar = Foo<i32, i32>; // Remains a two-member union
fn bar_consumer(bar: Bar) {
match bar {
A(i: i32) => ...,
B(j: i32) => ...,
}
} |
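Option 2 corresponds to what a named generic enum already does in Rust. The sketch below spells out the hypothetical named equivalent (the enum definition and the return strings are illustrative, not part of the proposal).

```rust
// Hypothetical named equivalent of `type Foo<A, B> = A | B` under Option 2.
#[derive(Debug, PartialEq)]
enum Foo<A, B> {
    A(A),
    B(B),
}

// Instantiating both parameters with `i32` keeps two distinct variants.
type Bar = Foo<i32, i32>;

fn bar_consumer(bar: Bar) -> &'static str {
    match bar {
        Foo::A(_) => "first i32",
        Foo::B(_) => "second i32",
    }
}
```

Under Option 1 the two arms would collapse into one; here `Foo::A(7)` and `Foo::B(7)` stay distinguishable.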
Thanks for laying that out clearly @Rufflewind, that's easy to understand. Perhaps I come from a biased perspective as a TypeScript user with only a cursory understanding of type theory, but to me option 1 seems like the obvious, no-brainer, only good solution. But I must be missing the perspective to make a fair comparison. Are there downsides to option 1, and upsides to option 2? Is there a compelling reason to consider option 2 or are people just discussing it because it's technically the name of this issue? |
yes there are downsides, imo critically so for Rust, since Rust has lifetimes where each different lifetime makes references different types, but Rust is intentionally designed such that when generating the final compiled code, it never knows what lifetimes anything has (lifetimes have been erased). This means that using the TypeScript-style Union approach can't work for the following code: fn do_match<T, F>(v: <T | F>) -> Result<T, F> {
// this requires Rust to be able to distinguish T and F when generating code for this function,
// which happens after lifetimes are erased. So, if `T = &'a u8` and `F = &'b u8`,
// then when Rust is generating code for this function it sees
// `T = &'erased u8` and `F = &'erased u8`, so are they the same type or not?
//
// If Rust decides they're the same type, then does Rust return Ok?
// if so, then this can be used to convert one lifetime `'b` into another
// lifetime `'a` in safe code, which is unsound since `'a` could live for
// much longer than `'b` so code can use memory after it has been freed,
// which is Undefined Behavior (aka. rustc is allowed to generate code that
// does anything at all, such as crash, overwrite unrelated memory, delete
// all your code, do exactly what you wanted, format your harddrive, etc.).
// If Rust decides to return Err instead, you can still convert lifetimes,
// just the other way around, so it's just as bad.
//
// If Rust decides they're not the same type, then when some other
// function tries to call this function where `'a` and `'b` are actually the
// same lifetime, then the caller will think it's passing a TypeScript-style
// Union type with only one variant, but `do_match`'s code will have been
// generated for a TypeScript-style Union type with two variants, which is
// also unsound and Undefined Behavior!
match v {
T(t) => Ok(t),
F(f) => Err(f),
}
} |
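For contrast, a position-based version of `do_match` compiles today and never needs to decide whether `T` and `F` are the same type. This is a sketch using a hypothetical `Sum2` enum and a hypothetical `pick` helper, not the proposed syntax.

```rust
// Hypothetical positional sum type; `V0`/`V1` stand in for slot positions.
enum Sum2<T, F> {
    V0(T),
    V1(F),
}

// Matching on position only inspects the tag, so the generated code is the
// same whatever lifetimes end up inside `T` and `F`.
fn do_match<T, F>(v: Sum2<T, F>) -> Result<T, F> {
    match v {
        Sum2::V0(t) => Ok(t),
        Sum2::V1(f) => Err(f),
    }
}

// `T = &'a u8` and `F = &'b u8` differ only by lifetime, as in the comment above.
fn pick<'a, 'b>(first: bool, a: &'a u8, b: &'b u8) -> Result<&'a u8, &'b u8> {
    let v = if first { Sum2::V0(a) } else { Sum2::V1(b) };
    do_match(v)
}
```

The borrow checker tracks `'a` and `'b` through the two slots separately, so no lifetime can be converted into the other.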
Interesting... I suppose there will always have to be some level of compromises but it's a matter of picking the least bad ones, while also ensuring that we don't use small downsides as a reason against even attempting the larger gains. (The "but sometimes..." fallacy coined by Technology Connections in his video about LED stoplights being worse at melting snow so people don't want to switch to them at all, or engineer mitigations.) So I suppose I have a couple questions:
|
I think two entirely unrelated features are being talked about concurrently. One would be just syntactic sugar to make anonymous "sum" types behave the same way as enums do currently. I would like to see this in the language. My understanding:
The second feature being talked about is a sum type (I will call it Union type). My understanding:
I think what's missing for this second feature is making traits closed to extension. Right now we can have a trait U and two conforming structs A and B, but we cannot be 100% certain (usually) that A and B are the only structs that can conform to this trait. Just a syntax to say "Hey, the only structs allowed to implement this trait are this list I am defining" should be able to serve the niche of a union type. We should be able to declare that |
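The closest existing approximation is probably the "sealed trait" pattern, sketched below. It is a library convention rather than dedicated syntax, it only closes the trait to code outside the defining module or crate, and it does not give exhaustive matching. The module name `shapes` and the helper `names` are hypothetical.

```rust
mod shapes {
    // `Sealed` lives in a private module, so code outside `shapes` cannot
    // name it and therefore cannot implement `U` for new types.
    mod private {
        pub trait Sealed {}
    }

    pub trait U: private::Sealed {
        fn name(&self) -> &'static str;
    }

    pub struct A;
    pub struct B;

    impl private::Sealed for A {}
    impl private::Sealed for B {}

    impl U for A {
        fn name(&self) -> &'static str { "A" }
    }
    impl U for B {
        fn name(&self) -> &'static str { "B" }
    }
}

// Collects the name of every item through dynamic dispatch on `U`.
fn names(items: &[&dyn shapes::U]) -> Vec<&'static str> {
    items.iter().map(|item| shapes::U::name(*item)).collect()
}
```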
@programmerjake
I'm not exactly familiar with TypeScript unions, so could you expand on this? What exactly would cause Undefined Behavior here? Beyond that, I agree that anonymous sum types including generics are a bit awkward. |
This is a union type but not a sum type: a characteristic feature of a sum type |
sort of (e.g. wasm can still distinguish between passing a i32 and a i64), but that isn't the relevant difference, which is that Rust, when generating the final code (at the stage where it substitutes the actual types into generics and generates separate functions for each combination of types used, so it does know the exact types then except without lifetime information).
The key part of TypeScript-style unions here is that they act like |
I disagree, I think that anonymous sum types should be like tuples, where you select which one you want based on position, and selecting which one you want based on type should be syntax sugar at most -- so by the time any code is generated and/or any generics are substituted with actual types, all such matching is already converted to matching based on position so doesn't need to know if types are the same or not. |
Positional selection does very little to help us solve the primary ergonomics problems in Rust - most notably, error handling being atrocious. Sadly, reviewing the opening post, it does describe a positional selector mechanism, which ... While novel, leaves us in the doldrums with regard to solving existing, practical problems. We should probably file or contribute to an RFC for anonymous / TypeScript-style (but automatically tagged, and thus safely matchable back to original type!) unions. Whether such an RFC works through sum types in the compiler or not is a matter of design and implementation, and - while it may have Rust-ABI implications in the long term - does not actually matter all that much for a description of intended usage and syntax. |
I long proposed I suspect error handling requires one first considers the actual calling convention. At present Rust only has variance from lifetimes, but we could consider either complicating variance or else providing implicit We'll likely need attributes that constructed the specific variant types, or at least tell the compiler what's going on.
Along similar lines, you could imagine explicitly merging enums, which only works within one crate and adjusts discriminants, like
Anyways, there are performance concerns which require care, so anonymous sum types might fit poorly. |
If they can't afford to use tagged unions in their errors, they can choose not to do so. The point here is to supplant the need for layers of wrapping The whole point of bringing this up was to make it possible to avoid writing all of that boilerplate, since all of that extra work just gets us ... Java's old Checked Exceptions, which were too boilerplatey for anyone to actually use back then either, despite being less work than our current state of affairs. |
A hypothetical syntax for anonymous sum types with labeled choices might be: fn foo(...) -> Result<T, Err(enum { X, Y(i32), Z(String) })> { ... }
fn bar(e: enum { X, Y(i32), Z(String) }) {
match e {
enum::X => { ... },
enum::Y(i) => { ... },
enum::Z(s) => { ... },
}
} which could desugar into, say: enum __AutogeneratedEnum735fc68a {
X = 0x4b68ab38,
Y(i32) = 0x18f5384d,
Z(String) = 0xbbeebd87,
}
fn foo(...) -> Result<T, __AutogeneratedEnum735fc68a> { ... }
fn bar(e: __AutogeneratedEnum735fc68a) {
match e {
__AutogeneratedEnum735fc68a::X => { ... },
__AutogeneratedEnum735fc68a::Y(i) => { ... },
__AutogeneratedEnum735fc68a::Z(s) => { ... },
}
} The discriminants are generated deterministically from the labels "X", "Y", and "Z" respectively. (This toy example uses sha256.) Deterministic discriminants would allow for some degree of cheap enum-to-enum coercion, e.g. perhaps |
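A compilable approximation of the desugared enum is sketched below. Two assumptions are made here that the comment does not state: `#[repr(u32)]` is added because current Rust (1.66 or later) requires an explicit integer representation for explicit discriminants on variants with fields, and the enum is given the hypothetical name `Labeled` with a hypothetical `tag` helper.

```rust
// Assumption: `#[repr(u32)]` is added so that explicit discriminants are
// accepted on variants with fields.
#[repr(u32)]
enum Labeled {
    X = 0x4b68ab38,
    Y(i32) = 0x18f5384d,
    Z(String) = 0xbbeebd87,
}

impl Labeled {
    // Reads the tag back. With a primitive repr the enum starts with its
    // `u32` discriminant, so this pointer cast is sound.
    fn tag(&self) -> u32 {
        unsafe { *(self as *const Self as *const u32) }
    }
}
```

With label-derived tags like these, two such enums that share a label would also share its tag, which is what would make the cheap coercion mentioned above plausible.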
Why do we erase type identity prior to solving which type we're passing via which paths? This seems to propose that we can't fix that - or even patch it over with a rule like "matching types combine to the shortest lifetime"? |
I'll give this one more try: I originally opened this issue back in 2013. It is about anonymous sum types with positional matching, for symmetry with tuples. No extra type-based magic, just as with tuples. I agree that type-based features also seem potentially valuable, but please, I beg, take those discussions to other issues and threads, because they are separate features. |
Issue by glaebhoerl
Saturday Aug 03, 2013 at 23:58 GMT
For earlier discussion, see rust-lang/rust#8277
This issue was labelled with: B-RFC in the Rust repository
Rust has an anonymous form of product types (structs), namely tuples, but not sum types (enums). One reason is that it's not obvious what syntax they could use, especially their variants. The first variant of an anonymous sum type with three variants needs to be syntactically distinct not just from the second and third variant of the same type, but also from the first variant of all other anonymous sum types with different numbers of variants.
Here's an idea I think is decent:
A type would look like this:
(~str|int|int). In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and).
A value would have the same shape (also like tuples), with a value of appropriate type in one of the "slots" and nothing in the rest:
(Nothing is a bikeshed, other possible colors for it include whitespace, `.`, and `-`. `_` means something is there we're just not giving it a name, so it's not suitable for "nothing is there". `!` has nothing-connotations from the negation operator and the return type of functions that don't.)
I'm not sure whether this conflicts syntax-wise with closures and/or negation.
Another necessary condition for this should be demand for it. This ticket is to keep a record of the idea, in case someone else has demand but not syntax. (If the Bikesheds section of the wiki is a better place, I'm happy to move it.)
SEE ALSO
A | B
#402