Rust Structure and Implementation "Embedding" Brainstorm #2431

Open
Eoin-ONeill-Yokai opened this issue May 4, 2018 · 23 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@Eoin-ONeill-Yokai

Eoin-ONeill-Yokai commented May 4, 2018

I've been talking about code reuse in Rust with my brother ( @emmetoneillpdx ) and one of the ideas we considered was a form of "static inheritance" which basically amounts to a syntax for automatically pulling either data or functions (or both) from existing structs and trait implementations. The proposed syntax is roughly based on Rust's existing "Struct Update Syntax". Here's a simple pseudocode example:

trait Consume {
    fn eat(&self);
    fn drink();
}

struct Animal {
    favorite_food : &'static str,
}

impl Consume for Animal {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
}

struct Dog {
    name : &'static str,
    ..Animal, // Statically "inherit" any field that is not already defined in Dog from Animal
}

impl Consume for Dog {
    fn drink() { 
        println!("I'm drinking out of a dog bowl!");
    }

    ..Animal // Likewise, statically "inherit" any undefined Consume function implementation for Dog from Animal
                 // Since drink() has already been implemented for Dog, Animal's implementation won't be used.
}
//If Dog tries to inherit an implementation referring to a variable outside of its scope, compile error.
//Fundamentally the same as if you tried to copy + paste the eat(&self) implementation from Animal to Dog. (but without copy and pasting code!)

Allowing for code reuse this way should not fundamentally change the way Rust behaves at runtime but still allows for behavior that is somewhat similar to inheritance without error-prone code duplication practices. Essentially, we're just asking the compiler to copy/paste any undefined fields or functions from any known (at compile time) struct or implementation.

We don't have anything more concrete than that right now, but this is an idea that came up over the course of a conversation and we both wanted to share it and see what people thought. I'd love to hear feedback and form a discussion around this idea or others like it. Also, if you know of any relevant RFCs where it might be appropriate to share this (or is already working on a similar idea), please share.

@Ixrec
Contributor

Ixrec commented May 4, 2018

Some relevant info:

  • If I understand the proposal correctly, the change to struct declarations is something that Go has, and they call it "type embedding" to distinguish it from the far more heavyweight "inheritance" of other languages: https://golang.org/doc/effective_go.html#embedding
  • The change to impl blocks appears to be what we've previously called "delegation". Past discussions on delegation have made it clear that delegation is way too complicated and nuanced a problem for the solution to ever be as simple as this, so it should probably stay a separate proposal with separate syntax. The current RFC is RFC: Delegation #2393
  • The main reason this hasn't been done in Rust already is that there are other motivations for an inheritance-like feature that aren't solved by embedding, because they need some performance or layout guarantees. Sometimes this is called the "virtual struct" problem. It's spelled out in detail over at Efficient code reuse #349.
  • Last I heard, there was a general consensus that any good solution to the virtual struct problem would involve the "fields in traits" proposal, which is at https://github.com/nikomatsakis/fields-in-traits-rfc
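
For context, here is a minimal sketch of how this kind of sharing is commonly written in stable Rust today, without embedding and without fields in traits: the trait asks each implementor for an accessor method and supplies default method bodies on top of it. This is not the syntax of the fields-in-traits proposal. The accessor name `favorite_food` and the choice to return `String` instead of printing are assumptions made for illustration.

```rust
// Sketch only: stable Rust, not the syntax of the fields-in-traits proposal.
// The accessor `favorite_food` stands in for a field the trait cannot name directly.
trait Consume {
    // Each implementor says where its data lives.
    fn favorite_food(&self) -> &str;

    // Default bodies are shared by every implementor that does not override them.
    fn eat(&self) -> String {
        format!("I'm eating some {}!", self.favorite_food())
    }

    fn drink() -> String {
        String::from("I'm drinking now!")
    }
}

struct Animal {
    favorite_food: &'static str,
}

struct Dog {
    name: &'static str,
    favorite_food: &'static str,
}

impl Consume for Animal {
    fn favorite_food(&self) -> &str {
        self.favorite_food
    }
}

impl Consume for Dog {
    fn favorite_food(&self) -> &str {
        self.favorite_food
    }

    // Overrides the default; `eat` still uses the shared default body.
    fn drink() -> String {
        String::from("I'm drinking out of a dog bowl!")
    }
}
```

The cost of this approach is one accessor per shared field in every implementing type, which is the boilerplate that both embedding and fields in traits are trying to remove.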

What I don't recall any preexisting discussion about is whether or not "embedding" could or should exist alongside a complete virtual struct solution.

@emmetoneillpdx

Hi Ixrec.

the change to struct declarations is something that Go has, and they call it "type embedding" to distinguish it from the far more heavyweight "inheritance" of other languages: https://golang.org/doc/effective_go.html#embedding

Yep. That seems to be the case, more or less. Neither of us is too familiar with Go, so thanks for pointing that out.

While embedding data into structs this way isn't terribly different than simply adding a field and delegating to the field, in my opinion there is an important distinction to be made: a field that's embedded from another struct would be accessed by object.field instead of object.subobject.field. It's the equivalent of automatically copying all the fields of one struct and embedding them into another struct.

As such, I think these two concepts (struct "embedding" and implementation "delegation") are actually very relevant to each other in the context of this particular code reuse schema. For example:

trait Consume {
    fn eat(&self);
    fn drink();
}

impl Consume for Animal {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
}

impl Consume for Dog {
    ..Animal
}

In this simple example, Dog would be delegating/inheriting/embedding all implementations from Animal. As a result, the eat(&self) method that Dog has "inherited" expects to be able to access a field self.favorite_food. Unless Dog gets that field, either through structure "embedding" or by manually adding a field of the same name and type, Rust would fail to compile with a missing-field error (in current Rust, "no field `favorite_food` on type `&Dog`"). Even if you were to add a field of type Animal to Dog, this error would continue unless the compiler knew to treat self.animal.favorite_food as the same thing as self.favorite_food.

In this case, I would argue that embedding ..Animal inside of Dog is actually cleaner and more readable; you're now saying "this class doesn't contain an Animal, but it does promise to have all of the same fields as Animal in its scope". Do you see what I mean? Because of this I'd argue that struct embedding and implementation reuse (in this specific context) are actually closely related and having a similar syntax makes some sense; in both cases you're asking the compiler to reuse code on your behalf in a way that ensures they will still work with each other.
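
For comparison, the composition alternative described above (an `Animal` field plus hand-written forwarding) can be sketched in stable Rust roughly as follows. The field name `internal_animal`, the helper `favorite_food_of`, and the `String` return types are assumptions chosen so the behavior is easy to check; they are not part of the proposal.

```rust
// Sketch of the "field plus manual forwarding" approach in stable Rust.
trait Consume {
    fn eat(&self) -> String;
    fn drink() -> String;
}

struct Animal {
    favorite_food: &'static str,
}

impl Consume for Animal {
    fn eat(&self) -> String {
        format!("I'm eating some {}!", self.favorite_food)
    }

    fn drink() -> String {
        String::from("I'm drinking now!")
    }
}

struct Dog {
    name: &'static str,
    // Composition: the Animal data lives one level down.
    internal_animal: Animal,
}

impl Consume for Dog {
    // Every method has to be forwarded by hand.
    fn eat(&self) -> String {
        self.internal_animal.eat()
    }

    fn drink() -> String {
        <Animal as Consume>::drink()
    }
}

// Hypothetical helper: the field is reached through the two-step path.
fn favorite_food_of(dog: &Dog) -> &'static str {
    dog.internal_animal.favorite_food
}
```

Animal's method bodies are reused unchanged here, but only because each one is forwarded explicitly, and the data is reached as `dog.internal_animal.favorite_food` rather than `dog.favorite_food`.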

The change to impl blocks appears to be what we've previously called "delegation". The current RFC is #2393

Great, we'll check it out and join the ongoing conversation over there. Again, thanks for the heads up.

The main reason this hasn't been done in Rust already is that there are other motivations for an inheritance-like feature that aren't solved by embedding, because they need some performance or layout guarantees. Sometimes this is called the "virtual struct" problem. It's spelled out in detail over at #349.

Without a doubt, the "delegating/embedding" syntax that we're talking about here wasn't suggested as a full or comprehensive implementation of traditional inheritance with virtual methods, implicit upcasting, struct layout guarantees, etc.

Instead, we're talking about a pretty clean syntax for basic code reuse that Rust is lacking as of now. You'd be able to easily share implementation among structures that share common traits, without complex syntax or bad coding practices, but it's not full C++/Java style OOP nor is it meant to be. One of the things that I think we can all agree makes Rust great is that it has been designed in a way that helps us to write good code and avoid many kinds of bugs. I think we can also all agree that copy-pasting code, which is sloppy and error-prone, is really not in line with that core philosophy!

At any rate, if Rust is looking for a traditional and comprehensive solution to OOP/inheritance then I agree that may demand a much more complex solution. But when you look at this through the lens of quick-and-easy code reuse, I think this idea and syntax still has merit especially since it could potentially co-exist with a "true" inheritance implementation.

@Eoin-ONeill-Yokai changed the title from Rust "Static Inheritance" Brainstorm to Rust Structure and Implementation "Embedding" Brainstorm on May 4, 2018
@Eoin-ONeill-Yokai
Author

I just want to add that I've changed the title of this RFC to use the term "embedding" instead of "inheritance" to help put the focus on code reuse instead of OOP.

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 4, 2018
@Centril
Contributor

Centril commented May 4, 2018

If we were to do some sort of type embedding, I would prefer some syntax a bit more loud than reusing FRU syntax. Perhaps:

struct Foo {
    x: usize,
    use Bar,
}

There are also issues around visibility that need to be worked out, for example:

  • If Bar has pub fields, does ..Bar retain the visibility or does everything become private?
  • How do you make the visibility public syntactically? Should you be able to change it?

@emmetoneillpdx

emmetoneillpdx commented May 4, 2018

If we were to do some sort of type embedding, I would prefer some syntax a bit more loud than reusing FRU syntax. Perhaps:

struct Foo {
    x: usize,
    use Bar,
}

Hello Centril. The use keyword would also be fine with me (although I also think the 'FRU syntax' is elegant). But as you said, use is very noticeable and I think the meaning is pretty clear. My only question is whether use would also be fine in the event that "implementation embedding" was also added. For example:

trait Consume {
    fn eat(&self);
    fn drink();
}

struct Animal {
    favorite_food: &'static str,
}

impl Consume for Animal {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
}

//================= Basic Embedding 

struct Dog {
    name : &'static str,
    use Animal, //Dog 'embeds' any fields that aren't already defined in this scope from Animal
}

impl Consume for Dog {
    use Animal; //Dog 'embeds' Animal's entire Consume trait implementation.
}

//================= "Override" Embedding

impl Consume for Dog {
    fn drink() {  //Dog implements its own drink()
        println!("Woof! I'm drinking out of a dog bowl!");
    }

    use Animal; //But then 'embeds' any remaining Consume trait functions from Animal.
}

//================= Multiple Embedding

impl Consume for SnakeDog {
    fn eat(&self) use Snake; //SnakeDog eats like a Snake!
    fn drink() use Dog; //But drinks like a Dog! Imagine that!
}

That makes sense, is really quick, and it also reads very clearly, in my opinion!

As for structure embedding with pub fields: in my opinion, embedding should generally be the equivalent of directly copy-pasting code. If Foo embeds Bar and Bar has a pub x then Foo should now have a pub x, I think...

In the case of generics, for example, I think it'd be visually cleaner if structs which embed a generic struct have to be explicitly generic themselves. For example:

struct Gen<T> {
    x: T,
}

struct Data {
    //...//,
    use Gen<T>,
} //Bad, Unclear, Compile error!

struct Data<T> {
    //...//,
    use Gen<T>,
} //Readable! OK!

The compiler simply tries to copy-paste the Gen fields into Data, but in order for x: T to make sense in the scope of Data, the programmer must add the generic to Data manually. While the compiler could probably be programmed to do it, it'd be much more readable for the programmer to do it manually.
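
The same rule already applies to ordinary fields in stable Rust, which is one argument for keeping it. A minimal sketch, using a plain field in place of the proposed `use Gen<T>` (the field names `id` and `inner` and the helper `make_data` are hypothetical):

```rust
// Sketch in stable Rust using an ordinary field instead of the proposed `use Gen<T>`.
struct Gen<T> {
    x: T,
}

// `T` must be declared on `Data` before it can be named in a field type.
// Writing `struct Data { inner: Gen<T> }` is rejected because `T` is not in scope.
struct Data<T> {
    id: u32,
    inner: Gen<T>,
}

// Hypothetical constructor; `T` is inferred from the argument.
fn make_data<T>(id: u32, x: T) -> Data<T> {
    Data { id, inner: Gen { x } }
}
```

With embedding as proposed, the only difference would be that `x` is reached as `data.x` instead of `data.inner.x`.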

@Centril
Contributor

Centril commented May 4, 2018

We're currently reserving delegate, #2429, in edition 2018 for delegation; previously #1406 used use.

My concern is that use Foo inside impls is easily confused with paths and stuff.

@emmetoneillpdx

emmetoneillpdx commented May 5, 2018

Here's a description of embedding that I wrote in the delegation thread #2393. I think it probably makes sense to paste it here too:

Embedding is not meant to be a comprehensive solution or replacement for traditional inheritance. Instead, like delegation, it's meant to be a nice syntax for effective code reuse that works within the existing structure and trait paradigm of Rust. The syntax that we're suggesting to use is ..T, which is lean, simple, and roughly based on Rust's "struct update syntax". Let's start with this simple code:

trait Consume {
    fn eat(&self);
    fn drink();
}

struct Animal {
    favorite_food: &'static str,
}

impl Consume for Animal {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
}

We have a small trait, Consume, a struct, Animal, and Animal's implementation of the Consume trait. But what happens if I want to create a new structure, Dog, which shares Animal's implementation of the Consume trait? Well, I could give my Dog an internal Animal field, internal_animal and forward/delegate each Consume function call to internal_animal. But what if we could use structure and implementation "embedding" to make that cleaner, easier, and more readable? Something like this:

struct Dog {
    name: &'static str,
    ..Animal //Dog 'embeds' any fields that aren't already defined in this scope from Animal
}

impl Consume for Dog {
    ..Animal //Dog 'embeds' Animal's entire Consume trait implementation.
}

This is "embedding", a simple method of code-reuse that allows us to basically tell the compiler to copy-paste code for us! Embedding allows you to transparently merge the fields of one struct into another, as well as to share function implementations among structures that share a trait.

To go into more detail, in the above example the structure Dog 'embeds' all fields of Animal, which means that Dog now also contains the field favorite_food: &'static str.

Similarly, Dog's implementation of the Consume trait 'embeds' all of the function/method implementations that were defined in Animal's implementation of the Consume trait. In other words, embedding tells the compiler to turn this:

impl Consume for Animal {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
}

impl Consume for Dog {
    ..Animal
}

into this:

impl Consume for Animal {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
}

impl Consume for Dog {
    fn eat(&self) {
        println!("I'm eating some {}!", self.favorite_food);
    }

    fn drink() {
        println!("I'm drinking now!");
    }
} //If we could look behind the scenes, the embedded implementations would appear to have been copy-pasted!

Ok, we can embed any known struct into any other custom struct this way. But what happens if we try to embed from a type that doesn't implement the same trait? Maybe some compile error like this:

error[E####]: Dog cannot embed Consume implementation from Car because Car does not implement the Consume trait! Did you mean ..Cat?

Ok, but if Dog embeds Animal's Consume implementation, how can we guarantee that Dog will have the self.favorite_food field that's accessed in eat(&self)? We don't need to because the Rust compiler should already catch this and throw this error:

error[E0609]: no field `favorite_food` on type `&Dog`

At this point the programmer can either embed Animal into Dog (shown above and usually recommended) or they could even manually add the field favorite_food: &'static str into Dog (which may be appropriate if only a small number of fields are required to satisfy the embedded implementations).
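
The closest thing stable Rust offers to this "compiler-assisted copy-paste" today is probably a declarative macro. The sketch below is an assumption-laden illustration, not the proposed feature: `impl_consume!` is a hypothetical helper, and the methods return `String` so the result can be checked. It pastes the same method bodies into an impl for each listed type, and a type without a `favorite_food` field fails to compile at the expansion site, much as described above.

```rust
trait Consume {
    fn eat(&self) -> String;
    fn drink() -> String;
}

// Hypothetical helper macro: pastes the same method bodies into an impl
// for each listed type.
macro_rules! impl_consume {
    ($($ty:ty),* $(,)?) => {
        $(
            impl Consume for $ty {
                fn eat(&self) -> String {
                    // Compiles only if the type has a `favorite_food` field.
                    format!("I'm eating some {}!", self.favorite_food)
                }

                fn drink() -> String {
                    String::from("I'm drinking now!")
                }
            }
        )*
    };
}

struct Animal {
    favorite_food: &'static str,
}

struct Dog {
    name: &'static str,
    // Added by hand, since there is no struct embedding to supply it.
    favorite_food: &'static str,
}

impl_consume!(Animal, Dog);
```

Unlike the proposal, the shared bodies live in the macro rather than in Animal's impl, and the field still has to be repeated in each struct.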

How is this different than adding an Animal field to Dog, and why use the same syntax for both structures and trait implementations?

Adding something like internal_animal: Animal to Dog would certainly be similar, because Dog would now have access to its own Dog fields as well as Animal fields, much like embedding. However, there are two key differences:

1.) Structure embedding copies all of the fields from Animal and pastes them into Dog, skipping any fields of the same name:Type that already exist. As such, embedding Animal multiple times wouldn't change anything, nor would embedding an Animal alongside another structure that also embeds Animal. In other words, Dog will only ever be able to have one favorite_food: &'static str no matter what it embeds or how many times it tries to embed it. Conversely, a Dog could contain multiple fields of type Animal or even a container of Animals.

2.) Remember that the eat(&self) method that Dog embedded from Animal accessed a variable that was bound to self.favorite_food! Without embedding (or even manually adding the field), Dog would not have that field. Even if Dog had an internal_animal: Animal field, the function would either need to be changed to access self.internal_animal.favorite_food, or the compiler would have to do extra work to recognize that self.favorite_food is actually referring to self.internal_animal.favorite_food. As such, "embedding" implementations and structures should result in code reuse that works together nicely, as embedded methods shouldn't need to change at all.

Here are a few more examples. The first is a partial implementation of Consume for Dog, which implements drink() but embeds any other undefined functions/methods from Animal. This is somewhat analogous to method overriding in OOP:

//Partial "Override" Embedding
impl Consume for Dog {
    fn drink() {  //Dog implements its own drink()
        println!("Woof! I'm drinking out of a dog bowl!");
    }

    ..Animal //But then 'embeds' any remaining Consume trait functions (i.e.: eat(&self)) from Animal.
}

Finally, here's an example of a spin on the syntax that allows cherry-picked embeddings from multiple different implementations of the Consume trait:

//Multiple Embedding
impl Consume for SnakeDog {
    fn eat(&self) ..Snake; //SnakeDog eats like a Snake!
    fn drink() ..Dog; //But drinks like a Dog! Imagine that!
}

That's all for now. The thing to keep in mind is that embedding isn't meant to be "inheritance in rust" and that the focus of embedding (much like delegation) is supposed to be a quick-and-clean syntax for basic code reuse which doesn't attempt to drastically change Rust's underlying paradigms. Embedding is simply about avoiding the sloppiness, tedium, and bugs associated with bad coding practices like copy-pasting code by giving the programmer a useful and convenient syntax for code reuse.

What's also important to note is that the existence of embedding in Rust wouldn't preclude other features like delegations or even 'true' inheritance. Delegation and inheritance could still be useful in various situations or designs which might require dynamic behavior or other complex setups, while embedding is fundamentally a compile-time shorthand for simple code reuse, similar to a context aware version of a traditional "#include".

@H2CO3

H2CO3 commented May 6, 2018

Note that this is essentially at least a subset of inheritance, even if you phrase it in a very roundabout way. In particular, it brings up the many issues that come with "traditional" OO inheritance which considers fields too. For example, it gives rise to the well-known (and dreaded) diamond problem. It also looks painfully non-orthogonal or non-uniform in Rust's rich type system: opposed to the existing trait inheritance, this embedding would only work with structs, and not with other types such as enums or primitives.

Rust had more OO-like class types in the past, before the 1.0 release. They have been removed because they had been deemed not worthy of existence in the face of their limited improvement in usefulness (given that the trait system already provides inheritance of behavior) and the numerous problems they introduce into the language.

Finally, one more very specific piece of criticism:

Structure embedding copies all of the fields from Animal and pastes them into Dog, skipping any fields of the same name:Type that already exist.

This looks very scary. If embedding ever gets implemented, this situation should definitely provoke a compiler error. Silently ignoring duplicate fields in the very definition of a type is dangerous, it can lead to extremely subtle and surprising errors.

@Ixrec
Contributor

Ixrec commented May 6, 2018

Note that this is essentially at least a subset of inheritance, even if you phrase it in a very roundabout way. In particular, it brings up the many issues that come with "traditional" OO inheritance which considers fields too. For example, it gives rise to the well-known (and dreaded) diamond problem. It also looks painfully non-orthogonal or non-uniform in Rust's rich type system: opposed to the existing trait inheritance, this embedding would only work with structs, and not with other types such as enums or primitives.

Good points. Somehow I'd missed that "impl embedding" does so much with so little syntax that it sneakily reintroduces all the usual gotchas and corner cases of traditional inheritance, like the diamond problem. It's only struct embedding that has the very simple, straightforward semantics that make it work so well in Go (and even that is partially because Go lacks complications like generics).

Therefore, I'm convinced that, whether or not we should have struct embedding, it should be a separate feature from delegation.

So we're probably at the point where, as with many other requests for sugar syntax, whoever wants to see this feature happen needs to produce some compelling examples of realistic code (not Cat/Dog/Animal toy examples) where this would be a significant improvement.

@crlf0710
Member

crlf0710 commented May 7, 2018

I posted this on the forum:
https://internals.rust-lang.org/t/idea-layout-inheritance-once-more-an-easier-way/7461

@emmetoneillpdx

emmetoneillpdx commented May 10, 2018

Note that this is essentially at least a subset of inheritance, even if you phrase it in a very roundabout way.

We originally used the term "inheritance", but that's also a term with a lot of assumptions and baggage behind it. As this doesn't do all of the things that many people expect traditional inheritance to do, I'm drifting further towards calling it "embedding" of data and implementations. At any rate, I don't mind whatever it's called.

In particular, it brings up the many issues that come with "traditional" OO inheritance which considers fields too. For example, it gives rise to the well-known (and dreaded) diamond problem.

Hmm. I'm not really seeing this. Sure, if "blanket" embedding of impl blocks was the only thing that's being discussed here then maybe that would be the case, but being able to use specific and cherry-picked embedding on a per-function basis seems to eliminate that problem by allowing the user to tell the compiler exactly which implementation they would like. Attempting to blanket embed two trait implementations could simply be disallowed, resulting in a compiler error that suggests that the programmer use cherry-picked embeds instead:

// This compiles!
impl Consume for Dog {
    use Animal;
}

// This does not!
impl Consume for SnakeDog {
    use Snake;
    use Dog;
} // Something like "error: can only blanket embed from one implementation..."

// However this would compile and (unless I'm missing something) avoids the diamond problem.
impl Consume for SnakeDog {
    fn eat(&self) use Snake; 
    fn drink() use Dog;
} // No ambiguity. Specific 'cherry-picked' embeds from multiple other implementations.

(I'll probably be using the use keyword syntax suggested above from now on, because I'm coming to prefer it.)
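
For what it's worth, the cherry-picked form has a fairly direct hand-written equivalent in stable Rust, sketched below under some assumptions: `SnakeDog` holds a `Snake` value so that `eat` has data to work on, the `Snake` type and its `prey` field are hypothetical, and the methods return `String` rather than printing.

```rust
trait Consume {
    fn eat(&self) -> String;
    fn drink() -> String;
}

struct Snake {
    prey: &'static str,
}

struct Dog {
    favorite_food: &'static str,
}

impl Consume for Snake {
    fn eat(&self) -> String {
        format!("I'm swallowing a {} whole!", self.prey)
    }

    fn drink() -> String {
        String::from("I'm drinking like a snake!")
    }
}

impl Consume for Dog {
    fn eat(&self) -> String {
        format!("I'm eating some {}!", self.favorite_food)
    }

    fn drink() -> String {
        String::from("I'm drinking out of a dog bowl!")
    }
}

// Hypothetical composite type: it holds a Snake so that `eat` has data to use.
struct SnakeDog {
    snake: Snake,
}

impl Consume for SnakeDog {
    // Picked from Snake: forward to the contained value.
    fn eat(&self) -> String {
        self.snake.eat()
    }

    // Picked from Dog: fully qualified syntax names the source impl explicitly.
    fn drink() -> String {
        <Dog as Consume>::drink()
    }
}
```

Because every method names its source explicitly, there is no ambiguity about which implementation is used, which is the property the cherry-picked `use` syntax is meant to preserve.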

It also looks painfully non-orthogonal or non-uniform in Rust's rich type system: opposed to the existing trait inheritance, this embedding would only work with structs, and not with other types such as enums or primitives.

I'll admit I haven't thought of this as being universal as of yet. Having said that, just off the top of my head I can't see any reason why something similar couldn't exist for enums. Outside of structs and trait implementations I haven't given the bigger picture much thought or made any attempts to generalize this idea further, so I'll just have to think about it in other contexts.

If embedding ever gets implemented, this situation should definitely provoke a compiler error. Silently ignoring duplicate fields in the very definition of a type is dangerous, it can lead to extremely subtle and surprising errors.

Can you be more specific about the type of errors that ignoring duplicate field embeds would bring about? As I see it, if a piece of data has both the same name and type, it is effectively the same piece of data. If one structure has age: u8 and another structure also has age: u8, then are they not effectively the same potential piece of data?

In the event that the same name is used but the types are clashing, I think a compile error would be 100% appropriate. But, I'm not really seeing the potential harm involved in ignoring duplicate fields which share both name and type. A compiler warning, maybe, but a completely show-stopping error? I would need to know more about the potential errors to get behind this.

At any rate, thanks for the comment H2CO3, and I hope you don't mistake my elaborate responses as defensiveness. We've labelled this as a "brainstorming" session and nothing more, so all input and criticisms are very welcome. I've been a compiler user but not a compiler developer, so I'll be the first to admit that I'm looking at this from an idealistic perspective that really demands a dose of pragmatic critique and skepticism. I don't know whether this is a good fit for Rust and the Rust community, but I'm glad to have a chance to work with people who are smarter than myself to flesh out this idea in order to see where it goes.

Somehow I'd missed that "impl embedding" does so much with so little syntax that it sneakily reintroduces all the usual gotchas and corner cases of traditional inheritance, like the diamond problem.

Ixrec, other than the diamond problem which I don't think truly applies here thanks to specific implementation cherry-picking (see my above example), what other gotchas and corner cases of traditional inheritance do you argue that this suffers from? I'm more than happy to try to dig into the specifics in order to work out the details - in fact, that's really why I'm here at all!

Therefore, I'm convinced that, whether or not we should have struct embedding, it should be a separate feature from delegation.

For the record, I still disagree with this for the same reasons that I listed above. I see this as "compiler-assisted copy-pasting" that only does very slightly different things in different contexts. So, we'll probably have to agree to disagree on this one, especially since one of H2CO3's criticisms seems to be that this isn't general enough to Rust's other contexts (enums, primitive types, etc).

So we're probably at the point where, as with many other requests for sugar syntax, whoever wants to see this feature happen needs to produce some compelling examples of realistic code (not Cat/Dog/Animal toy examples) where this would be a significant improvement.

Again, not to be defensive, but we saw this "toy example" as a necessarily clear and concise example that models a very basic and intuitive relationship just like you'd see in almost any discussion of inheritance. It's definitely more concrete than class A and trait X, while being intentionally simple enough to get the idea across and show only what needed to be shown. Only one field was needed to show how struct embedding might work, two functions were needed to show how 'overriding' and 'cherry-picking' might work, etc.

I'll be happy to work on a more detailed and concrete example of this in the near future. But even this simple example has already raised a lot of important questions and confusion that have been (in my opinion) quite helpful. Anyway, I don't see this as too far afield from the types of inheritance hierarchies that a game developer or GUI developer might come across, no?

Let me know what type of relationship you want to see modelled in this way and I'll try to do my best to make something more concrete.

@H2CO3

H2CO3 commented May 11, 2018

At any rate, I don't mind whatever it's called.

Me neither; my problem is not nomenclature, but semantics.

I can't see any reason why something similar couldn't exist for enums.

How do you embed the data of one enum into another? What would the following code even mean?

enum Foo {
    Var1,
    Var2(String),
}

enum Bar {
    use Foo;
    Var3,
}

The interpretation of embedding in this context is not immediately obvious to say the least; I would go so far as thinking it would be necessarily surprising and/or illogical, because sum types by definition are not known at compile time to contain all of their data — that is the point in enums. So what can we do? Embed Foo as an additional field in every variant of Bar? Insert the list of Foo::* variants to Bar at the point of the use item? Generate types for each variant containing associated data that wraps that associated data and the corresponding associated data from Foo, if an identically-named variant exists? All of it seems wildly disconnected from reality in terms of added value vs. complexity ratio.
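
For contrast, what stable Rust lets you write today is explicit wrapping, where the nesting stays visible at every use site. The following is a sketch only; the wrapper variant `Bar::Foo`, the `From` impl, and the `describe` helper are hypothetical additions for illustration and do not correspond to any of the interpretations listed above.

```rust
#[derive(Debug, PartialEq)]
enum Foo {
    Var1,
    Var2(String),
}

#[derive(Debug, PartialEq)]
enum Bar {
    // Explicit wrapping: all of Foo's variants are reachable through one variant.
    Foo(Foo),
    Var3,
}

impl From<Foo> for Bar {
    fn from(foo: Foo) -> Self {
        Bar::Foo(foo)
    }
}

// Matching has to go through the wrapper, so the nesting stays visible.
fn describe(bar: &Bar) -> String {
    match bar {
        Bar::Foo(Foo::Var1) => String::from("Var1 from Foo"),
        Bar::Foo(Foo::Var2(s)) => format!("Var2 from Foo carrying {}", s),
        Bar::Var3 => String::from("Var3 from Bar"),
    }
}
```

Any enum "embedding" would have to decide whether `Bar::Var1` exists as its own variant or remains `Bar::Foo(Foo::Var1)`, and that choice affects both pattern matching and memory layout.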

Can you be more specific about the type of errors that ignoring duplicate field embeds would bring about?

Sure, the worst one that immediately strikes me as wrong in inheritance is that it violates encapsulation. Make a field or method so private that "subclasses" can't use it? It will be useless for inheritance. Make it "protected", so that only subclasses can access it? That provides a false sense of security, since at that point a subclass can just re-export it through a public method. As an unpleasant but related side effect, it also makes reasoning about visibility a nightmare both in the compiler, and, what's more important, for human consumers of the code too.

Encapsulation and visibility are not the only problems with inheritance. The general issue is that any sort of subtyping can easily produce surprising results in practically any context, because as it turns out, people are not terribly good at keeping entire hierarchies in mind. If I say a value is of type T, then readers of the code (my future self included) will assume that it does exactly what type T itself does and nothing else, an assumption broken by subtyping, where "superclass" behavior must also be taken into account. In other words, inheritance prevents local reasoning and introduces the need for global reasoning, which is a huge burden on both tools and brains, and a regular source of hard-to-find bugs. (I've been using several traditional object-oriented languages, and I can tell you this is a very realistic problem in a large code base. Both I and many respected and skilled co-workers made really bad mistakes related to this nature of inheritance.)

As I see it, if a piece of data has both the same name and type, it is effectively the same piece of data.

I beg to differ. The two fields can still have different contexts and different meanings. The name and the type are — unfortunately — not everything. This "ignoring duplicates" feature sounds very much like the compiler second guessing the programmer based on some sort of heuristics which may even work in most cases, but would be completely broken if the assumption doesn't hold in just one single case. Magic like this I think has no place in a language that picked safety and correctness as its explicit goals.

A very concrete example is: I'm currently writing a webservice and I'm using public key cryptography (through the excellent sodiumoxide crate) at the HTTP API boundaries. There are several types that need one or more PublicKeys. If there is only one such key, it's simply named public_key by convention. There is some shared logic in such types, which I currently generate using an extension trait and a macro (and to be honest I'm more than satisfied with the approach). If I were to use an inheritance-like feature for factoring out the common code, however, I would never want the compiler to assume that two public_keys do the same thing — they happen to have the same name and type, but they mean something slightly different in two different containing types. (In fact, I would never want the compiler to do anything with my struct fields at all. They are mostly private, and for a good reason.)

You might argue that I should just use newtypes to prevent this, however:

  1. In this specific case, the difference is not significant enough to warrant newtypes, yet the fields are not identical, so they shouldn't be contracted into one.
  2. Even if in principle I should have used newtypes, I would definitely not want to always worry about such accidental conflation of fields unless I constantly keep wrapping every field into its own newtype (which would be completely unreasonable and infeasible of course). I have used languages with footgun "features" that require permanent vigilance, and it's extremely mentally tiring to program in those languages. For instance, having to check for overflow before every arithmetic operation in C is a huge hassle. In principle, one should always do it because signed integer overflow in C is undefined behavior. However, it's so annoying that realistically no-one ever codes in that style, and then of course it results in a myriad of bugs. Array bounds checking, null pointers, and all other classic "you weren't careful enough" bugs could also be added to this list. Thank goodness we don't have to worry about these things in Rust nearly as much as in C.
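For readers unfamiliar with the newtype pattern mentioned above, here is a minimal sketch of what it would involve. `PublicKey` is a stand-in byte array rather than the sodiumoxide type, and `SigningKey`, `EncryptionKey`, `Client`, `Server`, and `verify_with` are hypothetical names chosen only for illustration.

```rust
// Stand-in for a real key type; the actual sodiumoxide type is not used here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PublicKey([u8; 4]);

// Hypothetical newtypes: same underlying data, different meaning.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SigningKey(PublicKey);

#[derive(Debug, Clone, Copy, PartialEq)]
struct EncryptionKey(PublicKey);

// Both structs have a field named `public_key`, but the types differ,
// so the two fields can no longer be mistaken for one another.
struct Client {
    public_key: SigningKey,
}

struct Server {
    public_key: EncryptionKey,
}

// Accepts only a signing key and returns its raw bytes.
fn verify_with(key: SigningKey) -> [u8; 4] {
    (key.0).0
}

fn main() {
    let raw = PublicKey([1, 2, 3, 4]);
    let client = Client { public_key: SigningKey(raw) };
    let server = Server { public_key: EncryptionKey(raw) };

    println!("{:?}", verify_with(client.public_key));
    // verify_with(server.public_key); // would not compile: mismatched types
    println!("{:?}", server.public_key);
}
```

The cost is visible even in this small sketch: every field needs its own wrapper type and every use site needs unwrapping, which is the overhead the two points above object to.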

No ambiguity. Specific 'cherry-picked' embeds from multiple other implementations.

If we restrict the feature to inheriting methods, then what we get is a subset of the existing trait + default method system. If we also allow cherry-picking of individual methods, then we get a subset of the delegation RFC. I don't think we should add a subset of any feature twice to the language.
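As a rough illustration of the existing trait + default method system, here is the `Consume` example from the top of the issue reworked so that the shared behavior lives in default methods. The `favorite_food` accessor and the `&self` receiver on `drink` are additions made for this sketch, not part of the original example.

```rust
trait Consume {
    // Required method: each implementor supplies its own data access.
    fn favorite_food(&self) -> &str;

    // Default methods: every implementor gets these unless it overrides them.
    fn eat(&self) -> String {
        format!("I'm eating some {}!", self.favorite_food())
    }

    fn drink(&self) -> String {
        String::from("I'm drinking now!")
    }
}

struct Animal {
    favorite_food: &'static str,
}

struct Dog {
    favorite_food: &'static str,
}

impl Consume for Animal {
    fn favorite_food(&self) -> &str {
        self.favorite_food
    }
}

impl Consume for Dog {
    fn favorite_food(&self) -> &str {
        self.favorite_food
    }

    // Overrides the default; `eat` is still taken from the trait.
    fn drink(&self) -> String {
        String::from("I'm drinking out of a dog bowl!")
    }
}

fn main() {
    let animal = Animal { favorite_food: "grass" };
    let dog = Dog { favorite_food: "kibble" };
    println!("{}", animal.drink());
    println!("{}", dog.eat());
    println!("{}", dog.drink());
}
```

Note the limitation: a default method can only be "this or the trait's default". It cannot pick up the implementation written for some other specific type.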

In conclusion, while I do think at least some of these problems could somehow be worked around, they are so fundamental that it would never be possible to completely eliminate them without resorting to several special cases and ugly hacks (including non-orthogonal restrictions) in the design of the language, and thus for me, any OO-inheritance-like feature would essentially be a complete showstopper.

@emmetoneillpdx

emmetoneillpdx commented May 12, 2018

How do you embed the data of one enum into another? What would the following code even mean?

Enums weren't really considered within the original scope of this discussion. Having said that, if they were, I think it could follow the same pattern of behavior as structs or implementations; to embed a struct (A) within another (B) would effectively copy-paste fields from A into B (perhaps excluding exact name:Type duplicates); to embed an implementation (X) into another (Y) would effectively copy-paste function definitions from X into Y; and, by that same pattern, to embed an enum (U) into another (V) would effectively copy-paste variants from U into V. Each one of these exists as nothing more than a piece of friendly syntax that discourages manually copy-pasted code. In other words:

enum Foo {
    Var1,
    Var2(String),
}

enum Bar {
    use Foo;
    Var3,
}

// Could compile to something like:

enum Foo {
    Var1,
    Var2(String),
}

enum Bar {
    Var1,
    Var2(String),
    Var3,
}

Different-yet-similar enum types with zero automatic conversion or compatibility between, perhaps with sensible limitations, restrictions, or errors that occur to prevent name collisions. That's under the assumption that this type of embedding needs to be universal or apply to enums; I think it's possible, but it wasn't what I had in mind.

Just like in the case of structs or impl blocks, the mantra here is "compiler-assisted copy-pasting of code" - it doesn't make the same promises that traditional inheritance makes, it simply tells the compiler to help you generate similar structures. In some ways I personally don't think it's too much different than Rust's generics, which really serve as a command to tell the compiler to statically generate a bunch of similar code on your behalf - you could just copy and paste a bunch of functions changing the type each time, but generics help you avoid that error-prone and time-wasting behavior.

Insert the list of Foo::* variants to Bar at the point of the use item?

Bingo.

Generate types for each variant containing associated data that wraps that associated data and the corresponding associated data from Foo, if an identically-named variant exists?

A few of the guesses seem wildly out of left field here. If variants have clashing names do we need some genius solution? No way. Rust will do what it does best - emit a compile error with a nice message telling you what the problem is and what needs to be done to fix it - in this case, one of the variants needs to be renamed because the same name cannot be used to describe two different variants.

In my opinion, the compiler doesn't need to cleverly figure everything out or solve problems in the user's code, it just needs to enforce a set of rules, spitting out useful warnings and errors wherever necessary. If anything, I think I've been pretty straight-forward about the concept here, it's not some massive, complex, or smart solution to every problem, it's struct->struct or impl->impl (perhaps even enum->enum) code reuse with a basic set of logic and rules for each one.

Sure, the worst one that immediately strikes me as wrong in inheritance is that it violates encapsulation. Make a field or method so private that "subclasses" can't use it? It will be useless for inheritance. Make it "protected", so that only subclasses can access it? That provides a false sense of security, since at that point a subclass can just re-export it through a public method. As an unpleasant but related side effect, it also makes reasoning about visibility a nightmare both in the compiler, and, what's more important, for human consumers of the code too.

I don't see this as a correct interpretation of what's being discussed here. The visibility or permissions of a field are not changed in any way. There is no subclassing, there is no implicit casting between these structures, no true 'inheritance'. If some field of the same name:Type exists, it will simply not be embedded (in other words, it won't be copied and pasted by the compiler).

The general issue is that any sort of subtyping can easily produce surprising results in practically any context, because as it turns out, people are not terribly good at keeping entire hierarchies in mind. If I say a value is of type T, then readers of the code (my future self included) will assume that it does exactly what type T itself does and nothing else, an assumption broken by subtyping, where "superclass" behavior must also be taken into account. In other words, inheritance prevents local reasoning and introduces the need for global reasoning, which is a huge burden on both tools and brains, and a regular source of hard-to-find bugs. (I've been using several traditional object-oriented languages, and I can tell you this is a very realistic problem in a large code base. Both I and many respected and skilled co-workers made really bad mistakes related to this nature of inheritance.)

I believe you, and I know that inheritance has hidden pitfalls. But since we're talking about behavior now, we can talk about implementation embedding. Instead of using animal/dog/cat/etc, I'll just use letters this time:

impl T for A {
    fn ding() {
         //Type A uses an entirely customized version of `ding()`.
    }
    fn foo() use X; // Type A uses X's entire and exact `foo()` implementation.
    fn bar() use Y; // Type A uses Y's entire and exact `bar()` implementation.
    use Z; // Type A uses Z's entire and exact implementation for any remaining functions of this trait, T.
}

In other words, each function implementation is either reused from an existing implementation of that same trait OR it is a new implementation - embedding allows no in-between. If you're using code from an existing implementation, you are committing to doing exactly what that other implementation does. And if you're writing your own implementation, you're committing to doing exactly what is within the brackets. There's no way to call into the superclass' version half way through, there is no dynamic behavior or polymorphism, etc. Your implementation either does something new or copies something else's exact way of behaving.
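For comparison, here is a sketch of how the per-function part of that example can be written by hand in current Rust, using fully qualified calls to forward to another type's implementation. The trait body and return values are invented for illustration, and the `use Z;` catch-all is left out; it would correspond to one such forwarding function per remaining method.

```rust
trait T {
    fn ding() -> &'static str;
    fn foo() -> &'static str;
    fn bar() -> &'static str;
}

struct X;
struct Y;
struct A;

impl T for X {
    fn ding() -> &'static str { "X::ding" }
    fn foo() -> &'static str { "X::foo" }
    fn bar() -> &'static str { "X::bar" }
}

impl T for Y {
    fn ding() -> &'static str { "Y::ding" }
    fn foo() -> &'static str { "Y::foo" }
    fn bar() -> &'static str { "Y::bar" }
}

impl T for A {
    // Type A uses an entirely customized version of `ding()`.
    fn ding() -> &'static str { "A::ding" }

    // Hand-written stand-in for the proposed `fn foo() use X;`.
    fn foo() -> &'static str { <X as T>::foo() }

    // Hand-written stand-in for the proposed `fn bar() use Y;`.
    fn bar() -> &'static str { <Y as T>::bar() }
}

fn main() {
    println!("{} {} {}", A::ding(), A::foo(), A::bar());
}
```

Forwarding like this only works directly because these functions take no `self`. Once a method reads fields through `&self`, X's body operates on an X, not an A, and that gap is what the proposed copy-paste semantics are meant to fill.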

I beg to differ. The two fields can still have different contexts and different meanings. The name and the type are — unfortunately — not everything. This "ignoring duplicates" feature sounds very much like the compiler second guessing the programmer based on some sort of heuristics which may even work in most cases, but would be completely broken if the assumption doesn't hold in just one single case. Magic like this I think has no place in a language that picked safety and correctness as its explicit goals.

We'll have to agree to disagree here. There's no "second guessing" or "magic" here: as programmers we are telling the compiler exactly what to do. It's my opinion that two data fields with the exact same name and type are effectively the same, just in the same way that if I ask someone to pass me an empty 4x4x4ft box with the word "tools" written on the side, it doesn't matter how many identical boxes exist in the world, I am simply asking for one thing that meets that name:Type specification. I do realize that this is a matter of opinion and highly subjective, so we'll just have to leave it at this: either the compiler treats two fields of identical name:Type as the same OR the compiler spits out an error and the user has to manually clean things up. It's merely a matter of taste.

A very concrete example is: I'm currently writing a webservice and I'm using public key cryptography (through the excellent sodiumoxide crate) at the HTTP API boundaries. There are several types that need one or more PublicKeys. If there is only one such key, it's simply named public_key by convention. There is some shared logic in such types, which I currently generate using an extension trait and a macro (and to be honest I'm more than satisfied with the approach). If I were to use an inheritance-like feature for factoring out the common code, however, I would never want the compiler to assume that two public_keys do the same thing — they happen to have the same name and type, but they mean something slightly different in two different containing types. (In fact, I would never want the compiler to do anything with my struct fields at all. They are mostly private, and for a good reason.)

In this example you don't want embedding or inheritance though - you want aggregation, no? I agree that embedding doesn't sound like the answer for this particular scenario, but just because you have a hammer doesn't make every problem a nail.

If you have a structure with one PublicKey you simply call it public_key and use it like that. Makes sense. But if you ask the compiler to embed another structure, or to embed some implementation that uses public_key then you may be asking the compiler to do something that you don't actually want it to do; If you want to have multiple PublicKeys inside a single structure you can't simply reuse a function implementation that was designed for a single PublicKey field, right?

I think embedding feels wrong here because it is wrong here, and a generic or an enum parameter is the right design. You may have already come across a good design.

If we restrict the feature to inheriting methods, then what we get is a subset of the existing trait + default method system.

How can you get a subset of what we have now by adding new functionality? I'm not sure I follow you there. But, more importantly, why be limited to a "this or default" paradigm, when we could allow embedding from any other existing implementation?

If we also allow cherry-picking of individual methods, then we get a subset of the delegation RFC.

You could see it that way. But I see it instead as a much simpler solution to a much simpler problem, with, in my opinion, a bit nicer syntax. Embedding doesn't make the same promises or guarantees as delegation, nor does it have any real effect on the run-time behavior of a Rust program.

In conclusion, while I do think at least some of these problems could somehow be worked around, they are so fundamental that it would never be possible to completely eliminate them without resorting to several special cases and ugly hacks (including non-orthogonal restrictions) in the design of the language

But which fundamental problems are those exactly? The "diamond problem"? It doesn't really exist here. "Could this exist for enums too?" I've shown that it could. "Should identical fields be ignored or result in a compiler error?" That's a matter of opinion but would work either way - certainly not a "fundamental problem".

From my perspective, the most immediately obvious problem is that assumptions are being made here based on an existing understanding of inheritance in other languages. But this isn't "inheritance as it exists in other languages", embedding is simply a compile-time tool for simple and effective code reuse within data structures, implementations, and (maybe) enumerations. As was said at the outset, if a full and traditional implementation of OO-style inheritance is what Rust needs, then this isn't it.

@H2CO3

H2CO3 commented May 12, 2018

Each one of these exists as nothing more than a piece of friendly syntax that discourages manually copy-pasted code.

Alright. So no subtyping at all, but pure syntactic sugar? I think that should be made clearer. References have been made to "inheritance" and Go's "embedding", both of which imply more or less subtyping, respectively.

However, I still have to question the value in introducing special syntax for such a piece of functionality. What problem does this solve that a macro (either declarative or procedural) couldn't? Rust users suggesting new syntactic sugar features often forget that the point of macros is that you don't have to build every new piece of syntactic sugar into the language; instead, you can write your own! Everyone benefits from that: if you want this functionality, you don't have to go through the RFC process and wait until the feature eventually lands in the language, if it ever does. You don't have to make compromises as to how it works in order to cater to the entire community. You can just write your own macro implementing it for exactly the semantics you want. And those who don't need or want it wouldn't need to worry about the increased complexity in the language and the bugs it potentially hides.
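To make the macro route concrete, here is a minimal sketch of a declarative macro that pastes a fixed set of shared fields into any struct declared through it. The macro name `with_animal_fields` and the `Cat` struct are hypothetical; only `Dog`, `name`, and `favorite_food` come from the example at the top of the issue.

```rust
// Hypothetical macro: declares a struct with the caller-supplied fields
// plus a fixed set of shared fields, pasted in at compile time.
macro_rules! with_animal_fields {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug)]
        struct $name {
            $($field: $ty,)*
            // Shared field copied into every struct declared this way.
            favorite_food: &'static str,
        }
    };
}

with_animal_fields! {
    struct Dog {
        name: &'static str,
    }
}

with_animal_fields! {
    struct Cat {
        lives: u8,
    }
}

fn main() {
    let dog = Dog { name: "Rex", favorite_food: "kibble" };
    let cat = Cat { lives: 9, favorite_food: "tuna" };
    println!("{:?}", dog);
    println!("{:?}", cat);
}
```

With this approach a struct that also declares its own `favorite_food` would be rejected by the compiler as a duplicate field rather than silently merged, so the programmer has to resolve the clash explicitly.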

There's no "second guessing" or "magic" here: as programmers we are telling the compiler exactly what to do.

So… if what you are proposing is indeed only compiler-assisted copy-pasting, then I especially don't see why you wouldn't want duplicate fields to be an error. When you copy fields of struct B into struct A, and there are two fields with the same name and type, there are three possible reasons (and corresponding solutions) for it:

  1. If the field indeed has to possess the same semantics in the embedding type as it currently has in the embedded type, then just remove the field from the embedding type, and let the compiler copy it over from the embedded type.
  2. If the fields should have different semantics, then rename the conflicting field in one of the types (probably the embedding one), resolving the name conflict.
  3. If we allow embedding of two structs at the same time, and the field has the same name and type in both, then you can resolve the conflict by the aforementioned cherry-picking.

According to your proposal, the compiler would need to assume scenario 1. This means that genuine oversights resulting in scenario 2 would go undetected. (See, this problem is not even specific to subtyping at all.) Since all of this is happening at compile time, a "duplicate field" error could be trivially resolved by the programmer in no time using static knowledge of his/her own code: either delete a line or edit the name of a field. But the important part is that the programmer gets to make that decision.

But this isn't "inheritance as it exists in other languages"

Again, I do realize that, although the description wasn't very clear about what exactly this doesn't propose.

@ibraheemdev
Member

ibraheemdev commented Mar 17, 2021

Just to clarify, Go embedding is not inheritance, and does not behave like inheritance at all. For example:

type T int

func (T) foo() {}

type U struct {
    T // embedded T
}

var u U

u.foo() behaves exactly like u.T.foo() when there's no foo defined within U - that's composition, not inheritance. It's only syntactic sugar with zero semantic change. When (T).foo executes it doesn't have the slightest idea its receiver is embedded in some "parent" type. Nothing from T gets inherited by U. The syntactic sugar only saves typing the selector(s), nothing else is happening.

If I understand this proposal correctly, Rust could do something similar. No inheritance, just syntactic sugar for composition and method/field delegation.

@ibraheemdev
Member

ibraheemdev commented Mar 17, 2021

The problem with proposals like this is that people don't like them because "it feels like inheritance". Instead, everyone just implements Deref. And yet the docs still argue that Deref "is only for smart pointers", while the standard library ignores it, and the book says otherwise. Perhaps #2393 is a more straightforward solution, although it does not solve the case of "not a smart pointer but still kind of one" for structs like actix_web::Data, where Deref is arguably the best solution. Either way, I believe something needs to be done at the language level to solve this issue.

@burdges

burdges commented Mar 17, 2021

Actually Deref provides a better solution than struct embedding because Deref gives smart-pointer-like control over mutability.

pub struct Builder {
    pub foo: ..
    pub bar: ..
}
impl Builder {
    fn build(self) -> Built {
        Built { builder: self, .. }
    }
}
pub struct Built {
    builder: Builder,
    baz: ..
} 
impl Deref for Built {
    type Target = Builder;
    fn deref(&self) -> &Builder { &self.builder }
}

We never impl DerefMut for Built here so builder becomes immutable, even if Built remains mutable.
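A complete, compilable version of this pattern might look like the sketch below. The field types (`u32`, `String`), the way `baz` is computed, and the `baz()` getter are assumptions filled in for illustration, since the snippet above elides them.

```rust
use std::ops::Deref;

pub struct Builder {
    pub foo: u32,    // assumed type
    pub bar: String, // assumed type
}

impl Builder {
    pub fn build(self) -> Built {
        // Assumed derivation of `baz`, purely for illustration.
        let baz = self.foo * 2;
        Built { builder: self, baz }
    }
}

pub struct Built {
    builder: Builder,
    baz: u32,
}

impl Built {
    pub fn baz(&self) -> u32 {
        self.baz
    }
}

// Read-only access to the builder's fields through auto-deref.
impl Deref for Built {
    type Target = Builder;
    fn deref(&self) -> &Builder {
        &self.builder
    }
}

fn main() {
    let mut built = Builder { foo: 21, bar: String::from("hi") }.build();

    // `built.foo` and `built.bar` resolve through Deref to the inner Builder.
    println!("{} {} {}", built.foo, built.bar, built.baz());

    // built.foo = 1; // would not compile: Built does not implement DerefMut

    // Built's own field is still mutable (accessible here because `main`
    // is in the same module as `Built`).
    built.baz += 1;
    println!("{}", built.baz());
}
```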

@ibraheemdev
Member

ibraheemdev commented Mar 17, 2021

@burdges Deref is very limited. You can only deref to one type for a given struct. That is very different than what composition by embedding allows:

pub struct Dog {
   ..Legs,
   ..Mouth
}

let dog = Dog::new();
dog.walk(); // dog.legs.walk()
dog.bark(); // dog.mouth.bark()

The point of embedding is not to implement smart-pointers. That is literally what Deref is for. The point of embedding is to aid in composition. Per-field mutability is not possible in Rust today, and should not be possible with embedding.

@H2CO3

H2CO3 commented Mar 17, 2021

At that point, however, it would be trivial to add inherent accessor methods to the embedded types, like dog.as_legs().walk() and dog.as_mouth().bark(). Given how infrequently this pattern is needed in idiomatic Rust, that should not cause complexity to skyrocket.
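A minimal sketch of that accessor approach, assuming `Legs` and `Mouth` are plain structs with the `walk` and `bark` methods from the example above (their bodies and return values are made up here):

```rust
struct Legs;
struct Mouth;

impl Legs {
    fn walk(&self) -> &'static str {
        "walking"
    }
}

impl Mouth {
    fn bark(&self) -> &'static str {
        "woof"
    }
}

struct Dog {
    legs: Legs,
    mouth: Mouth,
}

impl Dog {
    fn new() -> Self {
        Dog { legs: Legs, mouth: Mouth }
    }

    // Inherent accessors: the fields stay private, access stays explicit.
    fn as_legs(&self) -> &Legs {
        &self.legs
    }

    fn as_mouth(&self) -> &Mouth {
        &self.mouth
    }
}

fn main() {
    let dog = Dog::new();
    println!("{}", dog.as_legs().walk());
    println!("{}", dog.as_mouth().bark());
}
```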

@ibraheemdev
Member

@H2CO3 I'm not saying that pattern is common, it is more common to have a single embedded field. I was simply showing how composition by embedding is very different than composition by Deref. The latter, in my opinion, is very wrong.

@burdges

burdges commented Mar 17, 2021

I think "embedding" only confuses the issue, because whether you write dog.legs.right or dog.right does not matter, and the second usually makes a mess with say dog.ear.right, etc.

Access and polymorphism matter though, meaning whether Dog provides mutable access, immutable access, or only filtered access via its own methods to Legs and Mouth. We accept filtered access only goes through traits, meaning ..Legs necessarily means pub ..Legs, so mutability becomes the entire question here.

It's clear Deref provides a sensible solution for the common builder-buildee-like case of a single canonical referent with immutable access. I think Borrow handles immutable access with multiple referent types quite well, ala impl Borrow<Legs> for Dog and impl Borrow<Mouth> for Dog. AsRef works too when not worried about detached usage.
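As a sketch of the multiple-referent case, here are hand-written `AsRef` impls for both parts (the `Borrow` version has the same shape). The `count` and `teeth` fields and the two generic helper functions are hypothetical, added only so there is something to observe.

```rust
struct Legs {
    count: u8,
}

struct Mouth {
    teeth: u8,
}

struct Dog {
    legs: Legs,
    mouth: Mouth,
}

// One impl per referent type; unlike Deref, several can coexist.
impl AsRef<Legs> for Dog {
    fn as_ref(&self) -> &Legs {
        &self.legs
    }
}

impl AsRef<Mouth> for Dog {
    fn as_ref(&self) -> &Mouth {
        &self.mouth
    }
}

// Generic code can ask for "anything that has legs" without naming Dog.
fn leg_count<T: AsRef<Legs>>(animal: &T) -> u8 {
    animal.as_ref().count
}

fn tooth_count<T: AsRef<Mouth>>(animal: &T) -> u8 {
    animal.as_ref().teeth
}

fn main() {
    let dog = Dog {
        legs: Legs { count: 4 },
        mouth: Mouth { teeth: 42 },
    };
    println!("{} legs, {} teeth", leg_count(&dog), tooth_count(&dog));
}
```

Access is immutable only, since neither `AsMut` nor `BorrowMut` is implemented.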

I'd expect derive_more could be expanded for say #[derive(Borrow<Legs>,Borrow<Mouth>)] since derive_more already handles AsRef, which addresses the boilerplate problem.

@ibraheemdev
Member

With Deref, you cannot delegate trait implementations to fields. You would still have to copy paste self.0.foo for every method. Struct embedding paired with the newtype pattern to get around coherence limitations is very convenient.
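To illustrate the point, here is a small sketch with a hypothetical `Greet` trait, an `Inner` type that implements it, and a `Wrapper` newtype. `Deref` makes method calls on the wrapper work, but the wrapper only satisfies a `Greet` bound once every method is forwarded by hand.

```rust
use std::ops::Deref;

// Hypothetical trait and types, for illustration only.
trait Greet {
    fn name(&self) -> String;
    fn greet(&self) -> String;
}

struct Inner;

impl Greet for Inner {
    fn name(&self) -> String {
        String::from("inner")
    }

    fn greet(&self) -> String {
        format!("hello from {}", self.name())
    }
}

struct Wrapper(Inner);

impl Deref for Wrapper {
    type Target = Inner;
    fn deref(&self) -> &Inner {
        &self.0
    }
}

// Deref alone does not make `Wrapper: Greet`. Without this impl,
// `announce(&Wrapper(Inner))` below would fail the trait bound.
impl Greet for Wrapper {
    fn name(&self) -> String {
        self.0.name()
    }

    fn greet(&self) -> String {
        self.0.greet()
    }
}

fn announce<G: Greet>(g: &G) -> String {
    g.greet()
}

fn main() {
    let wrapper = Wrapper(Inner);
    println!("{}", announce(&wrapper));
}
```

The forwarding impl grows by one line per trait method, which is the boilerplate that delegation or embedding would remove.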

@burdges

burdges commented Mar 18, 2021

We've a separate delegation discussion that largely avoids discussing embedding. If anything, embedding increases the strangeness cost of delegation, and thus makes ever approving a delegation RFC less likely.

AsRef might be preferable over Borrow in this case (see JelteF/derive_more#145), although Borrow works if your type does not support Eq, Ord, or Hash.
