Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework enums in rust. #6098

Merged
merged 20 commits into from
Oct 19, 2020
Merged

Rework enums in rust. #6098

merged 20 commits into from
Oct 19, 2020

Conversation

CasperN
Copy link
Collaborator

@CasperN CasperN commented Sep 2, 2020

They're now a unit struct, rather than an enum. This is a backwards incompatible change but the previous version had UB and was also backwards incompatible so while I was at it; I brought some constants and functions into the type's own namespace, which is also backwards incompatible.

ENUM_MIN_FOO => Foo::ENUM_MIN
ENUM_MAX_FOO => Foo::ENUM_MAX
enum_name_foo => Foo::variant_name

I deleted the jump-table-to-name-str thing since Rust seems to compile match into jump tables in the common case of sequential enum values (as far as I can tell).

Casper Neo added 2 commits September 2, 2020 16:36
They're now a unit struct, rather than an enum. This is a
backwards incompatible change but the previous version had UB
and was also backwards incompatible so...
@CasperN
Copy link
Collaborator Author

CasperN commented Sep 2, 2020

@jean-airoldie @vglavnyy @rw @aardappel

This is the implementation of #5581 (comment) PTAL

@aardappel
Copy link
Collaborator

I have no idea what effect this has on usage in Rust, but them being explicitly sized seems nice.

@CasperN
Copy link
Collaborator Author

CasperN commented Sep 3, 2020

Oh, one other backwards-incompatible thing I forgot to add is proper support for bitflags
https://docs.rs/bitflags/1.2.1/bitflags/

@krojew
Copy link
Contributor

krojew commented Sep 16, 2020

What's the motivation for this change?

@CasperN
Copy link
Collaborator Author

CasperN commented Sep 16, 2020

What's the motivation for this change?

Most of it is discussed in #5581 and #5467, but TLDR: having an enum who's discriminant isn't in the declaration is UB.

@krojew
Copy link
Contributor

krojew commented Sep 16, 2020

Ok, but the idiomatic thing is to use a Result. By adding a magic "unknown value" enum variant, you not only introduce a custom Result implementation (which this is, since everyone would need to account for that), but also the awful C-style magic constants approach. I would vote for placing invalid values as an error variant of a Result.

@CasperN
Copy link
Collaborator Author

CasperN commented Sep 16, 2020

I'm not adding a magic "unknown value" variant. I'm using a C like representation.

struct MyEnum(u16);
impl MyEnum {
  // Variants are expressed as associated constants.
  pub const Variant1: MyEnum = MyEnum(1);
  pub const Variant2: MyEnum = MyEnum(2);
}

The result based representation goes against Flatbuffer's zero-parsing / be-very-fast philosophy because Result<u16, u16>, takes up more than 16 bits (to fit the discriminant). The struct MyEnum(u16) requires no parsing while the result must first read the data to decide if its Ok or Err. The C-like way is a little faster and less memory hungry. Also, an aligned array of MyEnum can be accessed as a slice while the Result version cannot because there has to be an interpretation step.

Another minor win, this way, is that we'd write

match my_enum {
  MyEnum::Variant1 => ...,
  MyEnum::Variant2 => ...,
  MyEnum(x) => ...,
}

which I think looks better than

match my_enum {
  Ok(MyEnum::Variant1) => ...,
  Ok(MyEnum::Variant2) => ...,
  Err(x) => ...,
}

@krojew
Copy link
Contributor

krojew commented Sep 17, 2020

While I agree using a Result will copy some data, I nevertheless think in 2020 we can live with the burden of copying 16bits, even on embedded devices, if we get proper idiomatic API in return. Not to mention your example does the exact same thing by putting values in your custom struct, just like you would do when putting them in a Result. This is a typical case of a fallible conversion which in Rust is handled by TrFrom/TryInto traits, and the resulting type is a Result. Your example further reinforces my point:

Another minor win, this way, is that we'd write

match my_enum {
  MyEnum::Variant1 => ...,
  MyEnum::Variant2 => ...,
  MyEnum(x) => ...,
}

which I think looks better than

match my_enum {
  Ok(MyEnum::Variant1) => ...,
  Ok(MyEnum::Variant2) => ...,
  Err(x) => ...,
}

We can see that the proposed implementation simply reinvents the Result type. In fact, if this is merged every enum in FB will become a custom, independent version of a Result, with MyEnum(x) acting as a custom Err value. I can guarantee Rust devs seeing this API will not like it, if not outright call it a design error. We loose all the benefits of a Result, from chaining methods, interoperability with other types, to the ? operator.

What should be done to solve the compatibility problem is to use idiomatic fallible conversion mechanism already provided by Rust, rather than reinvent the wheel. Simply implement TryFrom<underlying type> for a given enum and return its result. This will be familiar to Rust programmers, clean and simple to use, with the added bonus of not having to create an addtional struct per enum. Enum getters will simply contain a value.try_into() call.

@aardappel
Copy link
Collaborator

@rw opinion?

@CasperN
Copy link
Collaborator Author

CasperN commented Sep 17, 2020

we can live with the burden of copying 16bits ... your example does the exact same thing by putting values in your custom struct, just like you would do when putting them in a Result... the proposed implementation simply reinvents the Result type.

I think you've misunderstood. In my example MyEnum::Variant1 == MyEnum(1). Note that MyEnum::Variant1 is declared as an associated constant. MyEnum is a unit-struct, not a (rust) enum, and its guaranteed to have the same layout as the thing inside the struct, u16 in this case. I'm not "reinventing the Result type", I'm modelling a C style enum. MyEnum(x) can represent both declared and unknown variants (in fewer bits!)

This is a typical case of a fallible conversion which in Rust is handled by TrFrom/TryInto traits, and the resulting type is a Result.

I think the Result implementation doesn't fare too well when we consider forwards compatibility (adding new variants). Suppose we returned Result<MyEnum, u16>, there are two choices:

  • MyEnum is exhaustive: Then adding a new variant is a code breaking change (because of exhaustive pattern matching).
  • MyEnum is #[non_exhaustive]: This defeats the purpose of Result as the unknown-variant handling logic isn't confined to the Err branch. Even if you got Result::Ok, you still must add a match arm for unknowns. I think this leaky-logic option is just bad.

@krojew
Copy link
Contributor

krojew commented Sep 17, 2020

I don't think you should model Rust api based on C idioms, since Rust has its own idiomatic ways of doing things, which can be argued, are better in this case. The compatibility case you describe isn't a problem in proper Rust for a few reasons.

  • If someone wants to handle all variants, it is a good thing his code will not build when a new one is added and the code is regenerated. This is exactly why enum matches in Rust are exhaustive by default, which saves us from any error resulting from omitting a value.
  • If someone does not want to handle new values and doesn't regenerate the code, he can always treat the resulting error as a non-fatal one. This works the same way as in the proposed api.
  • If someone does not want to handle new values but chooses to regenerate the code anyway, the idiomatic way to do it is by using the _ pattern.

As you can see, Result already handles problematic cases and has the added bonus of already being the standard way of doing those things. In short - let's not reinvent the wheel, especially if the wheel is based on C and looks like a square.

@CasperN
Copy link
Collaborator Author

CasperN commented Sep 17, 2020

In short - let's not reinvent the wheel, especially if the wheel is based on C and looks like a square.

Ok, that's getting a little rude.


Regardless, I think our points have been made.

If we prioritize idiomatic-ness over performance and also think enums should be exhaustive (thus code breaking is a good thing), we can do things the Resulty way.

Care to weigh in? @rw @aardappel

@aardappel
Copy link
Collaborator

I've yet to write a single line of Rust, so I'm going to leave that to @rw :)

@CasperN
Copy link
Collaborator Author

CasperN commented Sep 22, 2020

@jean-airoldie might also have an opinion

Casper Neo added 2 commits October 2, 2020 20:30
Previously, the bitflags attribute was just ignored. This is a breaking change
as the bitflgs API is not like a normal rust enum (duh).
tests/monster_test_generated.rs Outdated Show resolved Hide resolved
tests/monster_test_generated.rs Show resolved Hide resolved
tests/optional_scalars2_generated.rs Show resolved Hide resolved
tests/optional_scalars2_generated.rs Outdated Show resolved Hide resolved
@krojew
Copy link
Contributor

krojew commented Oct 3, 2020

At this point I'm wondering if we should simply wrap accessors in a Result/Option rather than depending on (non-existent yet) validator. @rw @ImmemorConsultrixContrarie opinions?

@ImmemorConsultrixContrarie
Copy link
Contributor

At this point I'm wondering if we should simply wrap accessors in a Result/Option rather than depending on (non-existent yet) validator. @rw @ImmemorConsultrixContrarie opinions?

No, we can't. Getting things from table or struct should be infallible. However, reading tables or structs from bytes could be fallible. But that's not a PR about rewriting Follow.

@krojew
Copy link
Contributor

krojew commented Oct 3, 2020

No, we can't. Getting things from table or struct should be infallible. However, reading tables or structs from bytes could be fallible. But that's not a PR about rewriting Follow.

That's true in a general sense, but in FB, a struct and a vector of bytes is the same thing, unless we add an intermediate parsing step. To have a good Rust API which upholds the contract of being a safe abstraction over an unsafe code, we either need a validator or a Result everywhere we can fail. Right now we have a situation where we panic or cause unsoundness. Of course this PR is not about rewriting the whole Rust generator, but it exhibits a part of the problem.

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 3, 2020

I think, bitflags part should be deferred for later, because right now Scalar makes it unsound.

Without bitflags part this PR would only close soundness holes, not dig a new ones.

And once it is agreed on what variant_name and same functions should return, this would be a cool improvements-only PR.

Edit: @CasperN

@aardappel
Copy link
Collaborator

Ok, I'm not a Rust programmer (yet), but I just had a brainfart for another possible solution:

The problem is that Rust enums's semantics rely on them being fixed to an exact number of elements (which is a great feature for the language itself), but which is fundamentally incompatible with extensible nature of serialization.

Most Rust enums (and other ADTs like in Haskell etc) that must represent non-presence usually bake a None value in as one of the possible values. Can we do something similar here?

For a schema enum that has A, B, C, we could generate a Rust enum that has A, B, C, None. The value of None could be any value that is not equal to A, B, C, since it only needs to be unique for the current version of the Rust code.

Advantages:

  • You can now naturally have a single-level case that deals with all values, rather than the double-level you'd have with an Option/Result of an enum.
  • You could test for a single value without dealing with the None value.
  • Maps serialization semantics precisely onto a Rust enum.

Disadvantages:

  • We now maybe need a case to read a value (map int -> enum) ?
  • We need to ensure that any writing code treats errors on any attempt to write a non-optional enum value as None (since that would turn this value into the default value, which might be e.g. A, which is undesirable).
  • Any other situations in Rust where the actual integer value of None would be exposed, as it may differ when the schema changes.

If I would re-design FlatBuffers from scratch with Rust in mind, I would require at the schema level that every enum has a None much like unions already do. This would allow it to have a fixed value and be the default, taking care of the second and third disadvantage. But it is too late for that, unless we invented a new kind of enum, which I don't think we want to do. It also clashes with the typical C/C++ style enums that like to map schema enums to the exact values of some existing C/C++ enum (so it can be casted), meaning we can't guarantee a fixed value.

@krojew
Copy link
Contributor

krojew commented Oct 7, 2020

The proposed None (not to be confused with Rust built-in None) would have to be mapped to a concrete value in an enum, which removes the ability to read what the underlying value was. It would be equivalent to a flattened Rust Option<Enum>. Having a Result with the unknown value wrapped inside an error gives that possibility back and introduces exactly the same number of additional matches needed, as with proposed None: 1.

@ImmemorConsultrixContrarie
Copy link
Contributor

For a schema enum that has A, B, C, we could generate a Rust enum that has A, B, C, None. The value of None could be any value that is not equal to A, B, C, since it only needs to be unique for the current version of the Rust code.

Not solving a problem, because enum can't be stored as enum, unless you close all the gaps, eventually making enum a 0..=Repr::MAX fully filled thing, that can have any value.

The problem here is that Rust enums can't be stored in structs or slices as enums due to endianness and inexhaustiveness of flatbuffers' enums, because in Rust having enum with invalid value is UB; making enum from integer is not a problem.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 9, 2020

Anyway, let's try to get back on topic. The new discussion is whether we should go with a combined approach, where the native Rust enum is used in the object API. It seems @aardappel and @ImmemorConsultrixContrarie and I agree that the normal enum accessor should be a closer model to the flatbuffers enum.

This might be a good compromise, although we're making the api a bit non-obvious.

@krojew we'll add it to the tutorial if that helps

Is there a way to synthesize the ideas into one approach, perhaps?

@rw how does this sound?

@CasperN CasperN reopened this Oct 9, 2020
@rw
Copy link
Collaborator

rw commented Oct 11, 2020

@CasperN SGTM, thanks!

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 15, 2020

I think this PR is almost ready to go in. Given its a fairly signifiant change, there's remarkably little code changed in tests/rust_usage_test/tests/integration_test.rs, which is awesome. I need to add back some of the enum constants (behind a deprecation warning). There's a minor point about us misusing bitflags, but I think we'll wait for bitflags/bitflags#228 before addressing it.

Can I get another round of comments?

@CasperN CasperN marked this pull request as ready for review October 15, 2020 18:26
@CasperN
Copy link
Collaborator Author

CasperN commented Oct 16, 2020

In another business day, I'll assume tacit approval and will merge.

@rw @ImmemorConsultrixContrarie @krojew

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants