Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: attach Substrait version number to plans #347

Merged
merged 3 commits into from
Oct 5, 2022

Conversation

jvanstraten
Copy link
Contributor

No description provided.

@@ -24,6 +24,10 @@ message PlanRel {
// Describe a set of operations to complete.
// For compactness sake, identifiers are normalized at the plan level.
message Plan {
// Substrait version of the plan. Optional up to 0.16.0, required for later
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't merged this week, I'll probably have to update this to track the version that would introduce this.

Copy link
Member

@mbrobbel mbrobbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should also add Substrait versioning to extensions? I forgot to read the PR title.

@jvanstraten
Copy link
Contributor Author

FWIW there's some discussion about extension (function) versioning here at #274, and there have been several verbal discussions about extension versioning and URI schemes during various sync meetings. I think a good summary of the situation is that no one's really sure how to do it :) This PR is just low-hanging fruit though.

// using a lowercase hex ASCII string 40 characters in length. The version
// number above should be set to the most recent version tag in the history
// of that commit.
string git_hash = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the major/minor/path be left empty in this case (0.0.0)? Or should they be set to the most recent version before the fork? I would think empty but might be good to clarify.

Copy link
Contributor Author

@jvanstraten jvanstraten Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the last sentence of the comment cover it? I guess I could make explicit that the tag I'm referring to must be an "official" substrait-io version tag.

ETA: and I think there must always be a version tag reachable in the history, because there is a tag reachable from the commit that would add this feature in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that is clear. I think I just glossed over it reading, sorry

// using a lowercase hex ASCII string 40 characters in length. The version
// number above should be set to the most recent version tag in the history
// of that commit.
string git_hash = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to introduce what tool name and version produced the plan. In the practical (annoying) world, there may be 3 different versions of a producer that all produce using the exact same Substrait plan version but one of them produces invalid plans that have been stored. Consumers may need to be conditional about what is seen to handle bugs in producers. We've had to do this a bunch of times with Parquet's producer field and I expect the same here (sadly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily disagree, but I fear we might end up with something like Isthmus/5.0 (X11; Linux x86_64) Acero/537.36 (KHTML, like Gecko) Velox/105.0.0.0 Ibis/537.36 if we're not careful, especially if we just add a string field for this. For consumer-specific plans (i.e. those using extensions from some particular engine), shouldn't there then also be an optional intended consumer identifier?

Typing a bit more freely I came up with this...

message Plan {
  // Metadata for the plan. Optional up to 0.17.0, required for later versions.
  Metadata metadata = 6;
  
  // ...
}

message Metadata {
  // Optional name of the plan. No semantical meaning; should only be used for
  // debugging and tracing.
  string name = 1;

  // Substrait version of the plan. Optional up to 0.17.0, required for later
  // versions.
  Version version = 2;

  // Identification of the tool generating the plan. Optional (hand-written
  // plans, for example, don't logically have a producer), but strongly
  // recommended when a plan is
  Tool producer = 3;
  
  // If the plan was generated to be compatible with one or more specific
  // consumers, this/these consumer(s) can be identified here. Optional.
  repeated Tool consumer = 4;

  // A place to store custom metadata.
  google.protobuf.Any custom = 5;
}

message Version {
  // Substrait version number.
  uint32 major = 1;
  uint32 minor = 2;
  uint32 patch = 3;

  // If a particular version of Substrait is used that does not correspond to
  // a version number exactly (for example when using an unofficial fork or
  // using a version that is not yet released or is between versions), set this
  // to the full git hash of the utilized commit of
  // https://github.com/substrait-io/substrait (or fork thereof), represented
  // using a lowercase hex ASCII string 40 characters in length. The version
  // number above should be set to the most recent version tag in the history
  // of that commit.
  string git_hash = 4;
}
  
message Tool {
  // Name of the tool, e.g. Acero, Velox, DuckDB, etc. Required.
  string name = 1;

  // Version of the tool. Put the integer components of the version in
  // version_components, and anything that's needed to identify the version in
  // addition in version_suffix. For a release version of tools using semver,
  // version_components should have 3 entries, and version_suffix should be
  // empty.
  repeated uint32 version_components = 2;
  string version_suffix = 3;
  
  // An optional place to store tool-specific information, such as feature
  // flags.
  google.protobuf.Any custom = 4;
}

... but this is all pretty arbitrary and grows without bound.

IMO there's no need to add something like this already and we should postpone this, especially considering we've already had talks of more precise communication of capabilities and negotiation. An embedded Substrait version number, on the other hand, addresses a much more pressing need: if a proposal like #333 would have been accepted (shouldn't we close that PR, since even Weston changed his mind in the end...?) there would have been no way to tell without context what a project relation means, even without accounting for consumer/producer bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isthmus/5.0 (X11; Linux x86_64) Acero/537.36 (KHTML, like Gecko) Velox/105.0.0.0 Ibis/537.36 if we're not careful, especially if we just add a string field for this

This is exactly what we want. Ideally, we never need to use the field. Practically, we will. We should add it now before we need it. This isn't structured information, it's a log entry to deal with unexpected bugs later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvanstraten
Copy link
Contributor Author

Updated the version number in the comment for Plan.version, and also added a message that can be used to deserialize only the version of a plan even in the presence of breaking changes that would cause protobuf itself to throw.

@jvanstraten jvanstraten merged commit 2d1bb9d into substrait-io:main Oct 5, 2022
@jvanstraten jvanstraten deleted the version-number-in-plan branch October 5, 2022 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants