-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: extend exported proto data structures for simple extension information #50
feat: extend exported proto data structures for simple extension information #50
Conversation
…rmation BREAKING CHANGE: validator.proto has been split up into three files. Some existing fields relating to extensions have now also been deprecated.
7f5ce70
to
595bd39
Compare
|
||
// All extensions defined in the type class namespace of this extension. All | ||
// names are case-insensitively unique. | ||
repeated ExtensionDefinition type_classes = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making this (and the same for the other fields) repeated TypeClass
where message TypeClass { ExtensionMetaData metadata, ... }
, and message ExtensionMetaData { string uri = 1; repeated string names = 2; ... }
? This would make consuming these messages more type safe: no need to match on the kind (it should match but there is no guarantee by the given structure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored, but it kinda rippled all over the place thanks to there being different function extension types.
// The set of names that may be used to case-insensitively refer to this | ||
// extension within the scope of the above URI. For type classes and type | ||
// variations there will only ever be one of these. For functions, the first | ||
// name is always the compound name. The simple name will only be added when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the names are different for the different variants, maybe we should just copy-paste the relevant fields instead of introducing a metadata message type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, those three files together very clearly represent identifying information for extensions. It's annoying that Substrait makes that exception for function naming that a simple name can sometimes also be used, but I'd rather generalize that special case to all the other extension types as well in case more weirdness ends up being added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer using the proto definition to guide based on type instead of having to deal with these semantics when consumed, but I understand how this would blow up if applied everywhere.
|
||
// Any resolved extensions defined by dependencies of this module. Note that | ||
// these are not publicly exposed by this extension. | ||
repeated ExtensionDefinition dependencies = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one confuses me. Can you elaborate a bit on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensions should (eventually) have support for dependencies, primarily to allow one YAML with function definitions to use types defined in another YAML. Those type definitions wouldn't be visible directly, since we use URIs to refer to things, so they don't get added to the normal list of type definitions, but they do need to go somewhere such that the data types and patterns eventually used in that function definition can refer to them via the extension_id
thing. Everything like that, i.e. not publicly exposed by the extension YAML being represented here, insofar as the validator has resolved them, would end up in that list.
ETA: example of a thing that might refer to something in that list:
substrait-validator/proto/substrait/validator/type_system.proto
Lines 69 to 82 in ae0b278
// Information about a user-defined type. | |
message UserDefinedType { | |
// URI of the YAML file that the type is (supposed to be) defined in, if | |
// known. | |
string uri = 1; | |
// Name of the type within the scope of that YAML file. | |
string name = 2; | |
// If nonzero, points to a variation extension definition elsewhere in the | |
// tree. All extension definitions can be gathered by traversing the tree | |
// and looking for ExtensionDefinition messages in the data associated with | |
// each node. Note that extension IDs are only unique within a single tree. | |
uint64 extension_id = 4; |
(somewhere nested deep inside a data type pattern)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I now remember about the max_uri_resolution_depth
field in Config
.
// Identifier for the extension. | ||
Identifier identifier = 1; | ||
|
||
// Common metadata for the extension. | ||
Metadata metadata = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I think this is an improvement.
The idea here is mostly to define a nice output data structure as an interface to work toward, rather than leaving this until the end and then requiring lots of refactoring for the data structure. There are some new concepts in here as well that don't exist in Substrait yet (such as already supporting substrait-io/substrait#265, substrait-io/substrait#320, various new description fields that were randomly missing from the YAML schema, etc) to future-proof somewhat.
Some semantic changes to patterns snuck in here as well. Specifically, bindings now always bind the non-nullable version of typename metavalues, and only match non-nullable by default. This makes their behavior more consistent with typenames like
i32
(which means non-nullable, not either nullability).BREAKING CHANGE: validator.proto has been split up into three files. Some existing fields relating to extensions have now also been deprecated.