Replies: 1 comment
Ugh, I really need to figure out how to get these discussions to alert somewhere. Anybody know how to do this?

Hey @esammer, welcome. Sorry that nobody responded sooner. @cpcloud, @rdblue, @westonpace: drawing your attention to this discussion.

I'm going to answer your questions in the reverse of the order you asked them, as I wonder whether one can build on the other. I'll also start by noting that neither of these is something I've personally thought a lot about.

Unions: I think there are four types of patterns to consider:
For reference (not sure if you saw this), a Substrait struct type does not carry field names. As part of implementing a clean, consistent set of semantics, structs are entirely positional. As such, a struct declaration might look like

In implicit referencing, I generally think that the only viable plan is doing something like a variant, which isn't coercible to other types without an explicit cast operation (or a similar SQL JSON-style function that declares an explicit type). In those situations, it also seems like the struct representation is sufficient, potentially along with a specialized function that accepts an arbitrary struct and resolves it based on a particular set of definitions. So, I guess the question is: what specifically feels "wasteful and complex", as you put it?

Recursive types: I also think it may be possible in your particular use case to think of this more as a new extension type you simply name "variant", similar to Snowflake. You could then, through function invocations or similar, ultimately turn this into a known type?

Don't perceive any of this as a hard pushback on union types; I'm more just trying to get to the root of the problems and figure out the right solution.
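To make the "variant plus explicit cast" idea a bit more concrete, here is a rough Python sketch. It is not Substrait API and not how any engine implements this: `Variant`, `variant_get`, and `variant_as_i64` are hypothetical names made up for illustration. The point is only that lookups stay in variant space, and a value becomes a known type solely through an explicit function call.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Variant:
    """Hypothetical opaque extension value; the payload can be anything."""
    payload: Any


def variant_get(v: Variant, key: str) -> Variant:
    """Field lookup that stays in variant space (missing keys yield a null variant)."""
    if not isinstance(v.payload, dict):
        raise TypeError("variant_get expects a map-like variant")
    return Variant(v.payload.get(key))


def variant_as_i64(v: Variant) -> int:
    """Explicit cast to an integer; there is no implicit coercion."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(v.payload, bool) or not isinstance(v.payload, int):
        raise TypeError(f"cannot cast {type(v.payload).__name__} to i64")
    return v.payload


record = Variant({"id": 7, "name": "sensor-a"})
record_id = variant_as_i64(variant_get(record, "id"))  # explicit, typed from here on
```

Casting the `name` field with `variant_as_i64` raises a `TypeError` instead of silently coercing, which is the behavior described above for a type that "isn't coercible without an explicit cast".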
Hey all. We're evaluating Substrait for plan representation at Decodable within a stream processing engine. We frequently deal with the initial structuring of data from complex sources. This means we have to model complicated cases within the type system. A few questions.
Recursive types
How do folks intend to represent (infinitely) recursive type definitions in the type system? We frequently deal with progressive specification of a type in a pipeline, where a record begins with a fully generic type (e.g. Value, which is a union of all scalar types as well as list, map<string, Value>) and becomes incrementally more well-defined as it passes through the stages of a pipeline of operators.
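As a rough illustration of what I mean (plain Python, nothing Substrait-specific), the generic type and one "refinement" step could be sketched like this. The `Value` alias and the `describe` helper are hypothetical names for this example only, and the type labels are just strings borrowed from Substrait's simple type names.

```python
from typing import Dict, List, Union

# Hypothetical fully generic type: scalars plus containers that refer back to Value.
Scalar = Union[None, bool, int, float, str]
Value = Union[Scalar, List["Value"], Dict[str, "Value"]]


def describe(value) -> str:
    """Return the narrowest structural description we can infer for a generic Value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "i64"
    if isinstance(value, float):
        return "fp64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        kinds = sorted({describe(v) for v in value})
        # Homogeneous elements narrow the type; anything else stays generic.
        inner = kinds[0] if len(kinds) == 1 else "Value"
        return f"list<{inner}>"
    if isinstance(value, dict):
        kinds = sorted({describe(v) for v in value.values()})
        inner = kinds[0] if len(kinds) == 1 else "Value"
        return f"map<string, {inner}>"
    raise TypeError(f"not a Value: {type(value).__name__}")
```

A record that enters the pipeline as a bare `Value` can then be narrowed stage by stage: `describe({"a": [1, 2]})` gives `map<string, list<i64>>`, while a heterogeneous list such as `[1, "x"]` stays `list<Value>`.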
Union types
I read the rationale for ignoring unions, but I think they're deeply important for complex cases as mentioned above. We could (and do) use column-like structures like
struct<[][]a, [][]b, [][]c, ...>
that are indexed by field position, but asking users to think about queries in this way is really difficult. We could represent things this way within the engine and just provide syntactic sugar in the language, but it's really wasteful and complex. What are folks' thoughts on this?

A real-world example of these two issues is a source that starts with JSON records with heterogeneous structures and uses a series of filters and projections to structure that data based on its contents.
e.g.