-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: various fixes for #252 (write and DDL relations) and #284 (relation references) #288
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -323,7 +323,7 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B | |
|
||
| Signature | Value | | ||
| -------------------- |---------------------------------------| | ||
| Inputs | 1 | | ||
| Inputs | 0 | | ||
| Outputs | 1 | | ||
| Property Maintenance | Maintains all properties of the input | | ||
| Direct Output Order | Maintains order | | ||
|
@@ -333,7 +333,7 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B | |
|
||
| Property | Description | Required | | ||
|-----------------------------|--------------------------------------------------------------------------------| --------------------------- | | ||
| Referred Rel | A zero-indexed positional reference to a `Rel` defined within the same `Plan`. | Required | | ||
| Referred Rel | A zero-indexed positional reference to a `Rel` defined within the same `Plan`. The index must be less than the index of the relation tree that the reference appears in; put differently, you can only refer to trees that have already been declared. This avoids cyclic dependencies and forward references. | Required | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not convinced we need the constraint of no forward references. I'm fine with it, just not convinced it is necessary. If you feel strongly, sounds good. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like a fine sufficient condition, not necessary, but loop-checks are maybe trickier? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's indeed not a necessary constraint to ensure there are no loops. But the point of Substrait is to be machine-readable and machine-writable, right? So here's an algorithmic argument. On the producer side, a valid algorithm to convert any DAG to a list of relations that satisfies this constraint is pretty trivial; emit all nodes without in-edges, follow their out-edges to the inputs of other nodes to replace those in-edges with relation references using the indices that were just emitted (removing the edges), repeat until all nodes have been emitted. With the right data structure for the DAG this is O(n). In fact, the majority of the algorithms I've managed to come up with for this give you a correct ordering (or its reverse) naturally. The only one I've come up with that doesn't is to first generate relation indices by a random walk over the node list, which, in fairness, might be a natural way to do it if you're storing references to all nodes in a list with no particular order vs. relying on edge references (that would be a directed graph data structure though, not a guaranteed-acyclic graph data structure). On the consumer side, the algorithm with this constraint is equally simple: starting at the front of the list, emit nodes and keep an index-to-node-reference map as you do, replacing relation references with edges by resolving the indices using the map. You're now guaranteed to have a DAG. Error-checking happens naturally because the index-to-node resolution will fail if the constraint is not satisfied. O(n) complexity. If the consumer does cannot rely on this constraint, the algorithm I would probably use is to loop over the list to ensure each node is created, creating dependent nodes earlier as needed using recursion. Problem is, this gives you an arbitrary directed graph, which means you have to go out of your way to write validation to detect cycles. And the easiest way to do that is... to see if it's possible to create a node ordering that satisfies exactly the constraint I'm proposing (or its inverse), by actually running a constructive algorithm for such an ordering and seeing if it will fail. It'd be pretty tempting for a consumer to just not check, though... and now you have guaranteed memory leaks if you were using reference counting for the node edges and probably a deadlock, stack overflow, or infinite runtime when actually executing the plan. This might never even make it to a test case, in which case it might surface as a DoS exploit ten years later instead. So, to be honest, the only arguments I see for not having the constraint is to let humans write nodes in any order they want or make it easier for people to write subtly broken consumers. |
||
|
||
=== "ReferenceRel Message" | ||
|
||
|
@@ -343,14 +343,14 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B | |
|
||
## Write Operator | ||
|
||
The write operator is an operator that consumes one output and writes it to storage. This can range from writing to a Parquet file, to INSERT/DELETE/UPDATE in a database. | ||
The write operator is an operator that consumes one output and writes it to storage. Currently, only named tables and extensions are supported, but the intention is to also support writing files in the future. | ||
|
||
| Signature | Value | | ||
| -------------------- |---------------------------------------------------------| | ||
| Inputs | 1 | | ||
| Outputs | 1 | | ||
| Property Maintenance | Output depends on OutputMode (none, or modified tuples) | | ||
| Direct Output Order | Unchanged from input | | ||
| Signature | Value | | ||
| -------------------- |----------------------------------------------------------| | ||
| Inputs | 1 | | ||
| Outputs | 1 | | ||
| Property Maintenance | Output depends on OutputMode (none, or modified records) | | ||
| Direct Output Order | Unchanged from input | | ||
|
||
### Write Properties | ||
|
||
|
@@ -360,8 +360,8 @@ The write operator is an operator that consumes one output and writes it to stor | |
| Write Type | Definition of which object we are operating on (e.g., a fully-qualified table name). | Required | | ||
| CTAS Schema | The names of all the columns and their type for a CREATE TABLE AS. | Required only for CTAS | | ||
| Write Operator | Which type of operation we are performing (INSERT/DELETE/UPDATE/CTAS). | Required | | ||
| Rel Input | The Rel representing which tuples we will be operating on (e.g., VALUES for an INSERT, or which tuples to DELETE, or tuples and after-image of their values for UPDATE). | Required | | ||
| Output Mode | For views that modify a DB it is important to control, which tuples to "return". Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_TUPLES, that can be further manipulated by layering more rels ontop of this WriteRel (e.g., to "count how many tuples were updated"). This also allows to return the after-image of the change. To return before-image (or both) one can use the reference mechanisms and have multiple return values. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER | | ||
| Rel Input | The Rel representing which records we will be operating on (e.g., VALUES for an INSERT, or which records to DELETE, or records and after-image of their values for UPDATE). | Required | | ||
| Output Mode | For views that modify a DB it is important to control, which records to "return". Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_RECORDS, that can be further manipulated by layering more rels ontop of this WriteRel (e.g., to "count how many records were updated"). This also allows to return the after-image of the change. To return before-image (or both) one can use the relation reference mechanisms to use the relation subtree passed to the write input more than once. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER | | ||
|
||
|
||
### Write Definition Types | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@curino and I had a long conversation about this offline. We should have done a better job of capturing that discussion here. Yeah, I agree this is a problem. In some circumstances you can imagine from input that there is a implicit rowId column that comes from the input to the delete. However, there are also scenarios where you might be using a different pattern and those should be illegal. @curino wasn't as sold on the problem so we punted on it until someone was actually trying to implement delete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL delete is based on a condition, which makes sense to me, so given this structure, I would expect a delete operator in Substrait to take a schema with a single boolean as its relation input... though on second thought, that doesn't really make sense either, because that boolean could be derived from a different table entirely. It's also not obvious at all how you'd make use of indices in that case.
Okay, I guess I see the problem. I imagine @curino had a structure like
read(table X) -> filter(condition) -> write(delete from table X)
in mind for deletions? The problem with that is if I would writeread(table Y) -> filter(condition) -> write(delete from table X)
it would be legal by these rules if table X and Y happen to have the same schema, but equal schema doesn't mean equal table. Likewise I could stick a projection in there to make the schema match and it'd still be nonsense.I feel like the problem is that we're trying to push a square peg in a round hole here. Why are we trying to make these fundamentally different operators satisfy normal relation semantics, and the exact same semantics across a number of them to boot? I didn't want to butt into this discussion at the time because I didn't feel qualified, but maybe I should have...
IMO, update and delete should be a different operator entirely, specifically one without an input. But if we would want to stick to a single relation type for modifications, I would define it like this, based on my limited knowledge of SQL:
But I understand if it's a bit late for that.
ETA: DELETE TABLE is missing from this, but I don't see that in the current definition either to be fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jvanstraten thanks for looking into this. I think the challenge @jacques-n and I were faced with is that SQL supports more complex statements where you can write a nearly arbitrary query that affects a
DELETE
orUPDATE
statement. A couple of examples:This made us prefer reusing the
Rel
structure, vs building out a separate dedicated scheme. You are right that you could construct a non-sensical plan, but this is true in general (hence your awesome validator work). The intuition is what you describe, where theinput
selects the tuples we want to affect and either describe its after-image for updates, or the tuples to be deleted. SQL being declarative there is no way to say "delete the first occurrence of this identical tuples", as a consequence folks are used to workarounds such as primary keys or rowIds (per @jacques-n's comment) in the less common scenario where separate handling of cloned tuples are required. Also I believe theExpression
approach above might suffer of the same limitation right?I am not attached to this implementation, and happy to brainstorm/support variations.
I think we should also soon build-out a "golden dataset" of queries we aim to support so that we can iterate variations and compare them against that (or at least convince ourselves we can cover all cases that matter). Folks at MS are starting to using this in practical settings and we will stumble on any detail issue and come back and address them, but if you have ideas/time to work on this, super happy to collaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for the pre/post image I liked the structure we came up with with @jacques-n as it was rather minimal and allowed us to stitch together plans that do any sort of pre/after image with very few additions/fields.
I think beside better comments we should soon get to have a bunch of example SQL->Plan listed in the docs to explain what we mean (I have been hesitating not to overly SQL-ify our website).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, after a few hours of typing (most of which I've now edited out), I think I've narrowed down where my misunderstanding comes from. I now agree that, with some further specification, WriteRel could work, but with the following limitations:
These feel like pretty hefty limitations, considering that the rest of Substrait works just fine for non-key-value-based systems, and considering that Substrait currently doesn't even acknowledge the concept of key columns in the first place. However, I'm not the one who has experience implementing query engines, so I'll yield if you guys think these limitations are perfectly acceptable.
That being said, my proposal does not suffer from these limitations. The key difference is that an expression is defined to be evaluated for each record individually, so any implicit information identifying the record can stay implicit in the execution context of that expression. A consumer could even define a function extension that allows a user to query this implicit information for use in the expression; for example, you could define a function that returns the index of the record in the current iteration, if that is meaningful information for that particular system. It would be, for instance, for a read from a columnar format, which has implicit ordering (even if Substrait does not in general require a meaningful ordering from a read).
Also, I'm confident that I can convert everything that the current WriteRel supports to my proposal. You mention deletions and updates as pain points, so here goes. Delete:
That's pretty much just making explicit what a key-value-based delete operator actually does. I imagine a key-value-based system would special-case that syntax.
Update is uglier:
but only because I'm lacking a proper struct-building expression (and this is not the first time I'm running into this by far; other things it would make redundant are the repeated expressions in aggregate grouping sets, the 2D version of or_list, and struct literals; you could do the same for list and map too and avoid lots of literal shenanigans in fact). It would be much more natural to have only one update expression and require it to yield a struct that matches the schema, rather than requiring individual expressions. That would also solve the problem of differentiating between "don't update the record" (would be a null struct) and "replace a field in the record with null" (would be a struct containing null). The whole thing in Substrait where the schema of a row is defined using the same abstraction as a struct is awesome; I don't see why we're not using it more. But I digress. Assuming that expression would exist, it would become:
Even better would be if the field reference would be generalized to be able to return the entire record as a struct, rather than requiring at least one index operation. Then the
struct(field(0) ... field(n-1))
subexpressions would just become that generalized reference type. I suppose technically you could already do it with a mask expression, not sure if it would be considered legal to apply that to RootReference and OuterReference though.The particular update query you listed can be represented much more naturally, though,
but if SQL supports IMO arcane syntax like the delete statement for updates as well, I imagine it would be harder to generate them like this from SQL in general. That being said, I don't think we should be considering a SQL-to-Substrait producer as the golden standard for how things should be represented in Substrait anyway. There are more producers being worked on than just Isthmus; if Substrait intends to be a generalized interchange format for queries, I don't see why it should receive special treatment. Substrait is not supposed to just be a SQL syntax tree, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jvanstraten. I see your points. Some could be dealt with by constraining which
Rel
can be in the body of aninput
but covering all corner cases might be tricky (and leave them unspecified scatchy ;-)).Let's maybe chat about it in the next community meeting (briefly) and schedule some time for me/you/ @jacques-n and anyone else interested to sit down together and close one this, the async long messages mode is not very efficient to drive convergence.
@jcamachor offline let's consider the UPDATE = INSERT + DELETE and see how hard would it be for a consumer to reconstruct the UPDATE semantics. Having that clear in our heads before meeting with @jvanstraten could help the conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sorting this out in a call (at least to converge on the basics) works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hijacking this for another thing I only just put one and one together for while writing this: I don't see how WriteRel can ever consistently work for write-only files, because it relies heavily on associative table and read-modify-write semantics. That means it's not the complementary operation of ReadRel. That's, honestly, terrible and confusing. Can we just rename it to UpdateRel or something and reserve WriteRel for a pure sink operation? AFAICT a pure WriteRel would replace CTAS, though...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jvanstraten and @jacques-n we should find time to talk!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@curino It took me this long just to figure out how to put this, but unfortunately some things unrelated to this have happened recently that have made me very wary of continuing to contribute to Substrait. I'm finding it exceedingly and increasingly difficult to formulate my arguments and opinions in ways that are not considered to be hostile or disrespectful, in the end (to no avail) spending more than a day on a single comment to not be dishonest about how frustrated I am while also not stepping on anyone's toes. Additionally, my way of working doesn't seem to mesh well with open-source development. As such I don't think it would help anyone for me to continue contributing, and I have very little desire left to do so myself. That being said, I did enjoy working with you, so I'm sorry things turned out this way :(
I suppose I'll leave this PR open because of all the relevant discussion in here and because one of the things offense was taken to was me closing my own PRs before being completed, but I do not intend to continue working on this. It should probably be converted to issues insofar that is deemed necessary and then closed when deemed appropriate. Either way I'll leave the branches up on my fork for future reference.