Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for DDL and INSERT/DELETE/UPDATE operations #252

Merged
merged 4 commits into from
Aug 10, 2022

Conversation

curino
Copy link
Contributor

@curino curino commented Jul 20, 2022

In collaboration with Jesus Camacho Rodriguez

@curino
Copy link
Contributor Author

curino commented Jul 20, 2022

Cleaned up git commit log (I spend most of my time fighting this linter!).

@curino
Copy link
Contributor Author

curino commented Jul 26, 2022

@jacques-n and others, how does this version look?

@jacques-n
Copy link
Contributor

Hey @curino , this is on my list to review. Will try to get to it soon.

One thing I still don't understand is the output item in writerel. Can you give a concrete example in pseudo code/description?

@curino
Copy link
Contributor Author

curino commented Jul 28, 2022

Thanks @jacques-n!

Most DBMS return information about how the insert/delete/update operation went. These takes few forms:

  1. No output (default)
  2. A simple count of how many tuples were affected by the operation
  3. Returning the set of tuples deleted/inserted
  4. Returning the before (or after, or both) image of an update
  5. An arbitrary computation of the affected tuples

What I learned reading few doc pages from various systems is that this is to simplify the common case where apps want to be able to double check how an operation went (e.g., validate that roughly the right number of tuples were affected) and potentially abort/compensate if things were off. This would otherwise required temp tables and complex multi-table transactions, so it is a rather appreciated/used feature.

Including an output term allows us to capture any of the above scenarios in a compact way. Basically we are telling the systems to use input query to determin which tuples to change and output to report on what was done per user required semantics. In some systems there are special keywords the output query can refer to such as BEFORE and AFTER (think of those as tables capturing the before and after image of the affected tuples). I think it is ok to expect that the system generating the substrait plan produces valid pairs of input and output Rels (some validation is possible, but not very deep).

I considered to simplify this and hardcode the "# of affected tuples" semantics as a return value, but this does seem to miss what I read was a rather common use case. I hope this helps clarify the rationale (happy to chat in the community call if it is easier).

Thanks,
Carlo

@curino
Copy link
Contributor Author

curino commented Aug 3, 2022

Per conversation in latest community meeting, I modified the output mechanisms to leverage a spool (via references) for the BEFORE image, and a simple enum NO_OUTPUT or MODIFIED_TUPLES to control the operator output, that we can then process (e.g., with a COUNT) to shape the output of AFTER image changes.

I also realized that in the original docs there were references to a bunch of other things (rotations, etc.) that are not currently represented (or at least I am not clear on what they mean). I would propose we converge on this patch, and then do further iterations to reflects further/fancier things (e.g., we left constraints and indexes out of table definitions as well).

@curino
Copy link
Contributor Author

curino commented Aug 8, 2022

@jacques-n ping...

DDL_OP_ALTER_TABLE = 3;

DDL_OP_CREATE_VIEW = 4;
DDL_OP_CREATE_OR_REPLACE = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel like an operation. It fields like a subcategory of the other operation types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the DDL_OP being TABLE or VIEW and then have another field capturing CREATE, ALTER, DROP (where VIEW + ALTER is equivalent to the CREATE_OR_REPLACE we have now)

// Definition of which type of write operation is to be performed
oneof write_type {
NamedTableWrite named_table = 1;
ReadRel.LocalFiles local_files = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to just include named table and extension table right now. (With the latter being completely independent of the read type.)

Amongst other things, I'm not convinced that a read spec for files is the same as a write spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could include tables only for now, but eventually we want WriteRel to write to file as well I think.

// The columns that will be modified (representing after-image of a schema change)
NamedStruct base_schema = 2;
// The default values for the columns (representing after-image of a schema change)
repeated Expression.Literal defaults = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it doesn't belong here. It isn't really specific to a named table, right? Maybe it should be at the write rel level?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, if we move ctas, this can also exist inside of writerel. Note that we should specifically declare patterns around the requirement of including this (or not) and the behavior of it. I'd also note that after-image isn't clear here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we move CTAS we need schema (names+types) and defaults for CREATE_TABLE right?

DDL_OP_UNSPECIFIED = 0;
DDL_OP_CREATE_TABLE = 1;
DDL_OP_DROP_TABLE = 2;
DDL_OP_ALTER_TABLE = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we move CTAS to WriteRel. This removes the write_type stuff entirely from this and keeps it only in WriteRel. (This may mean we need to come up with a different name than DDL since that generally includes CTAS.)

Separately, I think we should probably remove CREATE TABLE and ALTER TABLE since there isn't sufficient information here to actually do them. (We need schema, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the write_type it contains a NamedTableWrite, which includes all we need for CREATE or ALTER TABLE I think (table name, column names, column types, default values).

// A base table for writing. The list of string is used to represent namespacing (e.g., mydb.mytable).
// This assumes shared catalog between systems exchanging a message.
// it also includes a base schema, and default values
message NamedTableWrite {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, let's move this into WriteRel and keep it constrained to that location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this prevents us to do all `CREATE' and 'ALTER' operators.

message NamedTableWrite {
repeated string names = 1;
// The columns that will be modified (representing after-image of a schema change)
NamedStruct base_schema = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't figure out how to use names + base schema to map my partial input onto a broader schema. I also don't think this belongs in Named Table, feels like a generic/cross table concept. After moving properties to writerel, I think we need a better clarity around what this write is and how it maps to an underlying schema. One option:

final schema and a indexbased mapping mapping the input schema to the output. For example:

input: (a int, c int)
output: (a int, b int, c int, d int)
defaults: (1, 4, 7, 6) -- defined based on the output schema.
map: (0,2) maps: input[0] => output[0] and input[1] => output[3]

I can't see how to use a set of names to do this mapping since there could be dupes at various levels of schema. (It also seems anti-substrait-patterns to use some form of naming system to map two schemas).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also probably have an option around defaults: do we only apply for missing columns or do we also apply for null values. (The latter is a lot more expensive...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thinking is as follows:

  1. names is simply a qualified table name (like mydb.table). I am copying the pattern from ReadRel.NamedTable. Maybe we should call it qualified_table_name or something like that.
  2. base_schema describes the entire schema we want in output (after an ALTER or as a result of CREATE or INSERt operation etc.)
  3. The content is positionally (as usual) mapped to input. For example if you are doing UPDATE TABLE foo WHERE col1=3 WHERE cond, we would have in the input something like SELECT "3", col2, col3... FROM foo WHERE cond.

In hindsight we might be able to skip the defaults field altogether and use the input for that (using a set of constants if we are issuing a CREATE TABLE for example).

@curino
Copy link
Contributor Author

curino commented Aug 9, 2022

Thanks @jacques-n for the live-chat. I have addressed all we discussed, and updated some of the docs accordingly.

jacques-n
jacques-n previously approved these changes Aug 10, 2022
Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One little leftover that should be cleaned but otherwise looking good. I'll fix and merge assuming tests pass.

proto/substrait/algebra.proto Outdated Show resolved Hide resolved
@jacques-n jacques-n merged commit cbb6c26 into substrait-io:main Aug 10, 2022
@curino
Copy link
Contributor Author

curino commented Aug 10, 2022

Thanks @jacques-n and @jcamachor for the help shaping this.

@jvanstraten
Copy link
Contributor

I'm a bit late to the party now (was on vacation, sorry), but I can't help but notice that oneof rel_type wasn't updated. AFAICT the new messages are entirely unreachable from Plan. Is that intentional at this time? I haven't read any of this in detail yet though, so I might have missed something.

jvanstraten added a commit to jvanstraten/substrait that referenced this pull request Aug 15, 2022
…ubstrait-io#284 (relation references)

 - add new relation types to rel_type to make them usable
 - add constraints to prevent cyclic relation references
 - document how relation references work in the relation basics section
 - s/tuples/records/g for naming consistency
 - move ReferenceRel out of the AggregateFunction message scope

BREAKING CHANGE: various messages and semantics that were not yet reachable from the
Plan message were changed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants