feat: add support for DDL and INSERT/DELETE/UPDATE operations #252

Merged 4 commits on Aug 10, 2022
112 changes: 112 additions & 0 deletions proto/substrait/algebra.proto
@@ -364,6 +364,118 @@ message Rel {
}
}

// A base object for writing (e.g., a table or a view).
message NamedObjectWrite {
  // The list of strings is used to represent namespacing (e.g., mydb.mytable).
  // This assumes a shared catalog between systems exchanging a message.
  //
  repeated string names = 1;
  substrait.extensions.AdvancedExtension advanced_extension = 10;
}

// A stub type that can be used to extend/introduce new table types outside
// the specification.
message ExtensionObject {
  google.protobuf.Any detail = 1;
}

message DdlRel {
  // Definition of which type of object we are operating on
  oneof write_type {
    NamedObjectWrite named_object = 1;
    ExtensionObject extension_object = 2;
  }

  // The columns that will be modified (representing the after-image of a schema change)
  NamedStruct table_schema = 3;
  // The default values for the columns (representing the after-image of a schema change).
  // E.g., in case of an ALTER TABLE that changes some of the column default values, we expect
  // the table_defaults Struct to report the full list of default values that results from
  // successfully applying the ALTER TABLE operation.
  Expression.Literal.Struct table_defaults = 4;

  // Which type of object we operate on
  DdlObject object = 5;

  // The type of operation to perform
  DdlOp op = 6;

  // The body of the view (used by CREATE, CREATE_OR_REPLACE, and ALTER on a VIEW)
  Rel view_definition = 7;

  enum DdlObject {
    DDL_OBJECT_UNSPECIFIED = 0;
    // A Table object in the system
    DDL_OBJECT_TABLE = 1;
    // A View object in the system
    DDL_OBJECT_VIEW = 2;
  }

  enum DdlOp {
    DDL_OP_UNSPECIFIED = 0;
    // A create operation (for any object)
    DDL_OP_CREATE = 1;
    // Creates the object if it does not exist, or replaces it (equivalent to a DROP + CREATE) if it already exists
    DDL_OP_CREATE_OR_REPLACE = 2;
    // An operation that modifies the schema (e.g., column names, types, default values) of the target object
    DDL_OP_ALTER = 3;
    // An operation that removes an object from the system
    DDL_OP_DROP = 4;
    // An operation that removes an object from the system (without throwing an exception if the object does not exist)
    DDL_OP_DROP_IF_EXIST = 5;
  }
  // TODO: add PK/constraints/indexes/etc.?
}

// The operator that modifies the content of a database (operates on one table at a time, but tuple selection/source can be
// based on joining multiple tables).
message WriteRel {
  // Definition of which TABLE we are operating on
  oneof write_type {
    NamedObjectWrite named_table = 1;
    ExtensionObject extension_table = 2;
  }

  // The schema of the table (must align with the Rel input, e.g., the number of leaf fields must match)
  NamedStruct table_schema = 3;

  // The type of operation to perform
  WriteOp op = 4;

  // The relation that determines the tuples to add/remove/modify.
  // Its schema must match table_schema. Default values must be explicitly stated
  // in a ProjectRel at the top of the input. The match must also
  // hold for DELETE, to ensure multi-engine plans are unequivocal.
  Rel input = 5;

  // The output mode determines what executing this rel returns
  OutputMode output = 6;

  enum WriteOp {
    WRITE_OP_UNSPECIFIED = 0;
    // The insertion of new tuples into a table
    WRITE_OP_INSERT = 1;
    // The removal of tuples from a table
    WRITE_OP_DELETE = 2;
    // The modification of existing tuples within a table
    WRITE_OP_UPDATE = 3;
    // The creation of a new table, and the insertion of new tuples into that table
    WRITE_OP_CTAS = 4;
  }

  enum OutputMode {
    OUTPUT_MODE_UNSPECIFIED = 0;
    // Return no tuples at all
    OUTPUT_MODE_NO_OUTPUT = 1;
    // This mode makes the operator return all the tuples INSERTED/DELETED/UPDATED by the operator.
    // The operator returns the AFTER-image of any change. This can be further manipulated by upstream operators
    // (e.g., returning the typical "count of modified tuples").
    // For scenarios in which the BEFORE-image is required, the user must implement a spool (via references to
    // subplans in the body of the Rel input) and return those as an additional PlanRel in Plan.relations.
    OUTPUT_MODE_MODIFIED_TUPLES = 2;
  }
}

// The argument of a function
message FunctionArgument {
oneof arg_type {
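
The WriteOp and OutputMode comments in algebra.proto can be made concrete with a small in-memory model. This is a minimal Python sketch under stated assumptions, not part of Substrait or of any engine: `apply_write`, the tuple-based rows, and the caller-supplied `key` function that matches input tuples to stored ones are all hypothetical (the proto does not say how DELETE/UPDATE tuples are matched). It also assumes that, for DELETE, the "modified tuples" are the removed rows, since the after-image of a deletion is not spelled out. CTAS is omitted.

```python
from enum import Enum
from typing import Callable, Hashable, List

# Hypothetical mirror of WriteRel.WriteOp and WriteRel.OutputMode (CTAS and UNSPECIFIED omitted).
class WriteOp(Enum):
    INSERT = 1
    DELETE = 2
    UPDATE = 3

class OutputMode(Enum):
    NO_OUTPUT = 1
    MODIFIED_TUPLES = 2

def apply_write(table: List[tuple], op: WriteOp, input_rows: List[tuple],
                key: Callable[[tuple], Hashable], mode: OutputMode) -> List[tuple]:
    """Applies `op` to `table` in place and returns what the WriteRel would emit."""
    modified: List[tuple] = []
    if op is WriteOp.INSERT:
        table.extend(input_rows)
        modified = list(input_rows)
    elif op is WriteOp.DELETE:
        # Input rows identify the tuples to remove (matched via the hypothetical `key`).
        doomed = {key(r) for r in input_rows}
        modified = [r for r in table if key(r) in doomed]
        table[:] = [r for r in table if key(r) not in doomed]
    elif op is WriteOp.UPDATE:
        # Input rows carry the full after-image of each updated tuple.
        after = {key(r): r for r in input_rows}
        for i, r in enumerate(table):
            if key(r) in after:
                table[i] = after[key(r)]
                modified.append(table[i])
    # OUTPUT_MODE_NO_OUTPUT: the operator returns no tuples at all.
    return modified if mode is OutputMode.MODIFIED_TUPLES else []

people = [(1, "ada", 36), (2, "bob", 41)]
changed = apply_write(people, WriteOp.UPDATE, [(2, "bob", 42)],
                      key=lambda r: r[0], mode=OutputMode.MODIFIED_TUPLES)
# An upstream operator could now compute the "count of modified tuples":
print(len(changed))  # 1
```

Returning after-images keeps the operator single-output; anything needing the before-image would, per the proto comment, come from a separate spooled subplan rather than from this rel.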
66 changes: 51 additions & 15 deletions site/docs/relations/logical_relations.md
@@ -310,32 +310,40 @@ If at least one grouping expression is present, the aggregation is allowed to no
%%% proto.algebra.AggregateRel %%%
```


## Write Operator

-The write operator is an operator that consumes one output and writes it to storage. A simple example would be writing Parquet files. It is expected that many types of writes will be added over time.
+The write operator is an operator that consumes one input and writes it to storage. This can range from writing to a Parquet file to INSERT/DELETE/UPDATE operations in a database.

-| Signature | Value |
-| -------------------- | --------------- |
-| Inputs | 1 |
-| Outputs | 0 |
-| Property Maintenance | N/A (no output) |
-| Direct Output Order | N/A (no output) |
+| Signature | Value |
+| -------------------- |---------------------------------------------------------|
+| Inputs | 1 |
+| Outputs | 1 |
+| Property Maintenance | Output depends on OutputMode (none, or modified tuples) |
+| Direct Output Order | Unchanged from input |

### Write Properties

-| Property | Description | Required |
-| --------------------------- | ------------------------------------------------------------ | --------------------------- |
-| Definition | The contents of the write property definition. | Required |
-| Field names | The names of all struct fields in breadth-first order. | Required |
-| Masked Complex Expression | The masking expression applied to the input record prior to write. | Optional, defaults to all |
-| Rotation description fields | A list of fields that can be used for stream description whenever a stream is reset. | Optional, defaults to none. |
-| Rotation indicator | An input field ID that describes when the current stream should be "rotated". Individual write definition types may support the ability to rotate the output into one or more streams. This could mean closing and opening a new file, finishing and restarting a TCP connection, etc. If a rotation indicator is available, it will be 0 except when a rotation should occur. Rotation indication are frequently defined by things like discrete partition values but could be done based on number of records or other arbitrary criteria. | Optional, defaults to none. |

| Property | Description | Required |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------|
| Write Type | Definition of which object we are operating on (e.g., a fully-qualified table name). | Required |
| CTAS Schema | The names of all the columns and their type for a CREATE TABLE AS. | Required only for CTAS |
| Write Operator | Which type of operation we are performing (INSERT/DELETE/UPDATE/CTAS). | Required |
| Rel Input | The Rel representing which tuples we will be operating on (e.g., VALUES for an INSERT, or which tuples to DELETE, or tuples and after-image of their values for UPDATE). | Required |
| Output Mode | For views that modify a DB it is important to control, which tuples to "return". Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_TUPLES, that can be further manipulated by layering more rels ontop of this WriteRel (e.g., to "count how many tuples were updated"). This also allows to return the after-image of the change. To return before-image (or both) one can use the reference mechanisms and have multiple return values. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER |
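
The Rel Input requirement, together with the `WriteRel.input` comment in algebra.proto, means a producer must spell out default values itself instead of leaving them to the consuming engine. As a minimal Python sketch, here is what the top-level ProjectRel effectively has to compute for an INSERT that names only some columns. `project_with_defaults` and its list/dict arguments are hypothetical stand-ins for `table_schema` and a catalog's defaults, not a Substrait API.

```python
def project_with_defaults(table_columns, defaults, insert_columns, rows):
    """Expands each input row to the full table schema, in table_schema order.

    table_columns:  column names in table_schema order
    defaults:       column name -> default value (hypothetical catalog lookup)
    insert_columns: the columns the INSERT statement actually supplies
    """
    unknown = [c for c in insert_columns if c not in table_columns]
    if unknown:
        raise ValueError(f"columns not in table schema: {unknown}")
    unfilled = [c for c in table_columns
                if c not in insert_columns and c not in defaults]
    if unfilled:
        raise ValueError(f"no value or default for columns: {unfilled}")
    out = []
    for row in rows:
        if len(row) != len(insert_columns):
            raise ValueError(f"expected {len(insert_columns)} values, got {len(row)}")
        given = dict(zip(insert_columns, row))
        # One field per table column, so the result lines up with table_schema.
        out.append(tuple(given[c] if c in given else defaults[c] for c in table_columns))
    return out

# INSERT INTO orders (item, id) VALUES ('pen', 7), where qty has DEFAULT 1:
print(project_with_defaults(["id", "item", "qty"], {"qty": 1}, ["item", "id"], [("pen", 7)]))
# [(7, 'pen', 1)]
```

Because every row arrives at the WriteRel already full-width, two engines consuming the same plan cannot disagree about which defaults apply.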


### Write Definition Types

Write definition types are built by the community and added to the specification. This is a portion of the specification that is expected to grow rapidly.


=== "WriteRel Message"

```proto
%%% proto.algebra.WriteRel %%%
```

#### Virtual Table

| Property | Description | Required |
@@ -353,6 +361,34 @@ Write definition types are built by the community and added to the specification
| Format | Enumeration of available formats. Only current option is PARQUET. | Required |


## DDL Operator

The DDL operator defines modifications of a database schema (CREATE/DROP/ALTER for TABLEs and VIEWs).

| Signature | Value |
| -------------------- |-----------------|
| Inputs | 1 |
| Outputs | 0 |
| Property Maintenance | N/A (no output) |
| Direct Output Order | N/A |


### DDL Properties

| Property | Description | Required |
|-----------------|-----------------------------------------------------------------|--------------------------------------------------|
| Write Type | Definition of which type of object we are operating on. | Required |
| Table Schema | The names of all the columns and their type. | Required (except for DROP operations) |
| Table Defaults | The set of default values for this table. | Required (except for DROP operations) |
| DDL Object | Which type of object we are operating on (e.g., TABLE or VIEW). | Required |
| DDL Operator | The operation to be performed (e.g., CREATE/ALTER/DROP). | Required |
| View Definition | A Rel representing the "body" of a VIEW. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER |
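
The Required column above reads as a per-operation checklist. The following Python sketch encodes it for a DdlRel represented as a plain dict keyed by the proto field names; the dict representation, the string enum values, and `missing_ddl_fields` are assumptions for illustration, not generated Substrait bindings.

```python
DROP_OPS = {"DDL_OP_DROP", "DDL_OP_DROP_IF_EXIST"}
VIEW_BODY_OPS = {"DDL_OP_CREATE", "DDL_OP_CREATE_OR_REPLACE", "DDL_OP_ALTER"}
UNSET = (None, "DDL_OBJECT_UNSPECIFIED", "DDL_OP_UNSPECIFIED")

def missing_ddl_fields(rel: dict) -> list:
    """Returns the DdlRel fields the DDL Properties table requires but `rel` lacks."""
    missing = []
    # Write Type: one branch of the write_type oneof must be set.
    if rel.get("named_object") is None and rel.get("extension_object") is None:
        missing.append("write_type")
    # DDL Object and DDL Operator: always required.
    for field in ("object", "op"):
        if rel.get(field) in UNSET:
            missing.append(field)
    # Table Schema and Table Defaults: required except for DROP operations.
    if rel.get("op") not in DROP_OPS:
        for field in ("table_schema", "table_defaults"):
            if rel.get(field) is None:
                missing.append(field)
    # View Definition: required for VIEW CREATE/CREATE_OR_REPLACE/ALTER.
    if rel.get("object") == "DDL_OBJECT_VIEW" and rel.get("op") in VIEW_BODY_OPS:
        if rel.get("view_definition") is None:
            missing.append("view_definition")
    return missing

drop_table = {"named_object": ["mydb", "orders"],
              "object": "DDL_OBJECT_TABLE", "op": "DDL_OP_DROP"}
print(missing_ddl_fields(drop_table))  # []
```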

=== "DdlRel Message"

```proto
%%% proto.algebra.DdlRel %%%
```
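
To show how the DdlOp values and the after-image convention for `table_schema`/`table_defaults` fit together, here is a minimal Python sketch over a hypothetical in-memory catalog keyed by the NamedObjectWrite `names`. Two behaviors are assumptions rather than statements of the spec: plain CREATE fails if the object already exists, and ALTER fails if it does not. DROP failing on a missing object follows from the description of `DDL_OP_DROP_IF_EXIST`.

```python
class CatalogError(Exception):
    """Raised when a DDL operation conflicts with the (hypothetical) catalog state."""

def apply_ddl(catalog, names, op, schema=None, defaults=None):
    """Applies one DdlOp to `catalog`, a dict: names tuple -> (schema, defaults)."""
    key = tuple(names)
    label = ".".join(key)
    exists = key in catalog
    if op == "DDL_OP_CREATE":
        if exists:  # assumption: plain CREATE does not overwrite
            raise CatalogError(f"{label} already exists")
        catalog[key] = (schema, defaults)
    elif op == "DDL_OP_CREATE_OR_REPLACE":
        catalog[key] = (schema, defaults)  # acts as DROP + CREATE when it already exists
    elif op == "DDL_OP_ALTER":
        if not exists:  # assumption: ALTER requires an existing object
            raise CatalogError(f"{label} does not exist")
        # schema and defaults are the full after-image, so they replace the entry wholesale.
        catalog[key] = (schema, defaults)
    elif op == "DDL_OP_DROP":
        if not exists:
            raise CatalogError(f"{label} does not exist")
        del catalog[key]
    elif op == "DDL_OP_DROP_IF_EXIST":
        catalog.pop(key, None)
    else:
        raise ValueError(f"unsupported DdlOp: {op}")

catalog = {}
cols = ["id", "item", "qty"]
apply_ddl(catalog, ["mydb", "orders"], "DDL_OP_CREATE", cols, {"item": "", "qty": 1})
# ALTER ... SET DEFAULT 5 on qty: the DdlRel carries all defaults, not just the changed one.
apply_ddl(catalog, ["mydb", "orders"], "DDL_OP_ALTER", cols, {"item": "", "qty": 5})
print(catalog[("mydb", "orders")][1])  # {'item': '', 'qty': 5}
```

Shipping the full after-image makes ALTER idempotent to apply: a consumer can overwrite its catalog entry without reconstructing which individual columns changed.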

## Discussion Points
