This introduces the graph data type into the PartiQL type system. This does not cover query/construction/representation of graph, but is the basis for such operations. Resolves #15
Showing 4 changed files with 176 additions and 0 deletions.
- Start Date: 2022-08-18
- PartiQL Issue: [partiql/partiql-docs/#15](https://github.com/partiql/partiql-docs/issues/15)
- RFC PR: [partiql/partiql-docs/#25](https://github.com/partiql/partiql-docs/issues/25)

# Summary
[summary]: #summary

Introduces the graph data type into the PartiQL type system, defining its logical representation but not the operations
on the data type.

# Motivation
[motivation]: #motivation

Graph databases such as Amazon Neptune are becoming more popular in applications where traditional SQL databases and
NoSQL databases are inadequate for modeling data with sprawling relationships. Typically, these applications perform
path traversals over a graph in a way that would be awkward to express in an equivalently modeled relational database.
The question is whether we can, and should, have a representation in the PartiQL type system that abstracts a graph,
provides graph-specific query operations, and unifies this with the rest of the type system, as we do with relations and
structs. Just as a goal of PartiQL is to unify nested data with relational data, we should think of graph data
similarly.

This proposal introduces the graph data type as a first-class type in the PartiQL type system.

Out of scope for this document are the particular syntax for graph query itself (beyond straw proposals that demonstrate
the model and how it would operate in PartiQL) and the syntax for expressing or serializing/de-serializing graph data
directly (similar to struct or bag expressions) to/from PartiQL.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

To introduce the PartiQL graph data type, we can first consider the other _aggregate_ data types (i.e., those containing
PartiQL values) such as bag, list, and struct. We can use the struct and list data types as helpful examples. The
struct data type is a collection of members, each of which associates an *attribute name* with any PartiQL value.

![Example Struct](./0025-graph-data-model/struct.png)

In the above, PartiQL models the attribute names and their association to the attribute values as a *property* of the
struct, not a property of the values contained within the struct. This is important, and also indicative of how we
extract these associations in PartiQL:

```sql
SELECT a, v FROM UNPIVOT my_struct AS v AT a
```

In this case, we use the `UNPIVOT` operator to bind the attribute names and their associated member values to variables.
This query can be read informally as “find all **`a`** and **`v`** such that **`a`** and **`v`** are an attribute/value
pair in **`my_struct`**.” The list data type similarly models the association of position ordinal to member value.

![Example List](./0025-graph-data-model/list.png)

This association is similarly extracted into variables:

```sql
SELECT i, v FROM my_list AS v AT i
```

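As an informal illustration of the two queries above, the following Python sketch enumerates the same bindings. It is
only a model under simplifying assumptions: a Python `dict` stands in for a struct (so it cannot show duplicate
attribute names), list ordinals are taken to be zero-based, and the helper names `unpivot` and `scan_list` are made up
for this example.

```python
from typing import Any, Dict, Iterator, List, Tuple


def unpivot(struct: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (a, v) bindings, mirroring `FROM UNPIVOT my_struct AS v AT a`."""
    for name, value in struct.items():
        yield name, value


def scan_list(values: List[Any]) -> Iterator[Tuple[int, Any]]:
    """Yield (i, v) bindings, mirroring `FROM my_list AS v AT i`."""
    for ordinal, value in enumerate(values):
        yield ordinal, value


# Made-up sample values; the names match the variables used in the queries.
my_struct = {"x": 1, "y": "two"}
my_list = ["p", "q"]

struct_bindings = list(unpivot(my_struct))
list_bindings = list(scan_list(my_list))
```

In both cases the attribute name or ordinal comes from the container, not from the member value, which is the point the
text makes about where the association lives.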
For the graph data type, we model something very similar. A graph is a collection of *vertices* and directed *edges*
that connect them. All vertices and edges have a *label* (similar to the attribute name in a struct) that must be a
string or `NULL`. The value of a vertex or an edge can be any PartiQL value. The following diagram illustrates the model:

![Example Graph](./0025-graph-data-model/graph.png)

In the above, we have a graph with three vertices, two labeled **`a`** and one labeled **`b`**. We have three edges, one
labeled **`x`** and two labeled **`y`**. The relationships of the edges to their respective vertices are fully contained
within the graph. All values within the graph, whether on an edge or a vertex, can be *any* PartiQL value. This
generalization is consistent with the other container types and fits nicely in PartiQL’s data model. It also means that
PartiQL graphs *could* have vertices or edges whose values are themselves graphs, and likewise values can be as simple as
scalars.

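The model described above could be sketched in Python roughly as follows. This is an illustration, not a proposed
representation: the type names, the use of vertex positions to identify edge endpoints, and the sample values are all
assumptions, and the edge wiring is invented because the text only states how many vertices and edges carry each label.

```python
from typing import Any, List, NamedTuple, Optional


class Vertex(NamedTuple):
    label: Optional[str]  # a string, or None standing in for PartiQL NULL
    value: Any = None     # any PartiQL value, including another graph


class Edge(NamedTuple):
    label: Optional[str]  # a string, or None standing in for PartiQL NULL
    source: int           # position of the source vertex (assumed encoding)
    target: int           # position of the target vertex (assumed encoding)
    value: Any = None     # any PartiQL value


class Graph(NamedTuple):
    vertices: List[Vertex]
    edges: List[Edge]


# Three vertices (two labeled `a`, one labeled `b`) and three directed edges
# (one labeled `x`, two labeled `y`); values and wiring are made up.
g = Graph(
    vertices=[Vertex("a", 1), Vertex("a", {"name": "n"}), Vertex("b")],
    edges=[Edge("x", 0, 1), Edge("y", 0, 2, {"score": 12}), Edge("y", 1, 2)],
)
```

Because the edges refer only to vertices of the same `Graph` value, the relationships stay fully contained within the
graph, as the text requires.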
As a common example, let’s consider how we could map a *labeled property graph (LPG)* to PartiQL. For LPG, the value of
each vertex and edge is a property bag, which is a struct in PartiQL.

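A small sketch of the LPG mapping, using plain Python dictionaries and lists as stand-ins for PartiQL values. The
`Person`/`KNOWS` data, the dictionary layout, and the `is_lpg` helper are hypothetical and exist only to show that LPG
is the special case in which every vertex and edge value is a struct.

```python
from typing import Any, Dict


# Made-up LPG: every vertex and edge value is a property bag (a dict here).
people_graph: Dict[str, Any] = {
    "vertices": [
        {"label": "Person", "value": {"name": "Ann", "age": 41}},
        {"label": "Person", "value": {"name": "Bo"}},
    ],
    "edges": [
        {"label": "KNOWS", "source": 0, "target": 1, "value": {"since": 2019}},
    ],
}


def is_lpg(graph: Dict[str, Any]) -> bool:
    """Return True when every vertex and edge value is a property bag."""
    elements = graph["vertices"] + graph["edges"]
    return all(isinstance(element["value"], dict) for element in elements)


# A graph with a scalar vertex value is still a valid graph in the general
# model, but it is not an LPG.
scalar_graph: Dict[str, Any] = {
    "vertices": [{"label": "a", "value": 42}],
    "edges": [],
}
```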
Similarly, Resource Description Framework (RDF)[^1] graphs could be modeled in PartiQL by having non-literal, non-blank
vertices and edges labeled by URI strings, with their values being `NULL`. An RDF literal could be a `NULL`-labeled
vertex whose value is any corresponding PartiQL value (this is a generalization of RDF, as literals are only strings in
RDF’s model). RDF blank nodes can be denoted with a label that never conflicts with a URI, such as **`_:my_blank`**
(**`_`** is never a valid scheme for a URI).

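The RDF mapping can be sketched the same way. The function below is a hypothetical illustration that turns a list of
triples into vertices and edges following the rules in the paragraph above; the `Literal` marker type, the example
triples, and the decision to give every literal occurrence its own vertex are assumptions of this sketch.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Tuple


class Vertex(NamedTuple):
    label: Optional[str]
    value: Any = None


class Edge(NamedTuple):
    label: Optional[str]
    source: int
    target: int
    value: Any = None


class Literal(NamedTuple):
    """Marks an RDF literal term in the input triples (hypothetical helper)."""
    value: Any


def rdf_to_graph(triples: List[Tuple[Any, str, Any]]) -> Tuple[List[Vertex], List[Edge]]:
    vertices: List[Vertex] = []
    edges: List[Edge] = []
    index: Dict[str, int] = {}  # URI or blank-node label -> vertex position

    def vertex_for(term: Any) -> int:
        if isinstance(term, Literal):
            # A literal becomes a NULL-labeled vertex carrying the value.
            vertices.append(Vertex(None, term.value))
            return len(vertices) - 1
        if term not in index:
            # URIs and blank nodes become labeled vertices with a NULL value.
            index[term] = len(vertices)
            vertices.append(Vertex(term, None))
        return index[term]

    for subject, predicate, obj in triples:
        source = vertex_for(subject)
        target = vertex_for(obj)
        # The predicate URI is the edge label; the edge value is NULL.
        edges.append(Edge(predicate, source, target, None))
    return vertices, edges


triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "_:my_blank"),
    ("_:my_blank", "http://xmlns.com/foaf/0.1/name", Literal("Bob")),
]
vertices, edges = rdf_to_graph(triples)
```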
Even though PartiQL generalizes the graph data model, it is *not required* that a database actually expose arbitrary
values as vertex or edge values. This is similar to PartiQL over a relational table, where attributes of a row are
restricted to scalars.

## Data Model Integration with PartiQL Query

The SQL/PGQ specification is currently in progress, but elements of that work have been published[^2]. It is important
that PartiQL aligns with any SQL standard that arises around graph query insofar as it is practically acceptable. Let us
consider a straw example of what a PartiQL graph query could look like and mean with respect to this data model.

```sql
SELECT the_a.name AS src, the_b.name AS dest
FROM my_graph MATCH (the_a:a) -[the_y:y]-> (the_b:b)
WHERE the_y.score > 10
```

In the above example, the `MATCH` sub-clause works similarly to `UNPIVOT`: it effectively says “find all **`the_a`**,
**`the_y`**, and **`the_b`** such that the graph pattern matching association holds.” These names are bound to variables
that are then usable in other clauses. In other words, the loose specification of the `MATCH` sub-clause is that it
returns a bag of variable bindings, much like any other `FROM` source. Similar to the way list ordinals and struct
attribute names work, the relationship matching in the graph operators is scoped to a single graph instance and has no
implications outside of that value. Such a `MATCH` clause could be as complex as needed (having other sub-clauses) to
perform the appropriate graph query constructs.

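To make the “bag of variable bindings” reading concrete, here is a minimal Python sketch of the straw query above. It
handles only a single-edge pattern, assumes that each variable is bound to the *value* of the matched vertex or edge
(so that `the_y.score` is a path into that value), and uses made-up data; `match_edge` is a hypothetical helper, not
proposed semantics.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class Vertex(NamedTuple):
    label: Optional[str]
    value: Any = None


class Edge(NamedTuple):
    label: Optional[str]
    source: int
    target: int
    value: Any = None


def match_edge(vertices: List[Vertex], edges: List[Edge],
               src_label: str, edge_label: str, dst_label: str) -> List[Dict[str, Any]]:
    """Bindings for (the_a:src_label) -[the_y:edge_label]-> (the_b:dst_label)."""
    bindings = []
    for edge in edges:
        src, dst = vertices[edge.source], vertices[edge.target]
        if (src.label, edge.label, dst.label) == (src_label, edge_label, dst_label):
            bindings.append({"the_a": src.value, "the_y": edge.value, "the_b": dst.value})
    return bindings


# Made-up graph: two `a` vertices, one `b` vertex, one `x` edge, two `y` edges.
vertices = [Vertex("a", {"name": "a1"}), Vertex("a", {"name": "a2"}), Vertex("b", {"name": "b1"})]
edges = [Edge("x", 0, 1, {"score": 50}), Edge("y", 0, 2, {"score": 12}), Edge("y", 1, 2, {"score": 3})]

# FROM ... MATCH produces bindings; WHERE filters them; SELECT projects them.
rows = [
    {"src": b["the_a"]["name"], "dest": b["the_b"]["name"]}
    for b in match_edge(vertices, edges, "a", "y", "b")
    if b["the_y"]["score"] > 10
]
```

The matching step only looks at the vertices and edges of the one graph it is given, which mirrors the scoping rule
stated in the text.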
# Drawbacks
[drawbacks]: #drawbacks

*TBD*

# Rationale and Alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Some earlier discussions of graph support for PartiQL indicated that a reference type (i.e., pointer or alias) could
“solve” adding graph data to PartiQL. While this primitive could be used to construct graphs, it would not have the
same degree of abstraction as the proposed data type, and it creates an issue in that an associated value is not fully
contained within a PartiQL value.

The abstraction problem can be illustrated by the LPG example. How might we solve this with references? Since scalars
themselves are atomic values, references would have to be contained in some container type such as a list or struct. Now
vertices **must** be a struct or a list, and we are defining a convention that is a substitute for strong typing.
Another problem in the abstraction is how to model properties on the edges and multiple edges for a given label. Again,
we need to model the edge property from the source as either a list/bag of references or a single reference, and we need
to introduce an intermediate struct between the two vertex struct values with some convention. Assuming we defined this
convention, how would a different graph model such as RDF work? Another convention could be defined, but now we have the
problem of determining whether the convention is being used (e.g., whether a value is valid for a `MATCH` sub-clause).
This introduces the concept without typing the concept. If the answer is schema, that is the same as saying we have some
notion of static type. References are being used here as a potential implementation detail that leaks into the logical
model. A similar rationale could be used for the list data type. The relational model could easily represent a list as a
bag of structs containing an ordinal and a value, but PartiQL has a first-class list type because operations directly on
lists (e.g., accessing an element by ordinal) are often of value.

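The list analogy can be illustrated with a short Python sketch. The encoding below (a bag of structs with `ordinal` and
`value` attributes) and the `element_at` helper are assumptions made for this example; they show how ordinal access
over a convention-based encoding has to be rebuilt by every consumer, whereas the first-class type provides it directly.

```python
from typing import Any, Dict, List, Optional

# First-class list: position is a property of the container.
my_list = ["p", "q", "r"]

# Convention-based encoding: an unordered bag of {ordinal, value} structs.
encoded: List[Dict[str, Any]] = [
    {"ordinal": 2, "value": "r"},
    {"ordinal": 0, "value": "p"},
    {"ordinal": 1, "value": "q"},
]


def element_at(bag: List[Dict[str, Any]], ordinal: int) -> Optional[Any]:
    """Ordinal access over the encoding; works only if the convention holds."""
    for row in bag:
        if row["ordinal"] == ordinal:
            return row["value"]
    return None  # stands in for a missing element


direct = my_list[1]
via_convention = element_at(encoded, 1)
```

Nothing in the encoded bag says that it is meant to be read as a list, which is the same typing gap the text describes
for graphs built from references.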
# Prior Art
[prior-art]: #prior-art

*TBD*

# Unresolved Questions
[unresolved-questions]: #unresolved-questions

## Data Model

As defined, the label of a vertex or edge *must* be a value (a string or `NULL`). Should a label also be allowed to be
`MISSING`?

## Graph Construction

While defining such syntax is outside the scope of this document, it is important to consider how graph data types might
be serialized or constructed. A database could implement a view over a relational representation of a graph with this
data type. This pattern is seen in databases such as Oracle, where a set of tables can be treated as a graph. Likewise,
PartiQL could adopt minimal syntax extensions from something like Cypher[^3] to unify its DML with graph manipulation.
Also, similar to bag, list, and struct constructor expressions, we could introduce graph constructor expressions to
create graph values (e.g., literals in expressions).

# Future possibilities
[future-possibilities]: #future-possibilities

Think about what the natural extension and evolution of your proposal would
be and how it would affect the language and project as a whole in a holistic
way. Try to use this section as a tool to more fully consider all possible
interactions with the project and language in your proposal.
Also consider how this all fits into the roadmap for the project.

This is also a good place to "dump ideas", if they are out of scope for the
RFC you are writing but otherwise related.

If you have tried and cannot think of any future possibilities,
you may simply state that you cannot think of anything.

Note that having something written down in the future-possibilities section
is not a reason to accept the current or a future RFC; such notes should be
in the section on motivation or rationale in this or subsequent RFCs.
The section merely provides additional information.

[^1]: <https://www.w3.org/TR/rdf11-concepts/>

[^2]: <https://arxiv.org/pdf/2112.06217.pdf>

[^3]: <https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf>