-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor SQL generation - graph-first approach #176
Conversation
pkg/diff/sql_graph.go
Outdated
} | ||
|
||
type sqlVertexGenerator[S schema.Object, Diff diff[S]] interface { | ||
Add(S) (partialSQLGraph, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, SQL generators can return entire graphs for any one operation. I.e. when a table is altered, it can return n addresable SQL vertex nodes, one for each column, that SQL vertex generators will be able to globally reference
febb538
to
e03076e
Compare
e03076e
to
512395c
Compare
statements: nil, | ||
} | ||
|
||
// To maintain the correctness of the graph, we will add a dummy vertex for the missing dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to remove this behavior in the future, but that's out-of-scope of this PR
512395c
to
51df181
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good on the approach! Some comments but nothing huge
|
||
// sqlPriority is an enum for the priority of a statement in the SQL graph, i.e., whether it should be run sooner | ||
// or later in the topological sort of the graph | ||
type sqlPriority int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why do we need priority?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prioritize index adds ahead of any sort of index delete (not index replacement). We could alternatively build hard dependencies on all index deletes such that they depend on index deletes, but that's definitely out of scope.
Imagine the following:
User adds index Foobar (created_at, id)
and deletes index Fizzbuzz(id)
. These two indexes are totally independent, but it's important that Foobar
is created before Fizzbuzz
. It is much easier to use priority than build hard dependencies, albeit the latter is probably more robust.
BLUF:
- Used to prioritize indexes adds over indexes deletes
- Fixing this with dependencies is definitely out-of-scope for this PR. This system existed before the refactor
func (s sqlVertex) GetPriority() int { | ||
// Prioritize adds/alters over deletes. Weight by number of statements. A 0 statement delete should be | ||
// prioritized over a 1 statement delete | ||
return len(s.statements) * int(s.priority) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Statement count probably shouldn't be able to override a configured priority which would be surprising. I would only expect statement count to break ties. Some options:
- Expose a comparison function for sqlVertex instead of GetPriority, so you can order things first by priority then by statement count
- Otherwise find some scheme to encode priority and statement count losslessly in an integer (ie priority * 1000 + len(statements), though we'd need a limit on statement count)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Statement count probably shouldn't be able to override a configured priority which would be surprising. I would only expect statement count to break ties.
The way it's coded, that's exactly what it does, since sqlPriority
only changes direction and not magnitude.
If we break out "layers" of priority, then we can do this. But until that point, we only have a "prioritize as late as possible" and "prioritize as soon as possible", which only effects direction. In other words, there is never a situation where two nodes sqlPriority are different and the len(s.statements)
multiplier actually affects the outcome.
graph := newSqlGraph() | ||
for _, vertex := range parts.vertices { | ||
// It's possible the node already exists. merge it if it does | ||
if graph.HasVertexWithId(vertex.GetId()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to remove the vertex that's already in there along with adding the merged one right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it will just override the existing vertex. This is the exact same behavior as before.
return statements | ||
} | ||
|
||
func concatPartialGraphs(parts ...partialSQLGraph) partialSQLGraph { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: It might be nice to get rid of concatPartialGraphs and just have graphFromPartials take a list of partials. That way we could potentially add validation on partial construction (ie no dupe deps or vertices). It also might be nice to have each separated partial right up until the point where we merge them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can get rid of concatPartialGraphs
because there may be some "nested SQL generators", where we would have concat multiple partials into a bigger partial.
- TableSQLGenerator -> (ColumnSQLGenerator | CheckConstraintSQLGenerator | etc). Each returns a partial that would unioned, still be a partial, because they might have dependenciecs on higher level SQL generators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I like about concating partial graphs as we go is it follows a sort-of builder pattern. I think it's a bit more confusing if you have to carry local variables tens of lines down to be included in one function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figure that sqlgenerators could return lists of partials, but yeah that's fair enough. I'm not too opinionated about this.
Made the suggested changes with exception to sql priority and concatting partial graphs. See my comments for explanations of the current behavior! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responses sgtm, and change look great!
* Refactor SQL generation such that the SQL generators take a graph-first approach
Description
Refactor SQL generation such that SQL generators return nested graphs instead of flattened lists of statements. This allows nested hierarchies to be diffed while having root level SQL generators build dependencies on those nested sql generators.
Follow-up PRs:
Motivation
#131
Testing
Acceptance tests pass