Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor SQL generation - graph-first approach #176

Merged
merged 3 commits into from
Oct 2, 2024

Conversation

bplunkett-stripe
Copy link
Collaborator

@bplunkett-stripe bplunkett-stripe commented Sep 27, 2024

Description

Refactor SQL generation such that SQL generators return nested graphs instead of flattened lists of statements. This allows nested hierarchies to be diffed while having root level SQL generators build dependencies on those nested sql generators.

Follow-up PRs:

  1. Leverage this primitive in the table sql generator such that column statements can be referenced from root-level sql generators
  2. Make a proof-of-concept by improving the policy sql generators to depend on individual column statements
  3. Re-organize code per Cleanup balls of mud #101
  4. Implement view support

Motivation

#131

Testing

Acceptance tests pass

}

type sqlVertexGenerator[S schema.Object, Diff diff[S]] interface {
Add(S) (partialSQLGraph, error)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, SQL generators can return entire graphs for any one operation. I.e. when a table is altered, it can return n addresable SQL vertex nodes, one for each column, that SQL vertex generators will be able to globally reference

@bplunkett-stripe bplunkett-stripe force-pushed the bplunkett/refactor-sql-generation branch 2 times, most recently from febb538 to e03076e Compare October 1, 2024 07:08
@bplunkett-stripe bplunkett-stripe marked this pull request as ready for review October 1, 2024 07:08
@bplunkett-stripe bplunkett-stripe changed the title Refactor SQL generation such that the SQL generators take a graph-fir… Refactor SQL generation Oct 1, 2024
@bplunkett-stripe bplunkett-stripe force-pushed the bplunkett/refactor-sql-generation branch from e03076e to 512395c Compare October 1, 2024 07:12
@bplunkett-stripe bplunkett-stripe changed the title Refactor SQL generation Refactor SQL generation - graph-first approach Oct 1, 2024
statements: nil,
}

// To maintain the correctness of the graph, we will add a dummy vertex for the missing dependencies
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to remove this behavior in the future, but that's out-of-scope of this PR

@bplunkett-stripe bplunkett-stripe force-pushed the bplunkett/refactor-sql-generation branch from 512395c to 51df181 Compare October 1, 2024 07:39
Copy link
Collaborator

@alexaub-stripe alexaub-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good on the approach! Some comments but nothing huge


// sqlPriority is an enum for the priority of a statement in the SQL graph, i.e., whether it should be run sooner
// or later in the topological sort of the graph
type sqlPriority int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why do we need priority?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prioritize index adds ahead of any sort of index delete (not index replacement). We could alternatively build hard dependencies on all index deletes such that they depend on index deletes, but that's definitely out of scope.

Imagine the following:
User adds index Foobar (created_at, id) and deletes index Fizzbuzz(id). These two indexes are totally independent, but it's important that Foobar is created before Fizzbuzz. It is much easier to use priority than build hard dependencies, albeit the latter is probably more robust.

BLUF:

  • Used to prioritize indexes adds over indexes deletes
  • Fixing this with dependencies is definitely out-of-scope for this PR. This system existed before the refactor

func (s sqlVertex) GetPriority() int {
// Prioritize adds/alters over deletes. Weight by number of statements. A 0 statement delete should be
// prioritized over a 1 statement delete
return len(s.statements) * int(s.priority)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Statement count probably shouldn't be able to override a configured priority which would be surprising. I would only expect statement count to break ties. Some options:

  • Expose a comparison function for sqlVertex instead of GetPriority, so you can order things first by priority then by statement count
  • Otherwise find some scheme to encode priority and statement count losslessly in an integer (ie priority * 1000 + len(statements), though we'd need a limit on statement count)

Copy link
Collaborator Author

@bplunkett-stripe bplunkett-stripe Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Statement count probably shouldn't be able to override a configured priority which would be surprising. I would only expect statement count to break ties.

The way it's coded, that's exactly what it does, since sqlPriority only changes direction and not magnitude.

If we break out "layers" of priority, then we can do this. But until that point, we only have a "prioritize as late as possible" and "prioritize as soon as possible", which only effects direction. In other words, there is never a situation where two nodes sqlPriority are different and the len(s.statements) multiplier actually affects the outcome.

pkg/diff/sql_graph.go Show resolved Hide resolved
pkg/diff/sql_graph.go Outdated Show resolved Hide resolved
graph := newSqlGraph()
for _, vertex := range parts.vertices {
// It's possible the node already exists. merge it if it does
if graph.HasVertexWithId(vertex.GetId()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to remove the vertex that's already in there along with adding the merged one right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it will just override the existing vertex. This is the exact same behavior as before.

return statements
}

func concatPartialGraphs(parts ...partialSQLGraph) partialSQLGraph {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: It might be nice to get rid of concatPartialGraphs and just have graphFromPartials take a list of partials. That way we could potentially add validation on partial construction (ie no dupe deps or vertices). It also might be nice to have each separated partial right up until the point where we merge them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can get rid of concatPartialGraphs because there may be some "nested SQL generators", where we would have concat multiple partials into a bigger partial.

  • TableSQLGenerator -> (ColumnSQLGenerator | CheckConstraintSQLGenerator | etc). Each returns a partial that would unioned, still be a partial, because they might have dependenciecs on higher level SQL generators

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I like about concating partial graphs as we go is it follows a sort-of builder pattern. I think it's a bit more confusing if you have to carry local variables tens of lines down to be included in one function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure that sqlgenerators could return lists of partials, but yeah that's fair enough. I'm not too opinionated about this.

pkg/diff/sql_vertex_generator.go Outdated Show resolved Hide resolved
pkg/diff/sql_vertex_generator.go Show resolved Hide resolved
@bplunkett-stripe
Copy link
Collaborator Author

Made the suggested changes with exception to sql priority and concatting partial graphs. See my comments for explanations of the current behavior!

Copy link
Collaborator

@alexaub-stripe alexaub-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responses sgtm, and change look great!

@bplunkett-stripe bplunkett-stripe merged commit 9216a8f into main Oct 2, 2024
7 checks passed
@bplunkett-stripe bplunkett-stripe deleted the bplunkett/refactor-sql-generation branch October 2, 2024 19:07
aleclarson pushed a commit to pg-nano/pg-schema-diff that referenced this pull request Nov 11, 2024
* Refactor SQL generation such that the SQL generators take a graph-first approach
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants