Refactor SQL generation - graph-first approach #176

bplunkett-stripe · 2024-09-27T07:48:55Z

Description

Refactor SQL generation such that SQL generators return nested graphs instead of flattened lists of statements. This allows nested hierarchies to be diffed while letting root-level SQL generators build dependencies on the nested SQL generators.

Follow-up PRs:

Leverage this primitive in the table sql generator such that column statements can be referenced from root-level sql generators
Make a proof-of-concept by improving the policy sql generators to depend on individual column statements
Re-organize code per Cleanup balls of mud #101
Implement view support

Motivation

#131

Testing

Acceptance tests pass

bplunkett-stripe · 2024-09-27T07:51:00Z

pkg/diff/sql_graph.go

+}
+
+type sqlVertexGenerator[S schema.Object, Diff diff[S]] interface {
+	Add(S) (partialSQLGraph, error)


Now, SQL generators can return entire graphs for any one operation. For example, when a table is altered, it can return n addressable SQL vertex nodes, one for each column, that other SQL vertex generators will be able to reference globally.
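A minimal sketch of that idea, using simplified stand-in types rather than the real ones in pkg/diff. The `table` and `column` structs, the vertex id format, and `alterTableAddColumns` are all hypothetical names for illustration:

```go
package main

import "fmt"

// sqlVertex and partialSQLGraph are simplified stand-ins for the PR's types;
// the field names are assumptions.
type sqlVertex struct {
	id         string
	statements []string
}

type partialSQLGraph struct {
	vertices []sqlVertex
}

// column and table are hypothetical schema objects for this sketch.
type column struct{ name, typ string }

type table struct {
	name    string
	columns []column
}

// alterTableAddColumns returns one addressable vertex per added column
// instead of a flat list of statements.
func alterTableAddColumns(t table, added []column) partialSQLGraph {
	var g partialSQLGraph
	for _, c := range added {
		g.vertices = append(g.vertices, sqlVertex{
			id: fmt.Sprintf("column:%s.%s", t.name, c.name),
			statements: []string{
				fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", t.name, c.name, c.typ),
			},
		})
	}
	return g
}

func main() {
	g := alterTableAddColumns(
		table{name: "foobar"},
		[]column{{"id", "bigint"}, {"created_at", "timestamptz"}},
	)
	for _, v := range g.vertices {
		fmt.Println(v.id, "->", v.statements[0])
	}
}
```

Because each column gets its own id, another generator (a policy generator, say) could declare a dependency on a single column vertex rather than on the whole table.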

bplunkett-stripe · 2024-10-01T07:34:33Z

pkg/diff/sql_vertex_generator.go

+			statements: nil,
+		}
+
+		// To maintain the correctness of the graph, we will add a dummy vertex for the missing dependencies


We probably want to remove this behavior in the future, but that's out of scope for this PR.
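For readers skimming the thread, the behavior in question looks roughly like the sketch below. This is a simplified stand-in, not the code in sql_vertex_generator.go; the `dependency` struct and `addMissingVertices` are hypothetical names:

```go
package main

import "fmt"

type sqlVertex struct {
	id         string
	statements []string
}

// dependency says the vertex `source` must run before the vertex `target`.
// (Hypothetical shape for this sketch.)
type dependency struct{ source, target string }

// addMissingVertices inserts an empty placeholder vertex for every id that a
// dependency references but that no generator produced, so every edge in the
// graph has both endpoints.
func addMissingVertices(vertices map[string]sqlVertex, deps []dependency) {
	for _, d := range deps {
		for _, id := range []string{d.source, d.target} {
			if _, ok := vertices[id]; !ok {
				vertices[id] = sqlVertex{id: id, statements: nil}
			}
		}
	}
}

func main() {
	vertices := map[string]sqlVertex{
		"policy:p1": {id: "policy:p1", statements: []string{"CREATE POLICY ..."}},
	}
	deps := []dependency{{source: "column:t.owner_id", target: "policy:p1"}}
	addMissingVertices(vertices, deps)
	fmt.Println(len(vertices), "vertices after filling in placeholders")
}
```

A placeholder carries no statements, so it adds nothing to the generated plan; it only keeps the graph well-formed.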

alexaub-stripe

Looking good on the approach! Some comments but nothing huge

alexaub-stripe · 2024-10-01T19:10:18Z

pkg/diff/sql_graph.go

+
+// sqlPriority is an enum for the priority of a statement in the SQL graph, i.e., whether it should be run sooner
+// or later in the topological sort of the graph
+type sqlPriority int


question: Why do we need priority?

We use it to prioritize index adds ahead of any sort of index delete (not index replacement). We could alternatively build hard dependencies such that all index deletes depend on index adds, but that's definitely out of scope.

Imagine the following:
User adds index Foobar (created_at, id) and deletes index Fizzbuzz (id). These two indexes are totally independent, but it's important that Foobar is created before Fizzbuzz is deleted. It is much easier to use priority than to build hard dependencies, albeit the latter is probably more robust.

BLUF:

Used to prioritize index adds over index deletes

Fixing this with dependencies is definitely out-of-scope for this PR. This system existed before the refactor
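To make the Foobar/Fizzbuzz case concrete, here is a rough sketch of a topological sort that uses priority only to choose among vertices that are already ready to run. It is Kahn's algorithm with a priority tie-break, written for illustration; it is not the sort used in pkg/diff, and the `vertex` type and `topoSortWithPriority` are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

type vertex struct {
	id       string
	priority int // higher runs sooner among vertices that are ready
}

// topoSortWithPriority orders vertices so every edge source precedes its
// targets. Among ready vertices it picks the highest priority first, then
// the smallest id for determinism. It assumes vertex ids are unique and
// that every id in edges also appears in vertices.
func topoSortWithPriority(vertices []vertex, edges map[string][]string) ([]string, error) {
	byID := make(map[string]vertex, len(vertices))
	inDegree := make(map[string]int, len(vertices))
	for _, v := range vertices {
		byID[v.id] = v
		inDegree[v.id] = 0
	}
	for _, targets := range edges {
		for _, t := range targets {
			inDegree[t]++
		}
	}
	var ready []string
	for id, d := range inDegree {
		if d == 0 {
			ready = append(ready, id)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Slice(ready, func(i, j int) bool {
			a, b := byID[ready[i]], byID[ready[j]]
			if a.priority != b.priority {
				return a.priority > b.priority
			}
			return a.id < b.id
		})
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, t := range edges[next] {
			inDegree[t]--
			if inDegree[t] == 0 {
				ready = append(ready, t)
			}
		}
	}
	if len(order) != len(vertices) {
		return nil, fmt.Errorf("cycle detected in graph")
	}
	return order, nil
}

func main() {
	vertices := []vertex{
		{id: "drop index fizzbuzz", priority: -1},
		{id: "add index foobar", priority: 1},
	}
	// No edges: the two indexes are independent, so priority decides.
	order, err := topoSortWithPriority(vertices, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```

Priority here is a soft preference: a hard dependency edge still wins whenever the two conflict, which is why dependencies are the more robust option.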

alexaub-stripe · 2024-10-01T21:12:42Z

pkg/diff/sql_graph.go

+func (s sqlVertex) GetPriority() int {
+	// Prioritize adds/alters over deletes. Weight by number of statements. A 0 statement delete should be
+	// prioritized over a 1 statement delete
+	return len(s.statements) * int(s.priority)


Statement count probably shouldn't be able to override a configured priority; that would be surprising. I would only expect statement count to break ties. Some options:

Expose a comparison function for sqlVertex instead of GetPriority, so you can order things first by priority and then by statement count

Otherwise, find some scheme to encode priority and statement count losslessly in an integer (e.g., priority * 1000 + len(statements), though we'd need a limit on statement count)

Statement count probably shouldn't be able to override a configured priority which would be surprising. I would only expect statement count to break ties.

The way it's coded, that's exactly what it does, since sqlPriority only changes direction and not magnitude.

If we break out "layers" of priority, then we can do this. But until that point, we only have "prioritize as late as possible" and "prioritize as soon as possible", which only affects direction. In other words, there is never a situation where two nodes' sqlPriority values differ and the len(s.statements) multiplier actually affects the outcome.
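A small worked example of both positions in this exchange. The constant values (+1 and -1) are an assumption based on the statement that sqlPriority only changes direction, and `runsBefore` is a hypothetical comparison function in the spirit of the first suggested option, not code from the PR:

```go
package main

import "fmt"

type sqlPriority int

const (
	// Assumed values: the thread only says priority changes direction,
	// not magnitude.
	sqlPrioritySooner sqlPriority = 1
	sqlPriorityLater  sqlPriority = -1
)

type sqlVertex struct {
	id         string
	priority   sqlPriority
	statements []string
}

// getPriority mirrors the expression in the diff: statement count scaled
// by the sign of the priority.
func (s sqlVertex) getPriority() int {
	return len(s.statements) * int(s.priority)
}

// runsBefore is the comparison-function alternative: order by configured
// priority first, and use the weighted value only to break ties.
func runsBefore(a, b sqlVertex) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.getPriority() > b.getPriority()
}

func main() {
	add := sqlVertex{id: "add index", priority: sqlPrioritySooner, statements: []string{"CREATE INDEX ..."}}
	bigDrop := sqlVertex{id: "drop (3 statements)", priority: sqlPriorityLater, statements: []string{"a", "b", "c"}}
	emptyDrop := sqlVertex{id: "drop (0 statements)", priority: sqlPriorityLater}

	for _, v := range []sqlVertex{add, bigDrop, emptyDrop} {
		fmt.Printf("%-20s getPriority=%d\n", v.id, v.getPriority())
	}
	fmt.Println("add before bigDrop:", runsBefore(add, bigDrop))
	fmt.Println("emptyDrop before bigDrop:", runsBefore(emptyDrop, bigDrop))
}
```

Under these assumed values, a "sooner" vertex with at least one statement always scores above zero and a "later" vertex never does, so the multiplier only reorders vertices that share a priority. The explicit comparison function would keep that guarantee if more priority levels were added later.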

alexaub-stripe · 2024-10-01T21:19:19Z

pkg/diff/sql_vertex_generator.go

+	graph := newSqlGraph()
+	for _, vertex := range parts.vertices {
+		// It's possible the node already exists. merge it if it does
+		if graph.HasVertexWithId(vertex.GetId()) {


We also need to remove the vertex that's already in there along with adding the merged one, right?

Nope, it will just override the existing vertex. This is the exact same behavior as before.
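A sketch of why no explicit removal is needed, assuming the graph stores vertices in a map keyed by id. That storage choice, the merge rule (concatenating statements), and all names below are assumptions for illustration; the real sqlGraph may differ:

```go
package main

import "fmt"

type sqlVertex struct {
	id         string
	statements []string
}

// sqlGraph is a stand-in that keeps vertices in a map keyed by id.
type sqlGraph struct {
	vertices map[string]sqlVertex
}

func newSQLGraph() *sqlGraph {
	return &sqlGraph{vertices: make(map[string]sqlVertex)}
}

func (g *sqlGraph) hasVertexWithID(id string) bool {
	_, ok := g.vertices[id]
	return ok
}

// addVertex overwrites any existing vertex that has the same id.
func (g *sqlGraph) addVertex(v sqlVertex) {
	g.vertices[v.id] = v
}

// graphFromVertices merges vertices that share an id (here by concatenating
// their statements) and then adds the merged vertex, replacing the old one.
func graphFromVertices(vs []sqlVertex) *sqlGraph {
	g := newSQLGraph()
	for _, v := range vs {
		if g.hasVertexWithID(v.id) {
			existing := g.vertices[v.id]
			merged := append(append([]string{}, existing.statements...), v.statements...)
			v = sqlVertex{id: v.id, statements: merged}
		}
		g.addVertex(v)
	}
	return g
}

func main() {
	g := graphFromVertices([]sqlVertex{
		{id: "table:foobar", statements: []string{"ALTER TABLE foobar ..."}},
		{id: "table:foobar", statements: []string{"ALTER TABLE foobar ... (second)"}},
	})
	fmt.Println(len(g.vertices), "vertex with", len(g.vertices["table:foobar"].statements), "statements")
}
```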

alexaub-stripe · 2024-10-01T21:21:09Z

pkg/diff/sql_vertex_generator.go

+	return statements
+}
+
+func concatPartialGraphs(parts ...partialSQLGraph) partialSQLGraph {


thought: It might be nice to get rid of concatPartialGraphs and just have graphFromPartials take a list of partials. That way we could potentially add validation on partial construction (i.e., no duplicate deps or vertices). It also might be nice to keep each partial separate right up until the point where we merge them.

I don't think we can get rid of concatPartialGraphs because there may be some "nested SQL generators", where we would have to concat multiple partials into a bigger partial.

TableSQLGenerator -> (ColumnSQLGenerator | CheckConstraintSQLGenerator | etc). Each returns a partial; their union would still be a partial, because they might have dependencies on higher-level SQL generators.

One thing I like about concatenating partial graphs as we go is that it follows a sort-of builder pattern. I think it's a bit more confusing if you have to carry local variables tens of lines down to be included in one function.

I figure that sqlgenerators could return lists of partials, but yeah that's fair enough. I'm not too opinionated about this.
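For context, a rough sketch of the builder-style concatenation being discussed. The struct fields and the example vertex ids are assumptions; only the name concatPartialGraphs and its variadic signature come from the diff:

```go
package main

import "fmt"

type sqlVertex struct {
	id         string
	statements []string
}

// dependency says the vertex `source` must run before the vertex `target`.
// (Hypothetical shape for this sketch.)
type dependency struct{ source, target string }

// partialSQLGraph may reference vertex ids it does not contain, which is why
// the union of several partials is still only a partial.
type partialSQLGraph struct {
	vertices     []sqlVertex
	dependencies []dependency
}

// concatPartialGraphs unions the vertices and dependencies of its inputs
// without validating or de-duplicating them.
func concatPartialGraphs(parts ...partialSQLGraph) partialSQLGraph {
	var out partialSQLGraph
	for _, p := range parts {
		out.vertices = append(out.vertices, p.vertices...)
		out.dependencies = append(out.dependencies, p.dependencies...)
	}
	return out
}

func main() {
	// A nested column generator's output.
	columns := partialSQLGraph{
		vertices: []sqlVertex{{id: "column:t.price", statements: []string{"ALTER TABLE t ADD COLUMN price int"}}},
	}
	// A nested check-constraint generator's output. It depends on the column
	// vertex and on a vertex owned by some higher-level generator.
	checks := partialSQLGraph{
		vertices: []sqlVertex{{id: "check:t.price_positive", statements: []string{"ALTER TABLE t ADD CONSTRAINT ..."}}},
		dependencies: []dependency{
			{source: "column:t.price", target: "check:t.price_positive"},
			{source: "function:is_positive", target: "check:t.price_positive"},
		},
	}
	tablePartial := concatPartialGraphs(columns, checks)
	fmt.Println(len(tablePartial.vertices), "vertices,", len(tablePartial.dependencies), "dependencies")
}
```

The dependency on "function:is_positive" is unresolved inside the table's partial, so it can only be checked once every generator's partial has been merged into the full graph.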

bplunkett-stripe · 2024-10-01T23:12:04Z

Made the suggested changes, with the exception of SQL priority and concatenating partial graphs. See my comments for explanations of the current behavior!

alexaub-stripe

Responses sgtm, and changes look great!

* Refactor SQL generation such that the SQL generators take a graph-first approach

bplunkett-stripe commented Sep 27, 2024

Refactor SQL generation such that the SQL generators take a graph-fir…

24b6beb

…st approach

bplunkett-stripe force-pushed the bplunkett/refactor-sql-generation branch 2 times, most recently from febb538 to e03076e Compare October 1, 2024 07:08

bplunkett-stripe marked this pull request as ready for review October 1, 2024 07:08

bplunkett-stripe changed the title ~~Refactor SQL generation such that the SQL generators take a graph-fir…~~ Refactor SQL generation Oct 1, 2024

bplunkett-stripe force-pushed the bplunkett/refactor-sql-generation branch from e03076e to 512395c Compare October 1, 2024 07:12

bplunkett-stripe added the tech debt label Oct 1, 2024

bplunkett-stripe requested a review from alexaub-stripe October 1, 2024 07:15

bplunkett-stripe changed the title ~~Refactor SQL generation~~ Refactor SQL generation - graph-first approach Oct 1, 2024

bplunkett-stripe commented Oct 1, 2024

Update dependency system

51df181

bplunkett-stripe force-pushed the bplunkett/refactor-sql-generation branch from 512395c to 51df181 Compare October 1, 2024 07:39

alexaub-stripe requested changes Oct 1, 2024

Suggested changes

238c267

bplunkett-stripe requested a review from alexaub-stripe October 1, 2024 23:11

alexaub-stripe approved these changes Oct 2, 2024

bplunkett-stripe merged commit 9216a8f into main Oct 2, 2024
7 checks passed

bplunkett-stripe deleted the bplunkett/refactor-sql-generation branch October 2, 2024 19:07

aleclarson pushed a commit to pg-nano/pg-schema-diff that referenced this pull request Nov 11, 2024

Refactor SQL generation - graph-first approach (stripe#176)

d20f7c4

* Refactor SQL generation such that the SQL generators take a graph-first approach
