Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move subqueries to use the operator model #13750

Merged
merged 110 commits into from
Sep 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
110 commits
Select commit Hold shift + click to select a range
a344c3a
renamed planbuilder, operator and engine primitive
systay Aug 11, 2023
98be673
wip - handle EXISTS and push it down through an ApplyJoin
systay Aug 15, 2023
ecb83d8
wip - change subquery pushing
systay Aug 15, 2023
f359ed0
wip - semiJoin from EXISTS now planner correctly
systay Aug 16, 2023
756e935
more work on semijoin and exists
systay Aug 16, 2023
e0f00ce
handle IN queries usign SemiJoins
systay Aug 17, 2023
f7d01e7
only add the planner phases that are needed for the current query
systay Aug 17, 2023
04ddc38
remove SubQueryInner
systay Aug 17, 2023
8aebcb9
clean up UncorrelatedSubquery
systay Aug 17, 2023
9902d54
handle uncorrelated subqueries better
systay Aug 18, 2023
20e1340
better support for PulloutValue subqueries
systay Aug 18, 2023
0761a7a
move the query signature to the semantic analysis
systay Aug 19, 2023
7df2fdd
use constructor
systay Aug 19, 2023
347a36a
handle subquery filter
systay Aug 22, 2023
337ec81
handle exists well in merging
systay Aug 22, 2023
c988fad
make merging great again
systay Aug 22, 2023
71e19ad
go back to single expression AddColumn
systay Aug 22, 2023
eb06afd
feat: refactor and fix subqery merge logic
harshit-gangal Aug 23, 2023
3da93bc
feat: ignore columns for subquery merge when comparison is not equal …
harshit-gangal Aug 23, 2023
60dddd5
fix: set pullout variables on operator to set them on engine primitiv…
harshit-gangal Aug 23, 2023
7800ca9
feat: push filter on subquery filter
harshit-gangal Aug 23, 2023
f26eb69
handle pushing subqueries when the subquery depends on both sides of …
systay Aug 24, 2023
24ab078
update some plantests
systay Aug 24, 2023
876f557
push projections under subqueries
systay Aug 24, 2023
1bc2b4c
move the subquery-filter to be closer to the ApplyJoin
systay Aug 24, 2023
ca8e7c9
make it possible to push aggregation through subquery
systay Aug 24, 2023
8f2e397
lots of work on recursive subqueries
systay Aug 28, 2023
126a37f
Merge remote-tracking branch 'upstream/main' into subq-op
harshit-gangal Aug 29, 2023
8d7811f
fail for cases when correlated subquery is not an exists query
harshit-gangal Aug 29, 2023
8055ea7
calculate the join columns and outer columns needed on demand
systay Aug 29, 2023
f6b25b3
updated plantests
systay Aug 29, 2023
3856cb3
push subqueries to the side that they have connection to
systay Aug 29, 2023
c39c5ce
remove ExtractedSubquery and uses
systay Aug 30, 2023
6eb509e
make the query rewriting work again
systay Aug 30, 2023
52c5778
dont push limit under subqueries
systay Aug 30, 2023
b0c0e7b
handle subqueries on join predicates
systay Aug 30, 2023
cee43e9
refactor: extract shared code between ExistsExpr and ComparisonExpr
systay Aug 30, 2023
97868fd
handle more types of subqueries in the WHERE clause
systay Aug 30, 2023
45ef83c
Merge remote-tracking branch 'upstream/main' into subq-op
systay Aug 30, 2023
f38f5dd
refactor: remove SubQuery interface
systay Aug 30, 2023
eb1d289
Merge remote-tracking branch 'upstream/main' into subq-op
systay Sep 4, 2023
9dbd7b4
change SQC to return the subquery
systay Sep 4, 2023
419dbfc
typo
systay Sep 4, 2023
b1991cb
minor refactoring
systay Sep 4, 2023
8213ea6
add support for subquery projections
systay Sep 5, 2023
b455f80
update more tests
systay Sep 6, 2023
407636e
feat: merge subqueries with more predicates after it
GuptaManan100 Sep 6, 2023
0b36163
catch the first subquery, not the last
systay Sep 6, 2023
ca7e38e
Fix not exist subqueries
frouioui Sep 6, 2023
87f9eb6
handle stars in projections when possible
systay Sep 11, 2023
d4008e8
tiny fix and update plan-tests
systay Sep 11, 2023
29cc45e
Merge remote-tracking branch 'upstream/main' into subq-op
systay Sep 11, 2023
b2f5aea
improve support for correlated subqueries
frouioui Sep 11, 2023
5bcb900
add the capability of skipping plan tests
systay Sep 12, 2023
868c6ba
move subquery code to one file
systay Sep 12, 2023
c2e05c9
fail correlated projection subqueries
systay Sep 12, 2023
b21ab58
fix some projections and break others
systay Sep 12, 2023
e8cb962
make all ProjExpr implement the interface by ref
systay Sep 12, 2023
d2a4443
move selectExprs closer to the source
systay Sep 12, 2023
b823e7d
refactor: clean up projections
systay Sep 13, 2023
89c5431
refactor: small type refactoring
systay Sep 13, 2023
8de4d9a
Merge remote-tracking branch 'upstream/main' into subq-op
harshit-gangal Sep 14, 2023
860f702
handle comments and locks using an operator
systay Sep 14, 2023
58fcb33
handle UPDATE with subqueries in the SET clause
systay Sep 15, 2023
97023e8
fix subquery that is merged on the RHS of a join
frouioui Sep 15, 2023
5f90c4b
clean up argument name creation
systay Sep 16, 2023
17402f9
Merge remote-tracking branch 'upstream/main' into subq-op
systay Sep 16, 2023
741649a
DRYify the project operator - add columns refactoring
systay Sep 16, 2023
df79570
small simplification
systay Sep 16, 2023
e9f2315
handle weightstrings on projected columns
systay Sep 16, 2023
16a31fc
get the tabletSet for the subquery
systay Sep 16, 2023
5a9de19
handle subquery merging better
systay Sep 17, 2023
252ee15
rewrite not expressions
systay Sep 17, 2023
5520a31
clean up subquery handling
systay Sep 17, 2023
386da80
only merge subqueries that are at the top-level
systay Sep 17, 2023
972c1c3
refactor: clean up of code
systay Sep 17, 2023
5a9de9c
minor fixups
systay Sep 19, 2023
8864385
remove precalculated field and build it as needed instead
systay Sep 20, 2023
e70c5f2
refactoring of how we handle the pushing of subqueries through joins
systay Sep 20, 2023
5b74e3e
Merge remote-tracking branch 'upstream/main' into subq-op
systay Sep 20, 2023
108a624
do the join column calculation upfront
frouioui Sep 20, 2023
6d00c9a
search the ON conditions for predicates binding inner and outer subqu…
systay Sep 21, 2023
1fb146a
remove unneeded check
systay Sep 21, 2023
d3d426e
first union query inside subquery passing
systay Sep 21, 2023
49aefe5
handle predicates inside unions inside subqueries
systay Sep 21, 2023
54d6bd9
updated tpch plan tests
systay Sep 21, 2023
f0c76ee
handle EXISTS projections correctly
systay Sep 21, 2023
155dc02
create subquery planner for having clause
harshit-gangal Sep 22, 2023
6cc5a6d
Merge remote-tracking branch 'upstream/main' into subq-op
systay Sep 23, 2023
792ce30
last couple of tests turned green
systay Sep 26, 2023
a89e3d5
check that subqueries used in comparisons have the correct number of …
systay Sep 26, 2023
f9705b4
remove column number check from engine primitive and remove outdated …
systay Sep 26, 2023
b756846
restore the space between exists and the parens
systay Sep 26, 2023
df4b8fa
remove invalid queries
systay Sep 26, 2023
ba8ea10
make sure the engine primitive handles the new plans correctly
systay Sep 26, 2023
9ae46d1
comments and renaming for clarification
systay Sep 26, 2023
bcb657d
refactoring & remove silly test
systay Sep 26, 2023
e66336a
Rename `pushDown` to `push` in the operator package
systay Sep 26, 2023
7ab4d8c
remove another query with subq in outer join condition
systay Sep 27, 2023
894823e
bug: check the output columns on commented queries
systay Sep 27, 2023
a104a2d
test: fix test expectation and add a comment explaining it
GuptaManan100 Sep 27, 2023
173ea54
feat: fix pushing order by underneath an aggregation
GuptaManan100 Sep 27, 2023
eb23151
bug: fix the subquery merging logic
systay Sep 27, 2023
baeeabd
address review comments
systay Sep 27, 2023
b804997
extract subquery building from subquery container
systay Sep 28, 2023
5f4b40d
allow merging but not routing if predicates are deep in expression tree
systay Sep 28, 2023
b513c99
clean up projection subquery planning
systay Sep 29, 2023
02afec8
Merge remote-tracking branch 'upstream/main' into subq-op
systay Sep 29, 2023
826a3bb
handle subquery with vindex value on update better with blocking merge
harshit-gangal Sep 29, 2023
77e4cb7
make sure to handle Exists in projections correctly
systay Sep 29, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions go/slice/slice.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ func Any[T any](s []T, fn func(T) bool) bool {
return false
}

// Map applies a function to each element of a slice and returns a new slice
func Map[From, To any](in []From, f func(From) To) []To {
if in == nil {
return nil
Expand All @@ -49,6 +50,7 @@ func Map[From, To any](in []From, f func(From) To) []To {
return result
}

// MapWithError applies a function to each element of a slice and returns a new slice, or an error
func MapWithError[From, To any](in []From, f func(From) (To, error)) (result []To, err error) {
if in == nil {
return nil, nil
Expand All @@ -62,3 +64,17 @@ func MapWithError[From, To any](in []From, f func(From) (To, error)) (result []T
}
return
}

// Filter returns a new slice containing only the elements for which the predicate returns true
func Filter[T any](in []T, f func(T) bool) []T {
if in == nil {
return nil
}
systay marked this conversation as resolved.
Show resolved Hide resolved
result := make([]T, 0, len(in))
for _, col := range in {
if f(col) {
result = append(result, col)
}
}
return result
}
45 changes: 18 additions & 27 deletions go/test/endtoend/vtgate/gen4/gen4_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,25 +60,6 @@ func TestCorrelatedExistsSubquery(t *testing.T) {
utils.AssertMatches(t, mcmp.VtConn, `select id from t1 where id in (select id from t2) order by id`,
`[[INT64(1)] [INT64(100)]]`)

utils.AssertMatches(t, mcmp.VtConn, `
select id
from t1
where exists(
select t2.id, count(*)
from t2
where t1.col = t2.tcol2
having count(*) > 0
)`,
`[[INT64(100)]]`)
utils.AssertMatches(t, mcmp.VtConn, `
select id
from t1
where exists(
select t2.id, count(*)
from t2
where t1.col = t2.tcol1
) order by id`,
`[[INT64(1)] [INT64(4)] [INT64(100)]]`)
systay marked this conversation as resolved.
Show resolved Hide resolved
Comment on lines -63 to -81
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for removing these test cases

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, they are not valid. they are not following the ONLY_FULL_GROUP_BY directive, but this test doesn't change sql_mode on the connection, so the query fails.

utils.AssertMatchesNoOrder(t, mcmp.VtConn, `
select id
from t1
Expand Down Expand Up @@ -178,24 +159,34 @@ func TestSubQueries(t *testing.T) {
utils.AssertMatches(t, mcmp.VtConn, `select t2.tcol1, t2.tcol2 from t2 where t2.id IN (select id from t3) order by t2.id`, `[[VARCHAR("A") VARCHAR("A")] [VARCHAR("B") VARCHAR("C")] [VARCHAR("A") VARCHAR("C")] [VARCHAR("C") VARCHAR("A")] [VARCHAR("A") VARCHAR("A")] [VARCHAR("B") VARCHAR("C")] [VARCHAR("B") VARCHAR("A")] [VARCHAR("C") VARCHAR("B")]]`)
utils.AssertMatches(t, mcmp.VtConn, `select t2.tcol1, t2.tcol2 from t2 where t2.id IN (select t3.id from t3 join t2 on t2.id = t3.id) order by t2.id`, `[[VARCHAR("A") VARCHAR("A")] [VARCHAR("B") VARCHAR("C")] [VARCHAR("A") VARCHAR("C")] [VARCHAR("C") VARCHAR("A")] [VARCHAR("A") VARCHAR("A")] [VARCHAR("B") VARCHAR("C")] [VARCHAR("B") VARCHAR("A")] [VARCHAR("C") VARCHAR("B")]]`)

utils.AssertMatches(t, mcmp.VtConn, `select u_a.a from u_a left join t2 on t2.id IN (select id from t2)`, `[]`)
// inserting some data in u_a
utils.Exec(t, mcmp.VtConn, `insert into u_a(id, a) values (1, 1)`)

// execute same query again.
qr := utils.Exec(t, mcmp.VtConn, `select u_a.a from u_a left join t2 on t2.id IN (select id from t2)`)
assert.EqualValues(t, 8, len(qr.Rows))
for index, row := range qr.Rows {
assert.EqualValues(t, `[INT64(1)]`, fmt.Sprintf("%v", row), "does not match for row: %d", index+1)
}

systay marked this conversation as resolved.
Show resolved Hide resolved
// fail as projection subquery is not scalar
_, err := utils.ExecAllowError(t, mcmp.VtConn, `select (select id from t2) from t2 order by id`)
assert.EqualError(t, err, "subquery returned more than one row (errno 1105) (sqlstate HY000) during query: select (select id from t2) from t2 order by id")

utils.AssertMatches(t, mcmp.VtConn, `select (select id from t2 order by id limit 1) from t2 order by id limit 2`, `[[INT64(1)] [INT64(1)]]`)
}

func TestSubQueriesOnOuterJoinOnCondition(t *testing.T) {
t.Skip("not supported")
mcmp, closer := start(t)
defer closer()

utils.Exec(t, mcmp.VtConn, `insert into t2(id, tcol1, tcol2) values (1, 'A', 'A'),(2, 'B', 'C'),(3, 'A', 'C'),(4, 'C', 'A'),(5, 'A', 'A'),(6, 'B', 'C'),(7, 'B', 'A'),(8, 'C', 'B')`)
utils.Exec(t, mcmp.VtConn, `insert into t3(id, tcol1, tcol2) values (1, 'A', 'A'),(2, 'B', 'C'),(3, 'A', 'C'),(4, 'C', 'A'),(5, 'A', 'A'),(6, 'B', 'C'),(7, 'B', 'A'),(8, 'C', 'B')`)

utils.AssertMatches(t, mcmp.VtConn, `select u_a.a from u_a left join t2 on t2.id IN (select id from t2)`, `[]`)
// inserting some data in u_a
utils.Exec(t, mcmp.VtConn, `insert into u_a(id, a) values (1, 1)`)
qr := utils.Exec(t, mcmp.VtConn, `select u_a.a from u_a left join t2 on t2.id IN (select id from t2)`)
assert.EqualValues(t, 8, len(qr.Rows))
for index, row := range qr.Rows {
assert.EqualValues(t, `[INT64(1)]`, fmt.Sprintf("%v", row), "does not match for row: %d", index+1)
}
}

func TestPlannerWarning(t *testing.T) {
mcmp, closer := start(t)
defer closer()
Expand Down
4 changes: 2 additions & 2 deletions go/test/endtoend/vtgate/queries/derived/derived_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,6 @@ func TestDerivedTableWithOrderByLimit(t *testing.T) {
}

func TestDerivedAggregationOnRHS(t *testing.T) {
t.Skip("skipped for now, issue: https://github.com/vitessio/vitess/issues/11703")
mcmp, closer := start(t)
defer closer()

Expand Down Expand Up @@ -85,7 +84,8 @@ func TestDerivedTableWithHaving(t *testing.T) {
mcmp.Exec("insert into user(id, name) values(1,'toto'), (2,'tata'), (3,'titi'), (4,'tete'), (5,'foo')")

mcmp.Exec("set sql_mode = ''")
mcmp.AssertMatchesAnyNoCompare("select /*vt+ PLANNER=Gen4 */ * from (select id from user having count(*) >= 1) s", "[[INT64(1)]]", "[[INT64(4)]]")
// For the given query, we can get any id back, because we aren't grouping by it.
mcmp.AssertMatchesAnyNoCompare("select /*vt+ PLANNER=Gen4 */ * from (select id from user having count(*) >= 1) s", "[[INT64(1)]]", "[[INT64(2)]]", "[[INT64(3)]]", "[[INT64(4)]]", "[[INT64(5)]]")
}

func TestDerivedTableColumns(t *testing.T) {
Expand Down
16 changes: 0 additions & 16 deletions go/vt/sqlparser/ast.go
Original file line number Diff line number Diff line change
Expand Up @@ -2484,21 +2484,6 @@ type (
Fsp int // fractional seconds precision, integer from 0 to 6 or an Argument
}

// ExtractedSubquery is a subquery that has been extracted from the original AST
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this struct was a big part of the refactoring. Before this refactoring, we would replace all subqueries with this AST struct early on in the semantic analysis process, and use it during planning. All of that logic is now gone from semantic analysis and this struct is no longer needed.

// This is a struct that the parser will never produce - it's written and read by the gen4 planner
// CAUTION: you should only change argName and hasValuesArg through the setter methods
ExtractedSubquery struct {
Original Expr // original expression that was replaced by this ExtractedSubquery
OpCode int // this should really be engine.PulloutOpCode, but we cannot depend on engine :(
Subquery *Subquery
OtherSide Expr // represents the side of the comparison, this field will be nil if Original is not a comparison
Merged bool // tells whether we need to rewrite this subquery to Original or not

hasValuesArg string
argName string
alternative Expr // this is what will be used to Format this struct
}

// JSONPrettyExpr represents the function and argument for JSON_PRETTY()
// https://dev.mysql.com/doc/refman/8.0/en/json-utility-functions.html#function_json-pretty
JSONPrettyExpr struct {
Expand Down Expand Up @@ -3176,7 +3161,6 @@ func (*CharExpr) iExpr() {}
func (*ConvertUsingExpr) iExpr() {}
func (*MatchExpr) iExpr() {}
func (*Default) iExpr() {}
func (*ExtractedSubquery) iExpr() {}
func (*TrimFuncExpr) iExpr() {}
func (*JSONSchemaValidFuncExpr) iExpr() {}
func (*JSONSchemaValidationReportFuncExpr) iExpr() {}
Expand Down
17 changes: 0 additions & 17 deletions go/vt/sqlparser/ast_clone.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

32 changes: 0 additions & 32 deletions go/vt/sqlparser/ast_copy_on_rewrite.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

30 changes: 0 additions & 30 deletions go/vt/sqlparser/ast_equals.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 0 additions & 8 deletions go/vt/sqlparser/ast_format.go
Original file line number Diff line number Diff line change
Expand Up @@ -2422,14 +2422,6 @@ func (node *RenameTable) Format(buf *TrackedBuffer) {
}
}

// Format formats the node.
// If an extracted subquery is still in the AST when we print it,
// it will be formatted as if the subquery has been extracted, and instead
// show up like argument comparisons
func (node *ExtractedSubquery) Format(buf *TrackedBuffer) {
node.alternative.Format(buf)
}

func (node *JSONTableExpr) Format(buf *TrackedBuffer) {
buf.astPrintf(node, "json_table(%v, %v columns(\n", node.Expr, node.Filter)
sz := len(node.Columns)
Expand Down
8 changes: 0 additions & 8 deletions go/vt/sqlparser/ast_format_fast.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

59 changes: 5 additions & 54 deletions go/vt/sqlparser/ast_funcs.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,11 @@ import (
// If postVisit returns true, the underlying nodes
// are also visited. If it returns an error, walking
// is interrupted, and the error is returned.
func Walk(visit Visit, nodes ...SQLNode) error {
func Walk(visit Visit, first SQLNode, nodes ...SQLNode) error {
err := VisitSQLNode(first, visit)
if err != nil {
return err
}
Comment on lines +40 to +44
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why first field is required. Won't the first element of nodes otherwise be the first to run?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to avoid accidental use of this method with no nodes to visit. We found one case where this was done by accident.

Copy link
Member

@GuptaManan100 GuptaManan100 Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we then just check the length of nodes and panic/error on 0 length?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that preferable to handling it like this?

for _, node := range nodes {
err := VisitSQLNode(node, visit)
if err != nil {
Expand Down Expand Up @@ -2093,59 +2097,6 @@ func GetAllSelects(selStmt SelectStatement) []*Select {
panic("[BUG]: unknown type for SelectStatement")
}

// SetArgName sets argument name.
func (es *ExtractedSubquery) SetArgName(n string) {
es.argName = n
es.updateAlternative()
}

// SetHasValuesArg sets has_values argument.
func (es *ExtractedSubquery) SetHasValuesArg(n string) {
es.hasValuesArg = n
es.updateAlternative()
}

// GetArgName returns argument name.
func (es *ExtractedSubquery) GetArgName() string {
return es.argName
}

// GetHasValuesArg returns has values argument.
func (es *ExtractedSubquery) GetHasValuesArg() string {
return es.hasValuesArg

}

func (es *ExtractedSubquery) updateAlternative() {
switch original := es.Original.(type) {
case *ExistsExpr:
es.alternative = NewArgument(es.hasValuesArg)
case *Subquery:
es.alternative = NewArgument(es.argName)
case *ComparisonExpr:
// other_side = :__sq
cmp := &ComparisonExpr{
Left: es.OtherSide,
Right: NewArgument(es.argName),
Operator: original.Operator,
}
var expr Expr = cmp
switch original.Operator {
case InOp:
// :__sq_has_values = 1 and other_side in ::__sq
cmp.Right = NewListArg(es.argName)
hasValue := &ComparisonExpr{Left: NewArgument(es.hasValuesArg), Right: NewIntLiteral("1"), Operator: EqualOp}
expr = AndExpressions(hasValue, cmp)
case NotInOp:
// :__sq_has_values = 0 or other_side not in ::__sq
cmp.Right = NewListArg(es.argName)
hasValue := &ComparisonExpr{Left: NewArgument(es.hasValuesArg), Right: NewIntLiteral("0"), Operator: EqualOp}
expr = &OrExpr{hasValue, cmp}
}
es.alternative = expr
}
}

// ColumnName returns the alias if one was provided, otherwise prints the AST
func (ae *AliasedExpr) ColumnName() string {
if !ae.As.IsEmpty() {
Expand Down
Loading
Loading