Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge join selects do not filter left join #1568

Merged
merged 3 commits into from
Jan 31, 2023

Conversation

max-hoffman
Copy link
Contributor

@max-hoffman max-hoffman commented Jan 30, 2023

A join filter that evaluates to false or nil can still return rows in certain cases. One of the cases we evaluated incorrectly were LEFT_MERGE_JOINs with multiple filters, which should always return the left row even if a filter returns false. MERGE_JOIN's first filter is specially selected because its Left/Right expressions reference attributes for two tables that will arrive sorted given the indexes we chose to read. We use that first filter for the merge comparison direction.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be fine but it's pretty mysterious. Want to help me out?

enginetest/join_op_tests.go Outdated Show resolved Hide resolved
filters := expression.SplitConjunction(j.Filter)
children = append(children, fmt.Sprintf("cmp: %s", filters[0]))
if len(filters) > 1 {
children = append(children, fmt.Sprintf("sel: %s", expression.JoinAnd(filters[1:]...)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is doing. I don't understand what cmp and sel mean in this context.

If the first predicate in the join condition is semantically meaningful, why doesn't the data model node reflect that? I.e. why isn't there are MergeJoinNode that has different semantics for its Filters?

Also, why JoinNode.Filter? Isn't is JoinNode.JoinCond?

@@ -72,16 +72,14 @@ func newMergeJoinIter(ctx *sql.Context, j *JoinNode, row sql.Row) (sql.RowIter,
var iter sql.RowIter = &mergeJoinIter{
left: l,
right: r,
filters: filters[1:],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merits a little explanation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like relying on a strict ordering of the predicates like this is fragile, who's to say we maintain their order throughout various transformations in the rest of the analyzer?


res, err := i.expr.Compare(ctx, i.fullRow)
func (i *mergeJoinIter) Next(ctx *sql.Context) (sql.Row, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe a little too gross to ship. Gotos really should be limited in their application to skip certain parts of nested loop structures, not the primary control flow. Unless you have a really, really good reason. Which as far as I can tell, you don't.

Structure this as a state machine inside a for loop, something like:

var nextState = init
for nextState != end {
  switch nextState {
    case init:
       ...
       nextState = exhaustCheck
  }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a custom type for your state machine states and consts for all possible values

@max-hoffman
Copy link
Contributor Author

@zachmu i added more comments to make merge's first filter less mysterious, and rewrote the goto join iters as switch statement state machines.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is easier to understand

@max-hoffman max-hoffman merged commit 8e0c643 into main Jan 31, 2023
@max-hoffman max-hoffman deleted the max/merge-join-nullable-filters branch January 31, 2023 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants