-
-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge join selects do not filter left join #1568
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be fine but it's pretty mysterious. Want to help me out?
filters := expression.SplitConjunction(j.Filter) | ||
children = append(children, fmt.Sprintf("cmp: %s", filters[0])) | ||
if len(filters) > 1 { | ||
children = append(children, fmt.Sprintf("sel: %s", expression.JoinAnd(filters[1:]...))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this is doing. I don't understand what cmp
and sel
mean in this context.
If the first predicate in the join condition is semantically meaningful, why doesn't the data model node reflect that? I.e. why isn't there are MergeJoinNode that has different semantics for its Filters?
Also, why JoinNode.Filter
? Isn't is JoinNode.JoinCond
?
@@ -72,16 +72,14 @@ func newMergeJoinIter(ctx *sql.Context, j *JoinNode, row sql.Row) (sql.RowIter, | |||
var iter sql.RowIter = &mergeJoinIter{ | |||
left: l, | |||
right: r, | |||
filters: filters[1:], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This merits a little explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like relying on a strict ordering of the predicates like this is fragile, who's to say we maintain their order throughout various transformations in the rest of the analyzer?
|
||
res, err := i.expr.Compare(ctx, i.fullRow) | ||
func (i *mergeJoinIter) Next(ctx *sql.Context) (sql.Row, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is maybe a little too gross to ship. Gotos really should be limited in their application to skip certain parts of nested loop structures, not the primary control flow. Unless you have a really, really good reason. Which as far as I can tell, you don't.
Structure this as a state machine inside a for loop, something like:
var nextState = init
for nextState != end {
switch nextState {
case init:
...
nextState = exhaustCheck
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can define a custom type for your state machine states and consts for all possible values
@zachmu i added more comments to make merge's first filter less mysterious, and rewrote the goto join iters as switch statement state machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this is easier to understand
A join filter that evaluates to false or nil can still return rows in certain cases. One of the cases we evaluated incorrectly were LEFT_MERGE_JOINs with multiple filters, which should always return the left row even if a filter returns false. MERGE_JOIN's first filter is specially selected because its Left/Right expressions reference attributes for two tables that will arrive sorted given the indexes we chose to read. We use that first filter for the merge comparison direction.