-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32299] [SQL] Decide SMJ Join Orientation adaptively #29097
Conversation
This reverts commit 89664b4.
Can one of the admins verify this patch? |
That also depends on the data values, right? Not always faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have similar concern with @gatorsmile . I think this also depends on the run-time cardinality of data.
E.g., if left side is smaller than right side, but every row from left side is same, and every row from right side is not same (unique). We should buffer right side here even though ride side is larger, because if we buffer left side, we essentially need to read all left side into the buffer.
In addition, this PR is swapping left and right side based on total size. However, during run-time, each task/partition can have different amount of data per left + right side. I think simply swapping left and right side here might cause some tasks to regress but some tasks to improve.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
To change SortMergeJoin orientation at runtime using adaptive query execution
Why are the changes needed?
For SortMerge join of type EquiJoin, the left and right side of the joins are decided on the basis of the user order. In SMJ, the left side of the join is streamed and the right side is buffered (matching values). Because of this, B SMJ A would perform better than A SMJ B if, sizeOf(B) > sizeOf(A)
With adaptive query execution, once both ShuffleQueryStages corresponding to the join have completed and if none of them have sizes lesser than the broadcast threshold (the join will not be converted to BroadcastHashJoin), join orientation can be changed at run time.
Does this PR introduce any user-facing change?
No
-->
How was this patch tested?