[SPARK-32299] [SQL] Decide SMJ Join Orientation adaptively #29097

mayurdb · 2020-07-14T08:11:12Z

What changes were proposed in this pull request?

Decide the SortMergeJoin (SMJ) orientation at runtime using adaptive query execution (AQE).

Why are the changes needed?

For equi-joins executed as a SortMergeJoin (SMJ), the left and right sides of the join are determined by the order in which the user wrote them. In SMJ, the left side is streamed and the right side is buffered (the rows matching the current join key are held in a buffer). Because of this, B SMJ A should perform better than A SMJ B if sizeOf(B) > sizeOf(A).
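To make the streamed/buffered asymmetry concrete, here is a toy, single-process model in Python. It is not Spark's SortMergeJoinExec; the function name `sort_merge_join` and the "peak buffered rows" metric are illustrative assumptions. The left argument is streamed, and for each streamed key all matching rows of the right argument are buffered, so the buffer grows with the number of buffered-side rows per key.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (join key, payload)


def sort_merge_join(streamed: List[Row], buffered: List[Row]):
    """Toy inner equi-join: stream one side, buffer matches from the other.

    Returns (joined_rows, peak_buffered_rows). Hypothetical helper, not Spark code.
    """
    s = sorted(streamed, key=lambda r: r[0])
    b = sorted(buffered, key=lambda r: r[0])
    out = []
    peak = 0
    j = 0
    buf: List[Row] = []
    buf_key: Optional[int] = None
    for key, s_val in s:
        # Refill the buffer only when the streamed key changes.
        if key != buf_key:
            # Skip buffered rows whose key is smaller than the streamed key.
            while j < len(b) and b[j][0] < key:
                j += 1
            # Collect every buffered row with the same key.
            buf = []
            while j < len(b) and b[j][0] == key:
                buf.append(b[j])
                j += 1
            buf_key = key
            peak = max(peak, len(buf))
        # Emit one output row per buffered match.
        for _, b_val in buf:
            out.append((key, s_val, b_val))
    return out, peak


A = [(1, "a1"), (2, "a2")]
B = [(1, "b1"), (1, "b2"), (1, "b3"), (2, "b4"), (2, "b5"), (2, "b6")]
out_ab, peak_ab = sort_merge_join(A, B)  # stream A, buffer B
out_ba, peak_ba = sort_merge_join(B, A)  # stream B, buffer A
```

Both orientations produce the same six joined rows, but streaming the larger side B keeps at most one row in the buffer, while streaming A buffers three rows per key. The real operator can spill its buffer to disk, so in practice the cost shows up as memory pressure and spilling rather than failure.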

With adaptive query execution, once both ShuffleQueryStages feeding the join have completed, and if neither of them is smaller than the broadcast threshold (so the join will not be converted to a BroadcastHashJoin), the join orientation can be changed at runtime.
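The decision described above can be sketched as a small predicate. This is not the PR's actual code (which is not shown here); `should_swap_smj` and its parameters are hypothetical. It assumes the runtime byte sizes of both shuffle stages are available, treats a side at or below the threshold as broadcastable, and, as a simplifying assumption, only considers inner joins, since swapping an outer join would also require mirroring its join type.

```python
# 10 MB is the documented default of spark.sql.autoBroadcastJoinThreshold;
# using it here is an assumption for illustration.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def should_swap_smj(left_bytes: int, right_bytes: int,
                    left_done: bool, right_done: bool,
                    join_type: str = "inner",
                    threshold: int = DEFAULT_BROADCAST_THRESHOLD) -> bool:
    """Hypothetical rule: swap so the larger side is streamed (left)."""
    # Simplifying assumption: only inner joins are reoriented.
    if join_type != "inner":
        return False
    # Runtime statistics are needed for both shuffle stages.
    if not (left_done and right_done):
        return False
    # If either side is broadcastable, the join may become a BroadcastHashJoin,
    # so leave the sort-merge orientation alone.
    if left_bytes <= threshold or right_bytes <= threshold:
        return False
    # Swap when the currently buffered (right) side is the larger one.
    return right_bytes > left_bytes
```

Note that this rule looks only at total bytes per side, which is exactly the assumption questioned in the review comments.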

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests
Ran AdaptiveQueryExecSuite

This reverts commit 89664b4.

AmplabJenkins · 2020-07-14T08:18:34Z

Can one of the admins verify this patch?

mayurdb · 2020-07-14T11:35:31Z

cc @maryannxue @cloud-fan

gatorsmile · 2020-07-16T16:55:07Z

That also depends on the data values, right? Not always faster.

c21 · 2020-08-22

I have a similar concern to @gatorsmile. I think this also depends on the runtime cardinality of the data.

E.g., suppose the left side is smaller than the right side, but every row on the left side has the same join key, while every row on the right side has a distinct (unique) key. We should buffer the right side here even though it is larger, because if we buffer the left side, we essentially need to read the entire left side into the buffer.
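A minimal sketch of this counterexample, assuming the buffer cost of SMJ is driven by the largest group of buffered-side rows sharing one matching key. The helper `peak_buffer_rows` and the sample data are hypothetical. A size-based rule would stream the larger right side and buffer the left side, which is the worse choice here.

```python
from collections import Counter
from typing import Iterable


def peak_buffer_rows(streamed_keys: Iterable[int], buffered_keys: Iterable[int]) -> int:
    """Largest number of buffered-side rows held at once for one matching key."""
    streamed = set(streamed_keys)
    counts = Counter(k for k in buffered_keys if k in streamed)
    return max(counts.values(), default=0)


left = [7] * 1000          # smaller side: every row has the same key
right = list(range(5000))  # larger side: unique keys (includes 7)

# Size-based choice: stream the larger right side, buffer the smaller left side.
size_based = peak_buffer_rows(streamed_keys=right, buffered_keys=left)
# Cardinality-aware choice: stream left, buffer the larger but unique right side.
cardinality_aware = peak_buffer_rows(streamed_keys=left, buffered_keys=right)
```

Here `size_based` is 1000 while `cardinality_aware` is 1, even though the right side holds five times as many rows.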

In addition, this PR swaps the left and right sides based on total size. However, at runtime each task/partition can have a different amount of data on its left and right sides. I think simply swapping the left and right sides here might cause some tasks to regress while others improve.
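A small sketch of the per-partition concern, assuming each shuffle partition's left and right byte sizes are known and that a partition locally "prefers" streaming its larger side. The functions `global_swap` and `disagreeing_partitions` and the sizes below are hypothetical illustrations, not Spark APIs or measured data.

```python
from typing import List


def global_swap(left_parts: List[int], right_parts: List[int]) -> bool:
    """One decision for the whole join, based on total size per side."""
    return sum(right_parts) > sum(left_parts)


def disagreeing_partitions(left_parts: List[int], right_parts: List[int]) -> List[int]:
    """Indices of partitions whose local preference contradicts the global swap."""
    swap = global_swap(left_parts, right_parts)
    return [i for i, (l, r) in enumerate(zip(left_parts, right_parts))
            if (r > l) != swap]


left_parts = [10, 10, 100, 10]   # skewed: partition 2 is large on the left
right_parts = [50, 50, 5, 50]
```

Totals are 130 on the left and 155 on the right, so the global rule swaps, yet partition 2 (100 vs 5) would be better off with the original orientation.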

github-actions · 2020-12-01T00:45:50Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Mayur Bhosale added 8 commits March 16, 2020 10:52

Additional checks on deciding the pruning side · 89664b4
Merge https://github.com/apache/spark · e6188a5
Revert "Additional checks on deciding the pruning side" · a3019bd (This reverts commit 89664b4.)
Orient SMJ based on the adaptive stats · bcb6c09
Nit · 0110527
Changed formatting · c80d8d8
Do not change join order if SMJ will be converted to a BHJ · dd5fb36
Fixed imports · e5c7db3

probot-autolabeler bot added the SQL label Jul 14, 2020

mayurdb changed the title from "Spark 32299" to "[SPARK-32299] [SQL] Decide SMJ Join Orientation adaptively" Jul 14, 2020

c21 reviewed Aug 22, 2020


github-actions bot added the Stale label Dec 1, 2020

github-actions bot closed this Dec 2, 2020
