-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accelerate conditional inner joins with larger right tables #9523
Accelerate conditional inner joins with larger right tables #9523
Conversation
Benchmarks
Before:
After:
|
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9523 +/- ##
================================================
- Coverage 10.79% 10.67% -0.12%
================================================
Files 116 117 +1
Lines 18869 19716 +847
================================================
+ Hits 2036 2104 +68
- Misses 16833 17612 +779
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I decided to pick up this review for @devavret to give him a break and keep myself up to date on AST code. 👍
@vyasr I ran with this patch locally and it does what we need. Before the patch:
After the patch:
I can reproduce similar fast times than with our spark-side prototype I had discussed here #9461. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the perf standpoint I got very good times from this patch. I basically stopped our join with the old patch because it takes 30 minutes to run, whereas with the patch it is ~1.4 minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vyasr LGTM aside from a couple missed replacements. Please fix those, otherwise approved!
Thank you for finding all my find-and-replace and copy-paste errors! |
@gpucibot merge |
This PR introduces reconfigures the conditional join kernels to launch one thread for each row in the larger of the two tables rather than always launching one thread for each row in the left table. Swapping the table on which the kernel is ordered helps improve performance in cases where the right table is significantly larger than the left table.
This PR is related to #9461 but it does not completely address its since this change only works for inner joins. Other join kinds are less straightforward since they require keeping track of rows for which no matches were found, which is easy to do if all the work for a given row is performed on the same thread (the current approach) but will require a significantly modified or entirely new kernel to also incorporate the inter-thread communication needed if the table ordering is swapped.