Fix Left join implementation is incorrect for 0 or multiple batches on the right side #238
Codecov Report
@@ Coverage Diff @@
## master #238 +/- ##
==========================================
+ Coverage 76.46% 76.76% +0.30%
==========================================
Files 135 134 -1
Lines 23250 23248 -2
==========================================
+ Hits 17777 17847 +70
+ Misses 5473 5401 -72
LGTM. I pulled this branch locally and confirmed that this fixes the failing tests. Thanks, @Dandandan!
Yep, the implementation makes a lot of sense. Thanks a lot, @Dandandan!
Merging this when it's green.
* Add first reverse support for partitioned join conversion
* Minor changes
* minor changes
* Minor changes
* Add ordering requirement propogation
* Remove wrong check
* Simplifications
* Simplifications
* Minor changes
* Minor changes
* add test case
* Review
* Propagate group by and aggregate through join
* Minor changes
* Minor changes
* Simplifications
* Buggy state
* Minor changes
* Simplifications
* Add comments
* Update comments
* Update join_pipeline_selection.rs
* Mini
* Update comments
* Fix formatting
* Review
---------
Co-authored-by: metesynnada <100111937+metesynnada@users.noreply.github.com>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Which issue does this PR close?
Closes #235
Closes #239
Closes #143
Rationale for this change
Fixes the behavior of left join when the right side has multiple batches or zero batches.
This approach is also likely much faster, as it avoids generating, keeping, and indexing into a HashSet for each right-side batch.
What changes are included in this PR?
Use a Vec<bool> to keep track of left-side rows that didn't match with the right side. This costs as many extra bytes as there are left rows for left joins, which is not much compared to storing the left-side data and the hashmap + indices. It could be made more memory-efficient by using a bitmap instead.
Are there any user-facing changes?
Only correctness improvements
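
For readers who want to see the idea in isolation, here is a minimal sketch of the approach described above. It is not the code from this PR: the function `left_join`, its integer-key inputs, and the `visited` name are assumptions made for illustration. It keeps one `Vec<bool>` flag per left row, shared across every right-side batch, and emits the unmatched left rows only once the right side is exhausted.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified left join over integer keys (illustration only).
/// `left` is the fully collected build side; `right_batches` are processed
/// one batch at a time. Returns (left_key, Option<right_key>) pairs.
fn left_join(left: &[i32], right_batches: &[Vec<i32>]) -> Vec<(i32, Option<i32>)> {
    // Build side: map each key to the indices of the left rows holding it.
    let mut index: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        index.entry(*key).or_default().push(i);
    }

    // One flag per left row, shared across ALL right-side batches.
    let mut visited = vec![false; left.len()];
    let mut out = Vec::new();

    for batch in right_batches {
        for r in batch {
            if let Some(rows) = index.get(r) {
                for &i in rows {
                    visited[i] = true;
                    out.push((left[i], Some(*r)));
                }
            }
        }
    }

    // Emit unmatched left rows exactly once, after every batch has been seen.
    // With zero right-side batches, every left row ends up here.
    for (i, seen) in visited.iter().enumerate() {
        if !*seen {
            out.push((left[i], None));
        }
    }
    out
}
```

Because the flags outlive any single batch, a left row matched in an earlier batch is not reported as unmatched when a later batch misses it, and an empty right side still yields every left row padded with `None`. A bitmap would store the same flags in one bit per row instead of one byte.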