Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix right, full join handling when having multiple non-matching rows at the left side #845

Merged
merged 5 commits into from
Aug 10, 2021

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Aug 9, 2021

Which issue does this PR close?

Closes #843

Rationale for this change

Right and hull join had an error where if they wouldn't match multiple left side values, for each value it didn't match it would add a row. This is triggered by the change to #842 which maps any value to a 0 hash. Normally values and null values have different hashes, so they wouldn't have this problem.
Instead the correct behavior is to add a single row if it doesn't match any.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Aug 9, 2021
@Dandandan Dandandan requested a review from alamb August 9, 2021 16:03
@Dandandan Dandandan added the bug Something isn't working label Aug 9, 2021
@@ -745,9 +747,12 @@ fn build_join_indexes(
&keys_values,
)? {
left_indices.append_value(i)?;
} else {
left_indices.append_null()?;
right_indices.append_value(row as u32)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix, instead of adding the (left=null, right) to every non-match, it does it only once at the end if there wasn't a single match.

@Dandandan Dandandan changed the title Fix right, full join handling Fix right, full join handling when having multiple non-matching rows at the left side Aug 9, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talk about a speedy bugfix 🐎 ! I barely filed the ticket and there is a fix ready :)

One thing that might help readability would be to rename from double negative (no_match false) to has_match or something

Nice work @Dandandan

datafusion/src/physical_plan/hash_join.rs Show resolved Hide resolved
Dandandan and others added 2 commits August 9, 2021 18:30
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch 👍

@Dandandan Dandandan merged commit 5cadc6a into apache:master Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect results for joins on hash collisions
3 participants