-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update mixed_join
to use experimental row hasher and comparator
#13028
Update mixed_join
to use experimental row hasher and comparator
#13028
Conversation
Co-authored-by: Nghia Truong <nghiatruong.vn@gmail.com>
This reverts commit fa8f639.
Benchmarks
mixed_inner_join_32bit[0] Tesla V100-SXM2-32GB
mixed_inner_join_64bit[0] Tesla V100-SXM2-32GB
mixed_inner_join_32bit_nulls[0] Tesla V100-SXM2-32GB
mixed_inner_join_64bit_nulls[0] Tesla V100-SXM2-32GB
mixed_left_join_32bit[0] Tesla V100-SXM2-32GB
mixed_left_join_64bit[0] Tesla V100-SXM2-32GB
mixed_left_join_32bit_nulls[0] Tesla V100-SXM2-32GB
mixed_left_join_64bit_nulls[0] Tesla V100-SXM2-32GB
mixed_full_join_32bit[0] Tesla V100-SXM2-32GB
mixed_full_join_64bit[0] Tesla V100-SXM2-32GB
mixed_full_join_32bit_nulls[0] Tesla V100-SXM2-32GB
mixed_full_join_64bit_nulls[0] Tesla V100-SXM2-32GB
mixed_left_semi_join_32bit[0] Tesla V100-SXM2-32GB
mixed_left_semi_join_64bit[0] Tesla V100-SXM2-32GB
mixed_left_semi_join_32bit_nulls[0] Tesla V100-SXM2-32GB
mixed_left_semi_join_64bit_nulls[0] Tesla V100-SXM2-32GB
mixed_left_anti_join_32bit[0] Tesla V100-SXM2-32GB
mixed_left_anti_join_64bit[0] Tesla V100-SXM2-32GB
mixed_left_anti_join_32bit_nulls[0] Tesla V100-SXM2-32GB
mixed_left_anti_join_64bit_nulls[0] Tesla V100-SXM2-32GB
|
These benchmark results are very (pleasantly) surprising. Without any further investigation, there are two possibilities that I see for why the new code is faster:
Will need to look back at the structs to see which is more likely or if it could be something else entirely. The second is definitely more likely IMO but also seems harder to justify offhand. |
auto& build_conditional_view = swap_tables ? left_conditional_view : right_conditional_view; | ||
row_equality equality_probe{ | ||
cudf::nullate::DYNAMIC{has_nulls}, *probe_view, *build_view, compare_nulls}; | ||
auto& probe = swap_tables ? right_equality : left_equality; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have swap_tables
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The smaller table is used as the build
table. Since mixed_join
doesn't have a stateful API like hash_join
, you see the swapping happening at every step whereas hash_join
does the swapping in its constructor/outer API
Thank you @divyegala for suggesting this change. The automated microbenchmarks look really good! We see about 10-15% faster performance for mixed joins with nulls, and no impact on the "not nullable" variants. 🎉 |
…into mixed-join-row-operators
Are we waiting on anything here? Do we want to do some further investigation to understand the benchmarks before moving forward? We might discover something helpful to improve performance of other PRs using the new comparator/hasher, but I doubt it since the characteristics are likely to be very algorithm-specific. |
@vyasr just waiting on reviews. If there are any insights to be gleaned from this PR that could be applied to other algorithms, that should be async. Please review if you have some time! |
Does this not require new unit tests? A description for this PR would be helpful I think. |
@davidwendt I updated the description |
/merge |
Part of #11844
mixed_join
cannot support nested types as the conditional part relies on AST. This PR adds no new tests or benchmarks for this reason.Benchmarks
Checklist