[FEA] Leverage cudf conditional nested loop join to implement semi/anti hash join with condition #4309

Closed
jlowe opened this issue Dec 6, 2021 · 1 comment
Assignees
Labels
feature request New feature or request P1 Nice to have for release performance A performance related task/issue

Comments

@jlowe
Member

jlowe commented Dec 6, 2021

Is your feature request related to a problem? Please describe.
Currently, semi and anti hash joins with a condition fall back to the CPU because cudf does not support a hash-based semi or anti join with a condition and, unlike with an inner join, we cannot simply apply the condition as a post-filter on the join output. However, cudf does support semi/anti joins with a condition via conditional_left_semi_join and conditional_left_anti_join. Although a nested loop join is algorithmically worse than a hash join, it may be faster in practice because it could avoid the overhead of CPU<->GPU transitions in the plan.
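
To illustrate why the post-filter trick does not carry over, here is a minimal pure-Python model of the semantics. It is only a sketch: the function names mirror the cudf ones for readability, but none of this is the cudf API, and the row layout and predicate are made up for the example. A key-only semi join returns left rows without any right-side columns, so a condition that reads a right column can no longer be evaluated afterwards; the nested loop form evaluates the full condition against every left/right pair instead.

```python
# Pure-Python model of the join semantics; NOT the cudf API.
# Rows are dicts; all column names and data are hypothetical.

def equi_left_semi_join(left, right, key):
    """Key-only semi join: indices of left rows whose key appears on the right."""
    right_keys = {r[key] for r in right}  # stands in for the hash table build side
    return [i for i, l in enumerate(left) if l[key] in right_keys]

def conditional_left_semi_join(left, right, predicate):
    """Nested loop semi join: left rows with at least one right row passing the predicate."""
    return [i for i, l in enumerate(left) if any(predicate(l, r) for r in right)]

def conditional_left_anti_join(left, right, predicate):
    """Nested loop anti join: left rows with no right row passing the predicate."""
    return [i for i, l in enumerate(left) if not any(predicate(l, r) for r in right)]

left = [{"k": 1, "a": 10}, {"k": 2, "a": 5}, {"k": 3, "a": 7}]
right = [{"k": 1, "b": 3}, {"k": 1, "b": 20}, {"k": 2, "b": 9}]

def predicate(l, r):
    # Equality on the key plus an extra condition that needs a right-side column.
    return l["k"] == r["k"] and l["a"] > r["b"]

keys_only = equi_left_semi_join(left, right, "k")
semi = conditional_left_semi_join(left, right, predicate)
anti = conditional_left_anti_join(left, right, predicate)

print(keys_only)  # left rows matching on key alone
print(semi)       # left rows matching on key and condition
print(anti)       # left rows with no fully matching right row
```

In this toy data the key-only result includes the row with `k == 2`, which the full condition rejects, and the output of the key-only join has no `b` column left to filter on. The cost is that the nested loop form inspects every left/right pair rather than probing a hash table.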

Describe the solution you'd like
The RAPIDS Accelerator would reuse the code already used for broadcast nested loop joins to check whether the semi/anti join condition can be supported by the cudf AST. If it can, it would replace the CPU semi/anti join in the plan with a GPU nested loop semi/anti join using the AST condition.
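
The planning decision described above could be sketched roughly as follows. This is a simplified Python illustration, not the plugin's actual code: the `Expr` class, the `SUPPORTED_AST_OPS` set, and the returned plan labels are all hypothetical stand-ins, and the real set of operations the cudf AST supports is not listed here.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Expr:
    """Hypothetical expression tree node: an operator name plus child expressions."""
    op: str
    children: Tuple["Expr", ...] = ()

# Assumed, illustrative set of operators; not the real cudf AST support list.
SUPPORTED_AST_OPS = {"column", "literal", "and", "or", "eq", "lt", "gt"}

def can_convert_to_ast(expr: Expr) -> bool:
    """The condition is convertible only if every node in the tree is supported."""
    return expr.op in SUPPORTED_AST_OPS and all(
        can_convert_to_ast(child) for child in expr.children
    )

def plan_conditional_join(join_type: str, condition: Expr) -> str:
    """Pick a GPU nested loop join when the condition converts, else stay on the CPU."""
    if join_type not in ("left_semi", "left_anti"):
        raise ValueError(f"unexpected join type: {join_type}")
    if can_convert_to_ast(condition):
        return f"gpu_nested_loop_{join_type}"
    return f"cpu_{join_type}"

col = Expr("column")
simple = Expr("and", (Expr("eq", (col, col)), Expr("gt", (col, Expr("literal")))))
unsupported = Expr("and", (Expr("eq", (col, col)), Expr("some_unsupported_op", (col,))))

print(plan_conditional_join("left_semi", simple))
print(plan_conditional_join("left_anti", unsupported))
```

A single unsupported node anywhere in the condition keeps the whole join on the CPU, which matches the all-or-nothing replacement described in the request.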

Describe alternatives you've considered
Adding AST conditional support to hash-based joins, which is being worked on separately in cudf. That may take some time to realize, so this may be a useful interim solution.

@jlowe jlowe added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Dec 6, 2021
@jbrennan333 jbrennan333 self-assigned this Dec 7, 2021
@Salonijain27 Salonijain27 added P1 Nice to have for release and removed ? - Needs Triage Need team to review and classify labels Dec 7, 2021
@jbrennan333
Contributor

Based on the experiment in #4345, we found that using a conditional nested loop join in this instance was dramatically slower.
