[FEA] Leverage cudf conditional nested loop join to implement semi/anti hash join with condition #4309
Labels
feature request
New feature or request
P1
Nice to have for release
performance
A performance related task/issue
Is your feature request related to a problem? Please describe.
Currently, semi and anti hash joins with a condition fall back to the CPU because cudf does not support a hash-based semi or anti join with a condition, and unlike an inner join, we cannot simply apply the condition as a post-filter on the join. However, cudf does support semi/anti joins with a condition using conditional_left_semi_join and conditional_left_anti_join. Although a nested loop join is algorithmically worse than a hash join, it may be more performant in practice because it could avoid the overhead of CPU<->GPU transitions in the plan.
Describe the solution you'd like
The RAPIDS Accelerator should reuse the code already used for broadcast nested loop joins to check whether the semi/anti join condition can be expressed as a cudf AST. If it can, the CPU semi/anti join in the plan is replaced with a GPU nested loop semi/anti join that uses the AST condition.
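For illustration, here is a minimal plain-Python sketch of what a conditional nested loop semi/anti join computes, and why the condition cannot be applied as a post-filter after a key-only semi join. The two functions borrow the cudf names only for readability; they are hypothetical stand-ins, not the cudf API, and the sample rows and the condition `a < b` are made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Condition = Callable[[Row, Row], bool]


def conditional_left_semi_join(left: List[Row], right: List[Row],
                               cond: Condition) -> List[int]:
    """Indices of left rows for which at least one right row satisfies cond."""
    return [i for i, l in enumerate(left) if any(cond(l, r) for r in right)]


def conditional_left_anti_join(left: List[Row], right: List[Row],
                               cond: Condition) -> List[int]:
    """Indices of left rows for which no right row satisfies cond."""
    return [i for i, l in enumerate(left) if not any(cond(l, r) for r in right)]


# Hypothetical sample data: join key "k", extra condition a < b.
left = [{"k": 1, "a": 10}, {"k": 2, "a": 5}, {"k": 3, "a": 7}]
right = [{"k": 1, "b": 3}, {"k": 1, "b": 20}, {"k": 2, "b": 1}]


def cond(l: Row, r: Row) -> bool:
    # The key equality is folded into the condition, so every
    # (left, right) pair is tested, as in a nested loop join.
    return l["k"] == r["k"] and l["a"] < r["b"]


semi = conditional_left_semi_join(left, right, cond)
anti = conditional_left_anti_join(left, right, cond)

# A key-only semi join keeps every left row whose key exists on the right.
# Its output has only left columns, so "a < b" can no longer be evaluated,
# and row 1 (k=2, a=5) would be kept even though no right row satisfies it.
right_keys = {r["k"] for r in right}
key_only_semi = [i for i, l in enumerate(left) if l["k"] in right_keys]

print(semi, anti, key_only_semi)
```

With an inner join the matching right columns are still present in the output, which is why a post-filter works there but not for semi or anti joins.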
Describe alternatives you've considered
Adding AST conditional support to hash-based joins. This is being worked on separately in cudf, but it may take some time to land, so the nested loop approach may be a useful interim solution.