[FEA] Conditional hash join for left semi and left anti joins #9695

jlowe · 2021-11-16T15:10:51Z

Is your feature request related to a problem? Please describe.
The RAPIDS Accelerator for Apache Spark currently does not accelerate left semi or left anti joins that have both an equality and an inequality condition. The only way to run such a join on the GPU today is a nested loop join with an AST condition, but this often performs far worse than the CPU implementation, which first performs a hash-based lookup on the equality condition and then evaluates the inequality condition before treating the row as a "join hit."
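To make the cost difference concrete, here is a minimal CPU-only sketch in plain Python. It is not code from the Spark plugin or libcudf; the function names and the sample rows are hypothetical. It counts how often the condition is evaluated when every row pair is tested versus when a hash lookup on the equality key narrows the candidates first.

```python
from collections import defaultdict

def nested_loop_semi(left, right, cond):
    """Left semi join that tests the full condition on every row pair."""
    evaluations = 0
    matches = []
    for i, l in enumerate(left):
        for r in right:
            evaluations += 1
            if cond(l, r):
                matches.append(i)
                break  # semi join: one match is enough
    return matches, evaluations

def hash_then_filter_semi(left, right, key, residual):
    """Left semi join: hash lookup on the equality key, then the residual condition."""
    buckets = defaultdict(list)
    for r in right:
        buckets[key(r)].append(r)
    evaluations = 0
    matches = []
    for i, l in enumerate(left):
        # Only rows that already agree on the equality key are examined.
        for r in buckets.get(key(l), ()):
            evaluations += 1
            if residual(l, r):
                matches.append(i)
                break
    return matches, evaluations

# Hypothetical rows of (key, value).
left = [(k, 10) for k in range(100)]
right = [(k, 20) for k in range(0, 100, 2)]

full_condition = lambda l, r: l[0] == r[0] and l[1] < r[1]
nl_matches, nl_evals = nested_loop_semi(left, right, full_condition)
hj_matches, hj_evals = hash_then_filter_semi(
    left, right, key=lambda row: row[0], residual=lambda l, r: l[1] < r[1])
```

Both strategies return the same left row indices, but the hash-based version evaluates the condition only for rows that share an equality key, which is the behavior the CPU implementation relies on.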

Describe the solution you'd like
libcudf supports passing a supplemental condition, represented as an AST expression, to the existing left semi and left anti hash-based join APIs. Internally, the join kernel performs the hash lookup on the equality keys; on a hit, it evaluates the AST expression to produce a boolean result indicating whether the row should be considered a join match. The API can take two pairs of table_views: one pair for the left and right equality keys and one pair for the AST expression evaluation. The result is a gather map for the left table, as it is for the hash-based semi/anti join today.
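The requested semantics can be sketched on the CPU as follows. This is plain Python for illustration only, not a libcudf API: `mixed_semi_anti_join`, its parameters, and the sample data are all hypothetical, tables are modeled as lists of row tuples, and null handling is ignored.

```python
from collections import defaultdict

def mixed_semi_anti_join(left_keys, right_keys, left_cond, right_cond,
                         predicate, anti=False):
    """Return the left row indices kept by a mixed semi (or anti) join.

    left_keys / right_keys: one equality-key tuple per row.
    left_cond / right_cond: one row per input row, passed to the predicate.
    """
    # Build side: map each right equality key to the right row indices holding it.
    build = defaultdict(list)
    for j, key in enumerate(right_keys):
        build[key].append(j)

    gather_map = []
    for i, key in enumerate(left_keys):
        # A row is a hit only if the key matches AND the predicate holds.
        hit = any(predicate(left_cond[i], right_cond[j])
                  for j in build.get(key, ()))
        if hit != anti:  # semi keeps hits, anti keeps misses
            gather_map.append(i)
    return gather_map

# Hypothetical data: equality on the key, plus left value < right value.
left_keys = [(1,), (2,), (3,), (2,)]
right_keys = [(2,), (2,), (3,)]
left_cond = [(5,), (5,), (5,), (9,)]
right_cond = [(4,), (7,), (1,)]
less_than = lambda l, r: l[0] < r[0]

semi = mixed_semi_anti_join(left_keys, right_keys, left_cond, right_cond, less_than)
anti = mixed_semi_anti_join(left_keys, right_keys, left_cond, right_cond, less_than,
                            anti=True)
```

Only left row 1 has a right row with the same key that also satisfies the predicate, so `semi` holds that single index and `anti` holds the remaining three. Each left index appears at most once, which is what makes the result usable as a gather map.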

Describe alternatives you've considered
The interface could instead take just two table_views for the left and right tables, plus two vectors of ints specifying which columns are the equality keys. It seems simpler, though, to pass separate table_views for the equality keys and for the AST expression, especially when the application needs to generate the result of an expression for the equality portion of the join (e.g., t1_col2 + 1 == t2_col3 * 2).
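The difference between the two interface shapes can be illustrated with a small sketch. This is plain Python, not libcudf; the tables, helper names, and column layout are hypothetical, and each table is modeled as a list of columns.

```python
# Hypothetical column-major tables: each inner list is one column.
t1 = [[10, 11, 12], [0, 0, 0], [3, 5, 7]]   # t1_col2 is t1[2]
t2 = [[0, 0], [0, 0], [0, 0], [2, 4]]       # t2_col3 is t2[3]

def keys_by_index(table, key_indices):
    """Alternative interface: whole table plus a vector of key column indices."""
    return list(zip(*(table[i] for i in key_indices)))

def keys_from_view(key_table):
    """Proposed interface: a table holding only the equality key columns."""
    return list(zip(*key_table))

# Equality on t1_col2 + 1 == t2_col3 * 2 needs computed key columns.
left_key_col = [v + 1 for v in t1[2]]
right_key_col = [v * 2 for v in t2[3]]

# Index-based: the computed column must first be appended to the table.
t1_widened = t1 + [left_key_col]
t2_widened = t2 + [right_key_col]
left_a = keys_by_index(t1_widened, [len(t1_widened) - 1])
right_a = keys_by_index(t2_widened, [len(t2_widened) - 1])

# Separate key tables: the computed column is passed on its own.
left_b = keys_from_view([left_key_col])
right_b = keys_from_view([right_key_col])
```

Both routes produce the same keys, but the index-based shape forces the caller to widen the input tables just to name the computed column, while separate key tables leave the original tables untouched.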

Additional context
This is part of #5401


nvliyuan · 2021-11-18T06:47:35Z

TPC-H Q21 may be a good test case for the LeftAnti join. I get the following unsupported-operator log while running the benchmark:
!Exec <SortMergeJoinExec> cannot run on GPU because LeftAnti joins currently do not support conditions

github-actions · 2022-01-15T02:17:51Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

This PR is a follow-up to #9917 and should be merged after that PR. This resolves #9695 and resolves #5401. The implementation here is only a first pass, but in the interest of prioritizing a working feature for the upcoming release I'm postponing making various additional changes (including some breaking ones).

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Jason Lowe (https://github.com/jlowe)
- David Wendt (https://github.com/davidwendt)
- Robert Maynard (https://github.com/robertmaynard)

URL: #10037

jlowe added the feature request, Needs Triage, libcudf, and Spark labels Nov 16, 2021

jrhemstad assigned vyasr Nov 16, 2021

beckernick removed the Needs Triage label Nov 19, 2021

vyasr added this to the Conditional Joins milestone Dec 16, 2021

vyasr mentioned this issue Jan 5, 2022

[FEA] Address performance regression in semi/anti joins from switching to cuco #9973

Open

vyasr mentioned this issue Jan 13, 2022

Implement mixed equality/conditional semi/anti joins #10037

Merged

github-actions bot added the inactive-30d label Jan 15, 2022

jlowe removed the inactive-30d label Jan 18, 2022

rapids-bot bot closed this as completed in #10037 Jan 24, 2022
