Revert "Use cudf to compute exact hash join output row sizes (#3288)" #3657
This reverts commit 25bad3d.
Fixes #3640. When we switched to building the hash table explicitly, we lost the ability to dynamically choose which table is used as the build side of an inner join. It's definitely something we can do ourselves, but it will be tricky to do properly, given that the join code assumes the table designated as the build side will be hashed and that only the stream side is splittable.
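To illustrate the asymmetry described above, here is a minimal sketch in plain Python, not the plugin's actual code. The row shape and the names `build_hash_table` and `inner_hash_join` are hypothetical. The build side must be fully materialized into a hash table before any output is produced, while the stream side can arrive in arbitrarily small batches.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape for illustration: (join key, payload).
Row = Tuple[int, str]


def build_hash_table(build_rows: Iterable[Row]) -> Dict[int, List[str]]:
    """Materialize the whole build side into a key -> payloads map."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)
    return table


def inner_hash_join(
    build_rows: Iterable[Row],
    stream_batches: Iterable[List[Row]],
) -> Iterator[Tuple[int, str, str]]:
    """Inner join: hash the build side once, then probe batch by batch."""
    # The entire build side is consumed here, before any output is produced.
    table = build_hash_table(build_rows)
    # The stream side is only ever seen one batch at a time, so it can be split.
    for batch in stream_batches:
        for key, payload in batch:
            for build_payload in table.get(key, ()):
                yield (key, build_payload, payload)
```

Swapping which input plays the build role changes which table has to fit in memory all at once, which is why the choice matters for an inner join.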
Since we're in the process of finishing up 21.10, I think it's prudent to revert #3288 and tackle this in a future release.
#2354 tracks solving the real root issue, which is that an inner join requires everything on an arbitrarily chosen build side to be pulled in at once. Ideally the join would examine both sides to choose the best build side, and have a fallback for when neither table can be pulled in completely to build the final hash table.
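As a rough sketch of the selection logic described above (not a proposed implementation), the decision could look like the following. The function name `choose_build_side`, the byte-size inputs, and the single memory budget are all assumptions made for illustration. How the sizes would be estimated and what the fallback actually does are left open here, as they are in #2354.

```python
from typing import Optional


def choose_build_side(
    left_bytes: int,
    right_bytes: int,
    memory_budget_bytes: int,
) -> Optional[str]:
    """Pick the smaller side that fits in memory as the build side.

    Returns "left" or "right", or None when neither side fits and the
    caller has to use some fallback strategy instead.
    """
    # Keep only the sides that can be pulled in completely.
    candidates = [
        (size, side)
        for size, side in ((left_bytes, "left"), (right_bytes, "right"))
        if size <= memory_budget_bytes
    ]
    if not candidates:
        return None  # neither table fits: the fallback path is needed
    # Prefer the smaller table; ties resolve to "left" via tuple ordering.
    return min(candidates)[1]
```

Returning `None` rather than raising keeps the fallback decision with the caller, which in this sketch is where a sub-partitioning or similar strategy would plug in.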