Support build right with HashJoin in DataFusion #9603

Closed
Tracked by #10517
viirya opened this issue Mar 13, 2024 · 7 comments · Fixed by #10702
Labels
enhancement New feature or request

Comments

@viirya
Member

viirya commented Mar 13, 2024

Is your feature request related to a problem or challenge?

See the discussion at Comet: apache/datafusion-comet#194 (comment)

Yeah, in DataFusion only the left side can be the build side. In Spark, however, the HashJoin operator takes a build-side parameter that indicates which side is the build side, and the operator handles either case internally. So currently we cannot simply create a DataFusion HashJoin operator with the right side as the build side.

The left and right sides can be swapped, but only if we also swap the outputs and the column bindings in the join keys and the join filter. I'd like to relax the build-side constraint in DataFusion instead of doing the swap in Comet.
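To make the swap concrete, here is a minimal Rust sketch of the bookkeeping it involves. It assumes an inner equi-join whose keys are (left column, right column) index pairs and whose output is the left columns followed by the right columns. `Side`, `ColumnRef`, `JoinSpec`, and `swap_sides` are hypothetical names for illustration, not DataFusion APIs. For outer, semi, or anti joins the join type would also have to be mirrored.

```rust
/// Which input of a two-input join a column belongs to (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Left,
    Right,
}

impl Side {
    fn flip(self) -> Side {
        match self {
            Side::Left => Side::Right,
            Side::Right => Side::Left,
        }
    }
}

/// A column bound to one join input (hypothetical; stands in for a filter's column binding).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ColumnRef {
    pub side: Side,
    pub index: usize,
}

/// Hypothetical inner equi-join description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JoinSpec {
    /// Equi-join keys as (left column index, right column index) pairs.
    pub on: Vec<(usize, usize)>,
    /// Columns referenced by the join filter, each bound to one input.
    pub filter_columns: Vec<ColumnRef>,
    pub left_width: usize,
    pub right_width: usize,
}

/// Swap the join inputs so the old right side becomes the new left (build) side.
/// Returns the swapped spec plus a projection over the swapped join's output
/// (old right columns, then old left columns) that restores the original
/// order (old left columns, then old right columns).
pub fn swap_sides(spec: &JoinSpec) -> (JoinSpec, Vec<usize>) {
    let swapped = JoinSpec {
        // Reverse every key pair: the old right key column is now on the left.
        on: spec.on.iter().map(|&(l, r)| (r, l)).collect(),
        // Filter bindings now refer to the opposite input; indices within an input are unchanged.
        filter_columns: spec
            .filter_columns
            .iter()
            .map(|c| ColumnRef { side: c.side.flip(), index: c.index })
            .collect(),
        left_width: spec.right_width,
        right_width: spec.left_width,
    };
    // Old left column i sits at position right_width + i in the swapped output;
    // old right column j sits at position j.
    let projection: Vec<usize> = (0..spec.left_width)
        .map(|i| spec.right_width + i)
        .chain(0..spec.right_width)
        .collect();
    (swapped, projection)
}
```

For a join with 3 left columns and 2 right columns, the restoring projection is `[2, 3, 4, 0, 1]`. This extra projection and rebinding is the work that a native build-right option would make unnecessary.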

Describe the solution you'd like

HashJoin supports an option to use the right side as the build side.
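A minimal sketch of what a build-side parameter could look like, using a toy in-memory inner hash join in Rust. `BuildSide` and `hash_join` are hypothetical and not DataFusion's API; rows are assumed to be (key, payload) pairs. The point is that the output column order stays (left, right) whichever input is hashed.

```rust
use std::collections::HashMap;

/// Hypothetical build-side parameter, similar in spirit to Spark's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

/// Toy inner hash join over (key, payload) rows.
/// Output rows are always (left payload, right payload).
pub fn hash_join(
    left: &[(i64, String)],
    right: &[(i64, String)],
    build_side: BuildSide,
) -> Vec<(String, String)> {
    // Choose which input to hash (build) and which to stream (probe).
    let (build, probe) = match build_side {
        BuildSide::Left => (left, right),
        BuildSide::Right => (right, left),
    };

    // Build phase: key -> all build-side payloads with that key.
    let mut table: HashMap<i64, Vec<&String>> = HashMap::new();
    for (k, v) in build {
        table.entry(*k).or_default().push(v);
    }

    // Probe phase: emit one output row per matching build row.
    let mut out = Vec::new();
    for (k, probe_val) in probe {
        if let Some(matches) = table.get(k) {
            for build_val in matches {
                // Restore (left, right) column order regardless of which side was built.
                let row = match build_side {
                    BuildSide::Left => ((*build_val).clone(), probe_val.clone()),
                    BuildSide::Right => (probe_val.clone(), (*build_val).clone()),
                };
                out.push(row);
            }
        }
    }
    out
}
```

Both settings return the same set of rows; only which input is held in memory changes. Outer joins need more care, because unmatched rows must be tracked on whichever side is built.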

Describe alternatives you've considered

No response

Additional context

No response

@metesynnada
Contributor

Is there a technical problem with swapping join sides in Comet?

@viirya
Member Author

viirya commented Mar 14, 2024

No, technically we can swap the join keys, join filter, join output, etc. in Comet. But I think it would be good to make DataFusion's HashJoin more flexible by removing this constraint.

@metesynnada
Contributor

Got it, I was trying to figure out whether this could be a blocker.

@comphead
Contributor

No, I think technically we can do the swapping on joining keys, join filter, join output, etc. in Comet. I think it would be good if we can make DataFusion HashJoin more flexible to remove this constraint.

That sounds cool, and such an initiative would help DF (DataFusion) support its own join-order hints instead of a hardcoded join order.

@edmondop
Contributor

@comphead How would this work? Do you think we should extend the SQL parser to support hints?

@viirya
Member Author

viirya commented May 14, 2024

I don't think this is related to the SQL parser; the SQL query looks the same. In Spark, the planner can decide which side to hash (the build side) based on relation statistics or on hints given by users.
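As a rough illustration of that decision, here is a Rust sketch that assumes estimated row counts are the only available statistic. `choose_build_side` is a hypothetical helper, not Spark's or DataFusion's planner logic, both of which weigh more factors.

```rust
/// Hypothetical build-side choice for a hash join.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

/// A user hint wins; otherwise hash the side with the smaller estimated row
/// count, falling back to Left when statistics are missing or tied.
pub fn choose_build_side(
    left_rows: Option<u64>,
    right_rows: Option<u64>,
    hint: Option<BuildSide>,
) -> BuildSide {
    if let Some(side) = hint {
        return side;
    }
    match (left_rows, right_rows) {
        (Some(l), Some(r)) if r < l => BuildSide::Right,
        _ => BuildSide::Left,
    }
}
```

The SQL text never changes here; only the physical operator's build side does, which is why such a choice needs the operator itself to accept a right build side.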

@viirya
Member Author

viirya commented May 14, 2024

I will work on this.

4 participants