Consider inequality joins #576

Open
richox opened this issue Sep 18, 2024 · 0 comments
Labels
feature required Functionalities must have

Comments

Collaborator

richox commented Sep 18, 2024

Is your feature request related to a problem? Please describe.
Currently we do not support inequality joins (hash join and sort-merge join). This feature is hard to implement because DataFusion has no direct support for row-based evaluation.
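To make the problem concrete, here is a minimal sketch of a hash join whose equality key is followed by an inequality post-filter checked once per candidate row pair. It uses only the standard library. `Row`, `hash_join_with_filter` and the `post_filter` closure are hypothetical names for illustration, not existing DataFusion or project APIs.

```rust
use std::collections::HashMap;

/// Toy row: a join key plus one value used by the inequality condition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub key: i64,
    pub value: i64,
}

/// Hash join on `key`, keeping only the pairs accepted by `post_filter`
/// (for example `l.value < r.value`). Returns (left index, right index) pairs.
/// Hypothetical helper for illustration only.
pub fn hash_join_with_filter<F>(left: &[Row], right: &[Row], post_filter: F) -> Vec<(usize, usize)>
where
    F: Fn(&Row, &Row) -> bool,
{
    // Build side: map each key to the indices of the right rows carrying it.
    let mut build: HashMap<i64, Vec<usize>> = HashMap::new();
    for (ri, r) in right.iter().enumerate() {
        build.entry(r.key).or_default().push(ri);
    }

    // Probe side: the equality key narrows the candidates, then the
    // inequality condition has to be evaluated row pair by row pair.
    let mut out = Vec::new();
    for (li, l) in left.iter().enumerate() {
        if let Some(candidates) = build.get(&l.key) {
            for &ri in candidates {
                if post_filter(l, &right[ri]) {
                    out.push((li, ri));
                }
            }
        }
    }
    out
}
```

The inner loop is where a row-based evaluator is needed: the number of `post_filter` calls grows with the number of key matches, so a weakly selective equality key makes the per-pair cost dominate.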

Describe the solution you'd like
Describe alternatives you've considered

  1. Simulate row-based evaluation with one-row columnar evaluation. This has very poor performance in practice. It may work when the equality predicate has already filtered away most records, but when the equality predicate has no filtering effect (as in TPC-DS q72), the query will hang.
  2. Support a limited form of row-based filtering in DataFusion. DataFusion already has some support, such as make_comparator, which builds a row-based evaluator. We can extend it to support more row-based evaluations, such as make_binary_op.
  3. Fall back to Spark for the post-filter evaluation and use codegen to speed it up. We would also have to consider the fallback overhead.
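A rough sketch of what option 2 could look like, assuming a hypothetical `make_binary_op` that follows the same idea as make_comparator: bind the two columns once, then return a closure that takes a left and a right row index. The names `CmpOp`, `RowPredicate`, `make_binary_op` and `filter_pairs` are made up for this example, and plain slices of `Option<i64>` stand in for Arrow arrays.

```rust
/// Comparison operators a hypothetical `make_binary_op` might support.
#[derive(Debug, Clone, Copy)]
pub enum CmpOp {
    Lt,
    LtEq,
    Gt,
    GtEq,
    NotEq,
}

/// Row-based evaluator: called with (left row index, right row index).
pub type RowPredicate<'a> = Box<dyn Fn(usize, usize) -> bool + 'a>;

/// Bind two columns and an operator once and return a per-row-pair predicate.
/// `None` models a NULL value. Hypothetical API, not part of DataFusion.
pub fn make_binary_op<'a>(
    left: &'a [Option<i64>],
    right: &'a [Option<i64>],
    op: CmpOp,
) -> RowPredicate<'a> {
    Box::new(move |i: usize, j: usize| -> bool {
        match (left[i], right[j]) {
            (Some(l), Some(r)) => match op {
                CmpOp::Lt => l < r,
                CmpOp::LtEq => l <= r,
                CmpOp::Gt => l > r,
                CmpOp::GtEq => l >= r,
                CmpOp::NotEq => l != r,
            },
            // A comparison involving NULL is never true, so the pair is rejected.
            _ => false,
        }
    })
}

/// Apply the predicate to candidate pairs produced by the equality match.
pub fn filter_pairs(pairs: &[(usize, usize)], pred: &RowPredicate<'_>) -> Vec<(usize, usize)> {
    pairs.iter().copied().filter(|&(i, j)| pred(i, j)).collect()
}
```

Compared with option 1, nothing is allocated per row pair: the setup cost is paid once when the closure is built, and each evaluation is two indexed reads and a comparison.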

Additional context

richox pinned this issue Sep 18, 2024
richox added the "feature required" (Functionalities must have) label Sep 19, 2024