Support optional filter in Join #2509
Comments
A related issue: #2496.
Reopened and renamed to track sort-merge join filter as well.
@yjshen do you mind if I close this ticket and reopen another describing the support needed for sort-merge? I think it might be clearer to a future reader that we just needed to extend the support in HashJoin to MergeJoin, whereas the description of this ticket now may confuse people, as it talks about differing implementation possibilities.
Got it; I will open a new issue instead.
Thanks @yjshen!
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Filters need to be supported inside the join operator itself, rather than as a join operator followed by a separate filter. The need is twofold:
Describe the solution you'd like
`pub type FilterOn = (Vec<Column>, Vec<Column>, Arc<dyn PhysicalExpr>);` and `Option<FilterOn>` for JoinExec.
Describe alternatives you've considered
`pub type FilterOn = Vec<(Column, Column, datafusion_expr::Operator)>;`
Rewrite `t1.a + t2.b > 100` to `t1.a > 100 - t2.b`, evaluate `a` and `100 - b` separately as two columns, and apply the binary expression calculation logic.
But this approach would be quite limited, since it greatly restricts the expressions that could be used in a join filter.
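To make the difference between the two designs concrete, here is a minimal, std-only sketch. It is not DataFusion code: `Row`, `CmpOp`, `ColumnPairFilter`, `ExprFilter`, `join_indices`, and `example_matches` are hypothetical stand-ins, the join is a plain cross product with no equi-join keys, and a closure plays the role that `Arc<dyn PhysicalExpr>` would play in the proposal. It evaluates `t1.a + t2.b > 100` once as an arbitrary expression and once via the rewritten column-pair form `t1.a > 100 - t2.b`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a row: a flat list of integer column values.
type Row = Vec<i64>;

/// Hypothetical comparison operators for the column-pair design.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CmpOp {
    Gt,
    Lt,
    Eq,
}

/// Alternative design: (left column index, right column index, operator) triples.
type ColumnPairFilter = Vec<(usize, usize, CmpOp)>;

/// Proposed design, simplified: an arbitrary predicate over a (left, right) row pair.
type ExprFilter = Arc<dyn Fn(&Row, &Row) -> bool>;

/// Evaluate every (column, column, operator) triple; all must hold.
fn eval_column_pairs(filter: &ColumnPairFilter, left: &Row, right: &Row) -> bool {
    filter.iter().all(|&(l, r, op)| match op {
        CmpOp::Gt => left[l] > right[r],
        CmpOp::Lt => left[l] < right[r],
        CmpOp::Eq => left[l] == right[r],
    })
}

/// Cross product of both sides, keeping index pairs that pass the optional filter.
/// `None` keeps every pair, mirroring an `Option`-typed filter on the join.
fn join_indices(left: &[Row], right: &[Row], filter: Option<&ExprFilter>) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            let keep = match filter {
                Some(f) => (**f)(l, r),
                None => true,
            };
            if keep {
                out.push((i, j));
            }
        }
    }
    out
}

/// Run `t1.a + t2.b > 100` both ways and return the matching index pairs.
fn example_matches() -> (Vec<(usize, usize)>, Vec<(usize, usize)>) {
    let t1: Vec<Row> = vec![vec![10], vec![60], vec![95]]; // column a
    let t2: Vec<Row> = vec![vec![50], vec![20]]; // column b

    // Expression design: the predicate is written directly.
    let expr: ExprFilter = Arc::new(|l: &Row, r: &Row| l[0] + r[0] > 100);
    let by_expr = join_indices(&t1, &t2, Some(&expr));

    // Column-pair design: first project the right side to `100 - b`,
    // then compare column 0 of each side with `>`.
    let t2_projected: Vec<Row> = t2.iter().map(|r| vec![100 - r[0]]).collect();
    let pairs: ColumnPairFilter = vec![(0, 0, CmpOp::Gt)];
    let pair_filter: ExprFilter =
        Arc::new(move |l: &Row, r: &Row| eval_column_pairs(&pairs, l, r));
    let by_pairs = join_indices(&t1, &t2_projected, Some(&pair_filter));

    (by_expr, by_pairs)
}
```

Both paths select the same row pairs here, but the column-pair path only works because this particular predicate can be rearranged into "column, operator, column" form and needs an extra projection step first; the expression form needs neither.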
Additional context
Consider part of the SparkSQL plan for TPC-DS query 95 as an example: