What is the problem the feature request solves?
It is very common to have scan -> filter as the inputs to a join. The filter copies the surviving rows into a new batch, which can be expensive when the batch contains strings and complex types, and the filtered batch is discarded once the join has consumed it.
I believe that it would be more efficient to have the join use a selection vector to read inputs from the scanned batch rather than perform a filter.
This issue is for tracking the work to create a small prototype to demonstrate this. If successful, we can then discuss making changes in upstream DataFusion to add support for a new ColumnarValue::ArrayWithSelectionVector and add a specialization in SortMergeJoin to take advantage of it.
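To make the idea concrete, here is a rough, std-only Rust sketch of the difference between the two approaches. It is not DataFusion or Arrow code: `Batch`, `filter_copy`, `selection_vector`, and `join_with_selection` are hypothetical names made up for illustration, and the join is a naive key lookup rather than a sort-merge join.

```rust
/// Toy batch with a key column and a string column.
/// Hypothetical type for illustration; not an Arrow or DataFusion API.
pub struct Batch {
    pub keys: Vec<i64>,
    pub payload: Vec<String>,
}

/// Filter-style evaluation: build a new batch, copying every surviving row.
/// The string clone stands in for the expensive copy described above.
pub fn filter_copy(batch: &Batch, pred: impl Fn(i64) -> bool) -> Batch {
    let mut keys = Vec::new();
    let mut payload = Vec::new();
    for (k, p) in batch.keys.iter().zip(&batch.payload) {
        if pred(*k) {
            keys.push(*k);
            payload.push(p.clone());
        }
    }
    Batch { keys, payload }
}

/// Selection-vector evaluation: record only the indices of surviving rows.
/// No column data is copied.
pub fn selection_vector(batch: &Batch, pred: impl Fn(i64) -> bool) -> Vec<u32> {
    batch
        .keys
        .iter()
        .enumerate()
        .filter(|(_, k)| pred(**k))
        .map(|(i, _)| i as u32)
        .collect()
}

/// Toy join probe that reads the scanned batch through the selection vector.
/// Assumption: a naive membership check stands in for the real join logic.
pub fn join_with_selection<'a>(
    left: &'a Batch,
    sel: &[u32],
    right_keys: &[i64],
) -> Vec<(i64, &'a str)> {
    let mut out = Vec::new();
    for &i in sel {
        let i = i as usize;
        if right_keys.contains(&left.keys[i]) {
            // Borrow the string from the original batch instead of copying it.
            out.push((left.keys[i], left.payload[i].as_str()));
        }
    }
    out
}
```

In this sketch, `filter_copy` clones one string per surviving row, while `selection_vector` allocates only a list of row indices and `join_with_selection` borrows the strings from the scanned batch. Whether that saving carries over to a real SortMergeJoin is what the prototype is meant to find out.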
Describe the potential solution
No response
Additional context
No response