Don't add filters to projection in TableScan #7670

Merged on Sep 28, 2023 (1 commit)

@Dandandan Dandandan commented Sep 27, 2023

Which issue does this PR close?

Closes #7683

Rationale for this change

We don't want to scan columns not needed by the rest of the plan.

What changes are included in this PR?

We change the scan implementations to apply the filters before the projection.
This way we don't need to add the filter columns to the projection in order to keep the plan correct.
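
As an illustration of the ordering described above, here is a minimal, self-contained sketch of a scan that evaluates its pushed-down filters against the full row and only then applies the projection. The `Table`, `EqFilter`, and `scan` names are hypothetical and are not DataFusion APIs; the real scan implementations work on Arrow record batches rather than row vectors.

```rust
/// A toy in-memory table: column names plus row-major integer data.
/// (Hypothetical type, for illustration only.)
struct Table {
    columns: Vec<&'static str>,
    rows: Vec<Vec<i32>>,
}

/// A pushed-down predicate of the form `column = value`.
struct EqFilter {
    column: &'static str,
    value: i32,
}

impl Table {
    /// Position of a column in the table schema, if it exists.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.columns.iter().position(|c| *c == name)
    }

    /// Scan the table: evaluate the filters against the full row first,
    /// then keep only the projected columns. Because filtering happens
    /// before projecting, filter columns do not have to appear in
    /// `projection`. Returns `None` if any referenced column is unknown.
    fn scan(&self, projection: &[&str], filters: &[EqFilter]) -> Option<Vec<Vec<i32>>> {
        let filter_idx = filters
            .iter()
            .map(|f| self.index_of(f.column).map(|i| (i, f.value)))
            .collect::<Option<Vec<(usize, i32)>>>()?;
        let proj_idx = projection
            .iter()
            .map(|name| self.index_of(name))
            .collect::<Option<Vec<usize>>>()?;

        Some(
            self.rows
                .iter()
                // 1. Filter on the unprojected row.
                .filter(|row| filter_idx.iter().all(|&(i, v)| row[i] == v))
                // 2. Project down to the requested output columns.
                .map(|row| proj_idx.iter().map(|&i| row[i]).collect::<Vec<i32>>())
                .collect(),
        )
    }
}
```

For a table with columns a and b and rows (10, 1), (20, 2), (30, 1), scanning with projection [a] and filter b = 1 yields the rows [10] and [30]; b never appears in the output.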

Are these changes tested?

Existing tests + new test.

Are there any user-facing changes?

@Dandandan Dandandan changed the title from "WIP Don't add filters to used columns" to "WIP Don't add filters to used columns in TableScan" Sep 27, 2023
@github-actions github-actions bot added the logical-expr (Logical plan and expressions), optimizer (Optimizer rules), and core (Core DataFusion crate) labels Sep 27, 2023
@Dandandan Dandandan marked this pull request as draft September 28, 2023 07:24

let expected = "\
Projection: Int32(1) AS a\
\n TableScan: test projection=[a], full_filters=[b = Int32(1)]";

@Dandandan Dandandan Sep 28, 2023

Here is the important fix: before this change, b would be added to the projection even if it was not needed by the plan above. That led to unnecessarily scanning columns that are referenced only by filters the scan supports.
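
The difference can be sketched as a small column-pruning helper. This is only an illustration under simplified assumptions (columns are plain strings, the scan projection is a list of schema indices); `indices_of`, `scan_projection_old`, and `scan_projection_new` are hypothetical names and not code from `PushDownProjection`.

```rust
use std::collections::HashSet;

/// Indices (in schema order) of the schema columns contained in `wanted`.
fn indices_of<'a>(schema: &[&'a str], wanted: &HashSet<&'a str>) -> Vec<usize> {
    schema
        .iter()
        .enumerate()
        .filter(|(_, name)| wanted.contains(*name))
        .map(|(i, _)| i)
        .collect()
}

/// Earlier behavior (sketch): the scan projection is the union of the
/// columns used by the parent plan and the columns used by the scan's filters.
fn scan_projection_old<'a>(
    schema: &[&'a str],
    parent_columns: &HashSet<&'a str>,
    filter_columns: &HashSet<&'a str>,
) -> Vec<usize> {
    let wanted: HashSet<&str> = parent_columns.union(filter_columns).copied().collect();
    indices_of(schema, &wanted)
}

/// Behavior after this change (sketch): only the columns used by the
/// parent plan are projected; the scan evaluates its filters on its own.
fn scan_projection_new<'a>(schema: &[&'a str], parent_columns: &HashSet<&'a str>) -> Vec<usize> {
    indices_of(schema, parent_columns)
}
```

With schema [a, b, c], a parent plan that uses only a, and a filter on b, the old helper projects indices [0, 1] while the new one projects only [0], which matches the `projection=[a]` in the expected plan above.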

@Dandandan Dandandan marked this pull request as ready for review September 28, 2023 09:43
@Dandandan Dandandan changed the title from "WIP Don't add filters to used columns in TableScan" to "Don't add filters to used columns in TableScan" Sep 28, 2023
@Dandandan Dandandan changed the title from "Don't add filters to used columns in TableScan" to "Don't add filters to projection in TableScan" Sep 28, 2023
@@ -147,10 +147,6 @@ impl OptimizerRule for PushDownProjection {
 if !scan.projected_schema.fields().is_empty() =>
 {
 let mut used_columns: HashSet<Column> = HashSet::new();
-// filter expr may not exist in expr in projection.

This code has been adding filter columns to the projection since #5188.

FYI @jackwener

Add test

WIP fix

Fix filter after scan

Totally remove filter to column extraction

Fix test

Update tests 1

Update tests 2

Update tests 3
@Dandandan

FYI @alamb, this might also help with enabling parquet filter pushdown by default.

@thinkharderdev thinkharderdev left a comment

Nice!

@alamb alamb left a comment

This looks great @Dandandan -- thank you so much ❤️

@@ -2759,8 +2705,7 @@ Projection: a, b
 // For right anti, filter of the left side can be pushed down.
 let expected = "RightAnti Join: test1.a = test2.a Filter: test2.b > UInt32(2)\
 \n Projection: test1.a, test1.b\
-\n Filter: test1.b > UInt32(1)\
-\n TableScan: test1\
+\n TableScan: test1, full_filters=[test1.b > UInt32(1)]\

it is great to see the filters pushed into the scans as part of this test
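
For readers unfamiliar with what "pushed into the scans" means in the updated test, here is a minimal sketch of the plan rewrite on a toy plan type. It assumes the scan fully supports every predicate (which is what `full_filters` indicates in the expected plan); the `Plan` enum and `push_filters_into_scan` are hypothetical and much simpler than DataFusion's `LogicalPlan` and its filter pushdown rule.

```rust
/// A toy logical plan with predicates and expressions kept as strings.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String, filters: Vec<String> },
    Filter { predicate: String, input: Box<Plan> },
    Projection { exprs: Vec<String>, input: Box<Plan> },
}

/// Rewrite `Filter(TableScan)` into a `TableScan` that carries the predicate.
/// Assumption: the scan can evaluate every predicate exactly, so the
/// `Filter` node can be dropped once its predicate is attached to the scan.
fn push_filters_into_scan(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match push_filters_into_scan(*input) {
            // The filter sits directly on a scan: merge it into the scan.
            Plan::TableScan { table, mut filters } => {
                filters.push(predicate);
                Plan::TableScan { table, filters }
            }
            // Anything else: keep the filter where it is.
            other => Plan::Filter { predicate, input: Box::new(other) },
        },
        Plan::Projection { exprs, input } => Plan::Projection {
            exprs,
            input: Box::new(push_filters_into_scan(*input)),
        },
        scan @ Plan::TableScan { .. } => scan,
    }
}
```

Applied to a `Projection` over a `Filter` over a `TableScan`, the rewrite leaves a `Projection` directly over a `TableScan` whose `filters` list contains the predicate, mirroring the shape of the expected plan in the test.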

@Dandandan Dandandan merged commit 4b2b7dc into apache:main Sep 28, 2023
22 checks passed
Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request Oct 7, 2023
Development

Successfully merging this pull request may close these issues.

Filters of TableScan are added to projection when not needed
3 participants