Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove qualifiers on pushed down predicates / Fix parquet pruning #689

Merged
merged 3 commits into from
Jul 7, 2021

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 6, 2021

Which issue does this PR close?

Closes ##656

Based on #657 (test for parquet pruning) so review that first

Rationale for this change

Predicate pushdown for parquet is currently broken because the filtering logic gets columns that refer to foo.bar but the table provider only knows about columns (e.g. only about bar)

What changes are included in this PR?

  1. unnormalize_expr function to remove qualifiers from expressions
  2. tests

Are there any user-facing changes?

Pruning works again

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Jul 6, 2021
@alamb alamb force-pushed the alamb/fix_parquet_pruning branch from eb980a5 to 432ce23 Compare July 6, 2021 16:42
@alamb alamb marked this pull request as ready for review July 6, 2021 16:42
// Remove all qualifiers from the scan as the provider
// doesn't know (nor should care) how the relation was
// referred to in the query
let filters = unnormalize_cols(filters.iter().cloned());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the actual fix for parquet pruning.

@@ -1762,4 +1794,76 @@ mod tests {
}
}
}

#[test]
fn normalize_cols() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some additional unit test coverage here for normalize when I was writing the unnormalize versions

@@ -1120,7 +1120,7 @@ pub fn columnize_expr(e: Expr, input_schema: &DFSchema) -> Expr {

/// Recursively call [`Column::normalize`] on all Column expressions
/// in the `expr` expression tree.
pub fn normalize_col(e: Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
pub fn normalize_col(expr: Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use the name expr to be more consistent with the rest of the codebase

@alamb
Copy link
Contributor Author

alamb commented Jul 6, 2021

FYI @houqp

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb alamb force-pushed the alamb/fix_parquet_pruning branch from 20f7472 to de6de17 Compare July 7, 2021 12:13
@alamb alamb force-pushed the alamb/fix_parquet_pruning branch from de6de17 to c2971f9 Compare July 7, 2021 13:19
@alamb alamb merged commit 79d60f9 into apache:master Jul 7, 2021
@alamb alamb deleted the alamb/fix_parquet_pruning branch July 7, 2021 13:43
@houqp houqp added the bug Something isn't working label Jul 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants