Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Minor] extract const and add doc and more tests for in_list pruning #8815

Merged
merged 9 commits into from
Jan 11, 2024

Conversation

Ted-Jiang
Copy link
Member

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Ted-Jiang Ted-Jiang requested a review from alamb January 10, 2024 10:12
@github-actions github-actions bot added the core Core DataFusion crate label Jan 10, 2024
/// Translate logical filter expression into pruning predicate
/// expression that will evaluate to FALSE if it can be determined no
/// rows between the min/max values could pass the predicates.
///
/// Returns the pruning predicate as an [`PhysicalExpr`]
///
/// Notice: For [`InListExpr`] if in list values more than 20, it will be rewritten to TRUE
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract const and add explain for InListExpr ,

@@ -1958,6 +1964,90 @@ mod tests {
Ok(())
}

#[test]
fn row_group_predicate_between() -> Result<()> {
Copy link
Member Author

@Ted-Jiang Ted-Jiang Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some test case.

ID IN (46402206,
                         201143645,
                         1147370581,
....
                         242375670,
                         38453705)

Before found MAX_LIST_VALUE_SIZE_REWRITE I though this filter not push down is wrong 😭

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice improvement to me -- thank you @Ted-Jiang

I left some potential improvements, but I don't think they are necessary to merge this PR

@@ -960,7 +964,9 @@ fn build_predicate_expression(
}
}
if let Some(in_list) = expr_any.downcast_ref::<phys_expr::InListExpr>() {
if !in_list.list().is_empty() && in_list.list().len() < 20 {
if !in_list.list().is_empty()
&& in_list.list().len() <= MAX_LIST_VALUE_SIZE_REWRITE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

datafusion/core/src/physical_optimizer/pruning.rs Outdated Show resolved Hide resolved
Field::new("c2", DataType::Int32, false),
]);
// test c1 in(1, 2)
let expr1 = Expr::InList(InList::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could potentially use https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#method.in_list

expr1 = col("cl").in_list(vec![lit(1), lit(2)], false)

fn row_group_predicate_in_list_to_many_values() -> Result<()> {
let schema = Schema::new(vec![Field::new("c1", DataType::Int32, false)]);
// test c1 in(1..21)
// in pruning.rs has MAX_LIST_VALUE_SIZE_REWRITE = 20, more than this value will be rewrite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good to document the current behavior in a test, but I feel like we can do better -- like #7869 says even if we can't prove the expression is not true due to a large inlist, we shouldn't just disable pruning entirely

@alamb alamb changed the title [Minor] extract const and add doc for in_list pruning [Minor] extract const and add more tests for in_list pruning Jan 10, 2024
datafusion/core/src/physical_optimizer/pruning.rs Outdated Show resolved Hide resolved
fn row_group_predicate_in_list_to_many_values() -> Result<()> {
let schema = Schema::new(vec![Field::new("c1", DataType::Int32, false)]);
// test c1 in(1..21)
// in pruning.rs has MAX_LIST_VALUE_SIZE_REWRITE = 20, more than this value will be rewrite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good to document the current behavior in a test, but I feel like we can do better -- like #7869 says even if we can't prove the expression is not true due to a large inlist, we shouldn't just disable pruning entirely

@alamb alamb changed the title [Minor] extract const and add more tests for in_list pruning [Minor] extract const and add doc and more tests for in_list pruning Jan 10, 2024
Ted-Jiang and others added 5 commits January 11, 2024 10:16
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@Ted-Jiang Ted-Jiang merged commit 5e0970a into apache:main Jan 11, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants