-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move wildcard expansions to the analyzer #11681
Conversation
bdf474e
to
59cf620
Compare
We performed many optimizations and validations based on the expanded expression when planning SQL. Ideally, we should move these behind the ExpandWildcardRule to avoid additional expansions for the schema fields. |
fn apply_required_rule(logical_plan: LogicalPlan) -> Result<LogicalPlan> { | ||
let options = ConfigOptions::default(); | ||
Analyzer::with_rules(vec![Arc::new(ExpandWildcardRule::new())]).execute_and_check( | ||
logical_plan, | ||
&options, | ||
|_, _| {}, | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To generate the correct schema, we need to apply ExpandWildcardRule
for the plan of the view first
if let Some(replace) = options.opt_replace { | ||
let replace_expr = replace | ||
.items | ||
.iter() | ||
.map(|item| { | ||
Ok(self.sql_select_to_rex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to plan the expression of replace
first and then expand them in the ExpandWildcardRule
.
Hi @jayzhan211 @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me!
datafusion/expr/src/expr.rs
Outdated
|
||
#[derive(Clone, PartialEq, Eq, Hash, Debug, Default)] | ||
pub struct PlannedReplaceSelectItem { | ||
pub items: Vec<Box<ReplaceSelectElement>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need Box here. Vec<ReplaceSelectElement>
datafusion/expr/src/utils.rs
Outdated
.map(|c| c.flat_name()) | ||
.collect(); | ||
Ok::<_, DataFusionError>( | ||
(0..wildcard_schema.fields().len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use wildcard_schema.field_names()
.zip(replace.expressions().iter()) | ||
.find(|(item, _)| item.column_name.value == *name) | ||
{ | ||
*expr = Expr::Alias(Alias { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try new_expr.alias(name)
datafusion/sql/src/select.rs
Outdated
items: replace.items, | ||
planned_expressions: replace_expr, | ||
}; | ||
Ok(WildcardOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is worth to have a with_replace
function, so we just need options.with_replace(planned_option)
@@ -61,6 +64,15 @@ impl ViewTable { | |||
Ok(view) | |||
} | |||
|
|||
fn apply_required_rule(logical_plan: LogicalPlan) -> Result<LogicalPlan> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
datafusion/expr/src/expr.rs
Outdated
@@ -970,6 +976,60 @@ impl GroupingSet { | |||
} | |||
} | |||
|
|||
#[derive(Clone, PartialEq, Eq, Hash, Debug, Default)] | |||
pub struct WildcardOptions { | |||
pub opt_ilike: Option<IlikeSelectItem>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think opt_xxx
is redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty epic -- thank you @goldmedal and @jayzhan211
@@ -970,6 +976,72 @@ impl GroupingSet { | |||
} | |||
} | |||
|
|||
#[derive(Clone, PartialEq, Eq, Hash, Debug, Default)] | |||
pub struct WildcardOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a follow on PR, can we add some doc comments to this struct explaining why this structure is needed and how to interpret it?
It seems like the core rationale is that wildcards have different semantics depending on what part of the query they appear in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I added descriptions for the purpose and source of each option.
datafusion/expr/src/expr_fn.rs
Outdated
/// use datafusion_common::TableReference; | ||
/// use datafusion_expr::{qualified_wildcard}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put a #
in front of lines in an example they are not shown but still compiled
/// use datafusion_common::TableReference; | |
/// use datafusion_expr::{qualified_wildcard}; | |
/// # use datafusion_common::TableReference; | |
/// # use datafusion_expr::{qualified_wildcard}; |
🚀 Thanks again @goldmedal and @jayzhan211 |
Thanks @alamb @jayzhan211 ❤️ |
Man I could spend all day reviewing DataFusion PRs these days. There is so much good stuff going on |
Which issue does this PR close?
Closes #11639 .
Rationale for this change
As discussed in #11639 (comment), I tried to solve the to-do issue for moving wildcard expansion to the analyzer.
datafusion/datafusion/expr/src/logical_plan/builder.rs
Lines 1443 to 1458 in 7db4213
I keep the wildcard expression when planning the SQL and implementing
ExpandWildcardRule
.What changes are included in this PR?
This change impacts many parts. I did the required modifications for them.
Expanding expressions to fields for the schema of the Projection plan
When planning the SQL, we base the schema on a plan to perform multiple validations and optimizations. Even if we move the expansion of
Expr::Wildcard
to the analyzer, the schema should contain the actual column information instead of the wildcard field.We should be careful with
calc_func_dependencies_for_project
.functional_dependencies
is related to the implementation of constraints (#7040) and also affects the implicit grouping keys optimization (#11903 (comment)).Handle
WildacrdAddiotionsOption
for replace, exclude, or expect, ...Datafusion supports options for wildcard expressions. Due to moving the expansion to the analyzer, we should handle
WildcardOptions
withinExpr::Wildcard
. We need to consider these options when expanding the wildcard in the following:ExpandWildcardRule
utils::exprlist_to_fields
utils::exprlist_len
Type_coercion for union
Previously, we will do the type coercion when planning the union. We should do it again after expanding the wildcard expression.
Unparsing Expr::Wildcard
I also implement the unparsing for the
Expr::Wildcard
to pass the tests. However, I leave a to-do issue for unparsingWildcardOptions
Are these changes tested?
yes
Are there any user-facing changes?
no.