Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate Filter::remove_aliases into Expr::unalias_nested #11001

Merged
merged 10 commits into from
Jul 2, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 19, 2024

Which issue does this PR close?

Follow on to #10835

Rationale for this change

As part of #10835, @peter-toth noted that the nested unaliasing code in FilterExec probably belonged in Exrp to make it more discovrable: https://github.com/apache/datafusion/pull/10835/files#r1644664005

I wonder if it make sense to move this method into Expr?

It turns out when I looked into doing so, there was already an existing implementation that could be used

What changes are included in this PR?

  1. Remove Filter::remove_aliases and fold its implementation into Expr::unalias_nested

Are these changes tested?

They are covered by existing tests / CI

Are there any user-facing changes?

Expr::Alias(alias) => alias.expr.unalias_nested(),
_ => self,
}
pub fn unalias_nested(self) -> Transformed<Expr> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an API change (it returns Transformed rather than just Expr) but I think that makes more sense

@alamb alamb marked this pull request as ready for review June 19, 2024 15:08
@comphead comphead added the api change Changes the API exposed to users of the crate label Jun 19, 2024
Expr::Alias(Alias { expr, .. }) => {
// directly recurse into the alias expression
// as we just modified the tree structure
Ok(Transformed::yes(expr.unalias_nested().data))
Copy link
Contributor

@peter-toth peter-toth Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to directly recurse into the aliased expression? Why don't just retrun Transformed::yes(*expr) (that implies TreeNodeRecursion::Continue)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what my first version did -- and it turns out that when Transformed::yes(*expr) is returned the recursion doesn't traverse into the *expr (so nested aliases like (x as foo) as bar are only rewritten to (x as bar)

I tried to convince myself this was a bug in tree node, but think the current behavior makes sense as the newly transformed node wasn't (strictly speaking) part of the original tree and thus automatically traversing down into it may not make senes

Copy link
Contributor

@peter-toth peter-toth Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is because of how transform_down() works when you replace the current alias to its child that is also an alias.
In the case of (x as foo) as bar when we return Transformed::yes(*expr) then ... as bar gets replaced to x as foo (so x as foo becomes the current node) and then recursion continues into x (the current node's children).
So currently with the expr.unalias_nested() call and the TreeNodeRecursion::Continue there is actually a double recursion.

I think transform_up() should be ok, but then you can't skip on subqueries.

So I think we have 2 options here:

  1. Use transform_down() but then you should return Ok(Transformed::new(expr.unalias_nested(), true, TreeNodeRecursion::Jump)) to avoid double recursion.
  2. Or use transform_down_up() and in f_down handle the subqueries and in f_up handle alias with returning Ok(Transformed::yes(*expr)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated my comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code to use transform_down_up -- I think it looks nice now

@alamb
Copy link
Contributor Author

alamb commented Jun 30, 2024

@comphead I wonder if you have time to re-review this PR?

// f_down: skip subqueries. Check in f_down to avoid recursing into them
let recursion = if matches!(
expr,
Expr::Exists { .. } | Expr::ScalarSubquery(_) | Expr::InSubquery(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @alamb
I'm not sure if I read this line correctly, I just checked some queries with duckdb and it doesn't allow setting the alias for Exists or In Subqueries

D with t as (select 1 a) select 1 where exists (select * from t) as x;
Error: Parser Error: syntax error at or near "as"
LINE 1: ...elect 1 where exists (select * from t) as x;
                                                  ^
D select 1 where 1 in (1, 2) as x;
Error: Parser Error: syntax error at or near "as"
LINE 1: select 1 where 1 in (1, 2) as x;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think aliases can be set for such plans via SQL -- I think they are (or were) sometimes set programatically (e.g. via some other optimizer pass or rewrite)

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @alamb there is some comment, I'm not sure how is it relevant though

@alamb alamb merged commit 09cdb78 into apache:main Jul 2, 2024
23 checks passed
@alamb
Copy link
Contributor Author

alamb commented Jul 2, 2024

Thanks again @peter-toth and @comphead

@alamb alamb deleted the alamb/consolidate_unalias branch July 2, 2024 10:56
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…che#11001)

* Consolidate Expr::unalias_nested

* fix doc test

* Clarify unreachable code

* Use transform_down_up to handle recursion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants