-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve volatile expression handling in CommonSubexprEliminate
#11265
Conversation
CommonSubexprEliminate
ruleCommonSubexprEliminate
4b395e1
to
6532b97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth -- this looks quite cool. I had one question about the test and some comment suggestions. The test is the only thing I think is needed before merging this PR
cc @waynexia and @haohuaijin in case you have comments
It may also be worth writing a few .slt tests showing this in action and how it interacts with multiple subexpressions
/// The flag is propagated up from children to parent. (E.g. volatile expressions | ||
/// are not valid and can't be extracted, but non-volatile children of volatile | ||
/// expressions can be extracted.) | ||
ExprItem(Identifier<'n>, bool), | ||
} | ||
|
||
impl<'n> ExprIdentifierVisitor<'_, 'n> { | ||
/// Find the first `EnterMark` in the stack, and accumulates every `ExprItem` | ||
/// before it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you possibly update this doc string to explain the 4-tuple that is now returned? Specifically document the two different bool
values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, good idea, I've added the comment in 79b2e02
let plan = LogicalPlanBuilder::from(table_scan.clone()) | ||
.project(vec![ | ||
not_extracted_volatile_short_circuit_1.clone().alias("c1"), | ||
not_extracted_volatile_short_circuit_1.alias("c2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I was thinking -- why can't we extract the a = 0
part of a = 0 OR random() = 0
out?
It seems like it would be ok to rewrite
SELECT
a = 0 OR random() = 0 as c1,
a = 0 OR random() = 0 as c2,
..
to something like this:
SELECT
__subexpr_1 OR random() as c1,
__subexpr_1 OR random() as c2,
...
FROM (
SELECT
a=0 as __subexpr_1
FROM
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This question is similar to #11197 (comment). We can extract the surely evaluated expressions (1st legs) from those short circuiting Or
, And
and CaseWhen
expressions, but we are not there yet with this PR.
My 3rd PR in the #11194 epic will implement that logic, but I want to add the improvements gradually as it will require to recurse into only certain children of a parent and that's not straightforward with the current treenode APIs (i.e. we need to stop recursion at the parent with a Jump
and start a new recursion only on the interresting children).
let mut expr_id = None; | ||
let mut is_valid = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please document what "valid" means in this context? I think it means "is valid for CSE" as in "this sub expression could potentially be removed via CSE" but I am not quite sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to the method in 79b2e02, I think it explains what "valid" means in this case, but let me know if we should document the variable as well.
Sure, I will add some tests tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @peter-toth and @alamb.
I've added CSE tests in 1aa8848, it contais TODOs for #11265 (comment), but I will fix them in the next PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth and @haohuaijin
I merged up from main and updated the unit test to avoid a new dependency on datafusion-functions.
Thanks again @peter-toth and @haohuaijin |
Thanks @alamb and @haohuaijin for the review! |
…che#11265) * Improve volatile expression handling in `CommonSubexprEliminate` rule * fix volatile handling with short circuits * fix comments * add slt tests for CSE * Avoid adding datafusion function dependency * revert changes to datafusion-cli.lock --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…che#11265) * Improve volatile expression handling in `CommonSubexprEliminate` rule * fix volatile handling with short circuits * fix comments * add slt tests for CSE * Avoid adding datafusion function dependency * revert changes to datafusion-cli.lock --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Part of #11194.
Rationale for this change
Curently
CommonSubexprEliminate
always skips expressions doesn't allow volatile expressions to be extracted, but this can be improved by allowing non-volatile subexpressions of volatile expressions to be extracted:I.e. from the plan:
the expression
a + b
can be extracted as common expression despite the fact that(a + b) + rand()
is volatile.What changes are included in this PR?
Changes the rule to handle volatile expressions better by adding a new boolean flag to
VisitRecord::ExprItem
and propagating up volatility (validity for CSE).Are these changes tested?
Yes, added new UT.
Are there any user-facing changes?
No.