-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conflicting optimization rules: common_sub_expression_eliminate
and push_down_projection
#8296
Comments
@jonahgao I am working on a PR to optimize projections. (It is almost ready, in our interval review process. You can find it here). I think this PR will solve this problem. (I didn't verify it though). I suggest you to try whether that PR solves this issue. If that is the case, it should be ready in couple of days for upstream. In this way, we wouldn't duplicate efforts. |
Thank you @mustafasrepo I have tested on this branch, and it looks really nice.
The new However, the I’m wondering if the new |
I think we can cover merge functionality in the new rule also. I will try it out, then let you know about the result. |
I have examined how we can support merge_projections under this PR. It seems that we can support this feature, with this PR all of the optimizations related to the Also as part of that PR, I have changed projection merge logic, such that it doesn't always merges consecutive projections(when previous projection is used to cache complex computation for subsequent projection). With this support, the problem in this issue is resolved also. See test |
A great job 🎉 ! |
Describe the bug
Rule
push_down_projection
negates the optimization results ofcommon_sub_expression_eliminate
, but leaves the useless aliases introduced bycommon_sub_expression_eliminate
. Those added aliases change the signature of the logical plan, also causing the optimizer to never reach the fixed point.In summary, there are two problems:
common_sub_expression_elimination
did not work.datafusion.optimizer.max_passes
.To Reproduce
Run
explain verbose select a/2, a/2 + 1 from t
in CLIcommon_sub_expression_eliminate
generates a new child projection which includes common expressions under the original projection, andpush_down_projection
merges them into one.The final projection is:
a/2
: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2)a/2+1
: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) + Int64(1)Three duplicate aliases appeared, corresponding to
datafusion.optimizer.max_passes=3
.If I set
datafusion.optimizer.max_passes=10
One of the expr in the final projection will be
t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2)
.There are too many unnecessary aliases.
Expected behavior
The logic
merge_projection
inside the rulepush_down_projection
should not undocommon_sub_expression_eliminate
.Additional context
I plan to work on this in the next few days.
My initial thought is to fix it in the
push_down_projection
side.Do not execute
merge_projection
if an expression in the child projection satisfies the following conditions:common_sub_expression_eliminate
.Column
orLiteral
.The text was updated successfully, but these errors were encountered: