Regression: Logical optimizer causes invalid query result with case expression #8942
I further isolated the issue to
It seems like when the entering plan's innermost projection:

```
Projection: ?table?.id, t, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t2
  Projection: ?table?.id, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t
    Projection: ?table?.id, Int32(NULL) AS t
      TableScan: ?table?
```

is being rewritten, this evaluation:

Trying out a naive solution like

```diff
@@ -867,7 +867,7 @@ fn rewrite_projection_given_requirements(
     return if let Some(input) =
         optimize_projections(&proj.input, config, &required_indices)?
     {
-        if &projection_schema(&input, &exprs_used)? == input.schema() {
+        if &projection_schema(&input, &exprs_used)? == input.schema() && exprs_used.iter().all(is_expr_trivial) {
             Ok(Some(input))
         } else {
             Projection::try_new(exprs_used, Arc::new(input))
```

does solve this particular problem, but then it fails to eliminate unneeded projections in some other test cases (notably in
Opened #8951 as a potential solution, though I'm not sure that's the best approach here.
Thank you so much for investigating, @gruuya! It seems that the offending statement appeared in PR #8340. It was a big refactoring, and I can't tell whether this line migrated from somewhere else or was newly introduced. @mustafasrepo could you kindly take a look at this issue and the proposed solution? Based on @gruuya's findings, it looks like any computation that replaces an existing column without changing the schema will be eliminated by the optimizer, which seems like a major issue (perhaps necessitating a patch release).
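To illustrate why comparing schemas alone is not enough, here is a toy Rust model of the elimination check. The `Expr` type, `is_trivial`, `schema_of`, and the `can_eliminate_*` helpers are hypothetical simplifications (schemas are reduced to column names, and only bare column references count as trivial); they are not DataFusion's actual types or its real `is_expr_trivial`.

```rust
/// Hypothetical, heavily simplified expression type (not DataFusion's `Expr`).
enum Expr {
    /// A bare column reference, e.g. `t`.
    Column(String),
    /// A computed expression (such as a CASE) with its SQL text and output alias.
    Computed { sql: String, alias: String },
}

impl Expr {
    /// Name of the column this expression contributes to the projection's schema.
    fn output_name(&self) -> &str {
        match self {
            Expr::Column(name) => name.as_str(),
            Expr::Computed { alias, .. } => alias.as_str(),
        }
    }

    /// Render the expression roughly as it would appear in a plan.
    fn to_sql(&self) -> String {
        match self {
            Expr::Column(name) => name.clone(),
            Expr::Computed { sql, alias } => format!("{sql} AS {alias}"),
        }
    }
}

/// Hypothetical stand-in for `is_expr_trivial`: only bare columns are trivial here.
fn is_trivial(expr: &Expr) -> bool {
    matches!(expr, Expr::Column(_))
}

/// Schema produced by a projection, reduced to its column names.
fn schema_of(exprs: &[Expr]) -> Vec<&str> {
    exprs.iter().map(Expr::output_name).collect()
}

/// Pre-fix check: drop the projection whenever its schema matches its input's.
fn can_eliminate_schema_only(exprs: &[Expr], input_schema: &[&str]) -> bool {
    schema_of(exprs).as_slice() == input_schema
}

/// Naive guard: additionally require every projected expression to be trivial.
fn can_eliminate_guarded(exprs: &[Expr], input_schema: &[&str]) -> bool {
    can_eliminate_schema_only(exprs, input_schema) && exprs.iter().all(is_trivial)
}

/// Input schema of the middle projection: (id, t).
fn input_schema() -> Vec<&'static str> {
    vec!["id", "t"]
}

/// Projection: id, CASE WHEN id = 1 THEN 10 ELSE t END AS t (same schema, new values).
fn case_as_t() -> Vec<Expr> {
    vec![
        Expr::Column("id".to_string()),
        Expr::Computed {
            sql: "CASE WHEN id = 1 THEN 10 ELSE t END".to_string(),
            alias: "t".to_string(),
        },
    ]
}

/// Projection: id, t (a genuine no-op that is safe to remove).
fn passthrough() -> Vec<Expr> {
    vec![Expr::Column("id".to_string()), Expr::Column("t".to_string())]
}

fn main() {
    let schema = input_schema();
    for proj in [case_as_t(), passthrough()] {
        let sql: Vec<String> = proj.iter().map(Expr::to_sql).collect();
        println!(
            "[{}] schema-only: {}, guarded: {}",
            sql.join(", "),
            can_eliminate_schema_only(&proj, &schema),
            can_eliminate_guarded(&proj, &schema),
        );
    }
}
```

Running it prints `schema-only: true, guarded: false` for the CASE projection and `schema-only: true, guarded: true` for the pass-through one: the schema-only check happily removes a projection that changes the values of `t`. The guard is deliberately conservative, keeping every projection that contains a non-trivial expression even where removing it might be safe, which is the kind of trade-off that can cost other eliminations.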
Proposed PR to fix: #8960
Describe the bug
When logical optimization is enabled, DataFusion v34 started producing incorrect results.
To Reproduce
Here's the minimal repro case I found so far:
The code above will show:
which is correct.
Now comment out the `with_optimizer_rules(vec![])` line and you will get a very different result:

Note that despite `t` and `t2` having identical expressions, column `t` is now different. Perhaps the fact that the `t` column is being replaced with an expression that depends on the previous value of `t` is what triggers the issue.
Expected behavior
Logical optimization does not produce incorrect results.
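To make the expected behavior concrete, the sketch below evaluates the plan quoted in the comments by hand, once as written and once with the middle `CASE ... AS t` projection dropped. The `Row` type and helper functions are hypothetical, and the input rows (`id` of 1 and 2, with `t` always NULL to match the `Int32(NULL) AS t` projection) are assumptions; the actual repro data is not shown here.

```rust
/// One row flowing through the plan; `t: None` stands for SQL NULL.
#[derive(Clone, Copy)]
struct Row {
    id: i32,
    t: Option<i32>,
}

/// Output of the top projection: (id, t, t2).
type Output = (i32, Option<i32>, Option<i32>);

/// CASE WHEN id = 1 THEN 10 ELSE t END, evaluated on one row.
fn case_expr(row: Row) -> Option<i32> {
    if row.id == 1 {
        Some(10)
    } else {
        row.t
    }
}

/// Innermost projection: id, Int32(NULL) AS t.
fn scan(id: i32) -> Row {
    Row { id, t: None }
}

/// Middle projection: id, CASE ... END AS t (overwrites `t` using its old value).
fn rewrite_t(row: Row) -> Row {
    Row { id: row.id, t: case_expr(row) }
}

/// Top projection: id, t, CASE ... END AS t2.
fn add_t2(row: Row) -> Output {
    (row.id, row.t, case_expr(row))
}

/// The plan as written.
fn run_full(id: i32) -> Output {
    add_t2(rewrite_t(scan(id)))
}

/// The plan after the middle projection is (wrongly) removed because its
/// schema (id, t) equals the schema of its input.
fn run_with_projection_dropped(id: i32) -> Output {
    add_t2(scan(id))
}

fn main() {
    for id in [1, 2] {
        println!(
            "id={id}: correct={:?}, projection dropped={:?}",
            run_full(id),
            run_with_projection_dropped(id)
        );
    }
}
```

With the full plan, `t` and `t2` always agree. Dropping the projection leaves `t` as NULL for `id = 1` while `t2` is 10, the same kind of divergence between two identical expressions that the report describes.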
Additional context
This broke in DataFusion 34; version 33 worked fine.