Regression: Logical optimizer causes invalid query result with case expression #8942

sergiimk · 2024-01-22T01:46:16Z

Describe the bug

With logical optimization enabled, datafusion v34 produces incorrect query results.

To Reproduce

Here's the minimal repro case I found so far:

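// Imports assumed for datafusion 34; the snippet must run inside an async fn (e.g. under tokio).
use std::sync::Arc;
use datafusion::arrow::array;
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::execution::context::SessionState;
use datafusion::execution::runtime_env::RuntimeEnv;
use datafusion::prelude::*;
use datafusion::scalar::ScalarValue;

// Empty optimizer rule list: logical optimization is effectively disabled.
let config = SessionConfig::new();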
let runtime = Arc::new(RuntimeEnv::default());
let state = SessionState::new_with_config_rt(config, runtime).with_optimizer_rules(vec![]);
let ctx = SessionContext::new_with_state(state);

let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));

let batch =
    RecordBatch::try_new(schema, vec![Arc::new(array::Int32Array::from(vec![0, 1]))]).unwrap();

let df = ctx.read_batch(batch).unwrap();
df.clone().show().await.unwrap();

// Add `t` column full of nulls
let df = df
    .with_column("t", cast(Expr::Literal(ScalarValue::Null), DataType::Int32))
    .unwrap();
df.clone().show().await.unwrap();

let df = df
    // (case when id = 1 then 10 else t) as t
    .with_column(
        "t",
        when(col("id").eq(lit(1)), lit(10))
            .otherwise(col("t"))
            .unwrap(),
    )
    .unwrap()
    // (case when id = 1 then 10 else t) as t2
    .with_column(
        "t2",
        when(col("id").eq(lit(1)), lit(10))
            .otherwise(col("t"))
            .unwrap(),
    )
    .unwrap();

df.clone().show().await.unwrap();

The code above prints:

+----+----+----+
| id | t  | t2 |
+----+----+----+
| 0  |    |    |
| 1  | 10 | 10 |
+----+----+----+

which is correct.

Now comment out the with_optimizer_rules(vec![]) call and you will get a very different result:

+----+---+----+
| id | t | t2 |
+----+---+----+
| 0  |   |    |
| 1  |   | 10 |
+----+---+----+

Note that although t and t2 are defined by identical expressions, column t now differs: it is null for id = 1.

Perhaps the issue is triggered by the fact that the t column is being replaced with an expression that depends on the previous value of t.
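To make the expected semantics explicit, here is a minimal sketch in plain Rust (no datafusion involved) that evaluates the two CASE expressions row by row, modeling SQL NULL as `None`. The function names are made up for illustration; the point is only that t and t2 should agree on every row.

```rust
/// CASE WHEN id = 1 THEN 10 ELSE t END, with SQL NULL modeled as None.
/// (Hypothetical helper, not part of datafusion.)
pub fn case_when_id_is_one(id: i32, t: Option<i32>) -> Option<i32> {
    if id == 1 {
        Some(10)
    } else {
        t
    }
}

/// Returns (id, t, t2) per row, mirroring the three with_column calls:
/// t starts out all-null, is replaced by the CASE expression, and t2 is
/// then computed from the replaced t.
pub fn expected_rows(ids: &[i32]) -> Vec<(i32, Option<i32>, Option<i32>)> {
    ids.iter()
        .map(|&id| {
            let t_null: Option<i32> = None; // the column full of nulls
            let t = case_when_id_is_one(id, t_null);
            let t2 = case_when_id_is_one(id, t);
            (id, t, t2)
        })
        .collect()
}
```

For ids 0 and 1 this yields a null pair for the first row and 10 in both columns for the second, matching the unoptimized output above.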

Expected behavior

Logical optimization does not produce incorrect results.

Additional context

This broke in datafusion 34; version 33 worked fine.


sergiimk · 2024-01-22T01:52:50Z

I further isolated the issue to the OptimizeProjections optimizer step.

gruuya · 2024-01-22T14:55:22Z

It seems that when the innermost projection of the incoming plan:

Projection: ?table?.id, t, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t2
  Projection: ?table?.id, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t
    Projection: ?table?.id, Int32(NULL) AS t
      TableScan: ?table?

is being rewritten, this check:
https://github.com/apache/arrow-datafusion/blob/2b218be67a6c412629530b812836a6cec76efc32/datafusion/optimizer/src/optimize_projections.rs#L867-L871
concludes that its schema and its input's schema (that of the bottom-most projection) are identical, and so it simply discards the projection (proj and its exprs_used), even though the projection performs a non-trivial computation.
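The pitfall can be reproduced in a toy model. The sketch below is not datafusion code: the `Expr` enum, the names-only "schema", and every function name are simplified stand-ins invented for illustration (the real schema comparison also involves types and nullability). It only shows why "same schema as the input" does not imply "the projection does nothing".

```rust
/// Toy expression type (hypothetical, far simpler than datafusion's Expr).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    /// CASE WHEN <when_col> = <equals> THEN <then> ELSE <otherwise> END
    Case {
        when_col: String,
        equals: i32,
        then: i32,
        otherwise: Box<Expr>,
    },
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Output column name produced by this expression.
    pub fn name(&self) -> String {
        match self {
            Expr::Column(n) => n.clone(),
            Expr::Alias(_, n) => n.clone(),
            Expr::Case { .. } => "CASE".to_string(),
        }
    }
}

/// Toy schema: just the output column names.
pub fn schema(exprs: &[Expr]) -> Vec<String> {
    exprs.iter().map(Expr::name).collect()
}

/// The problematic rule: drop the projection whenever its output schema
/// equals the schema of its input.
pub fn can_drop_by_schema_only(proj: &[Expr], input_schema: &[String]) -> bool {
    schema(proj) == input_schema
}

/// Projection: id, CASE WHEN id = 1 THEN 10 ELSE t END AS t
pub fn middle_projection() -> Vec<Expr> {
    vec![
        Expr::Column("id".to_string()),
        Expr::Alias(
            Box::new(Expr::Case {
                when_col: "id".to_string(),
                equals: 1,
                then: 10,
                otherwise: Box::new(Expr::Column("t".to_string())),
            }),
            "t".to_string(),
        ),
    ]
}

/// Schema of the bottom projection: id, NULL AS t
pub fn bottom_schema() -> Vec<String> {
    vec!["id".to_string(), "t".to_string()]
}
```

In this model `can_drop_by_schema_only(&middle_projection(), &bottom_schema())` is true, so the CASE computation would be thrown away even though its second expression is not a plain column reference.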

Trying out a naive solution like

@@ -867,7 +867,7 @@ fn rewrite_projection_given_requirements(
     return if let Some(input) =
         optimize_projections(&proj.input, config, &required_indices)?
     {
-        if &projection_schema(&input, &exprs_used)? == input.schema() {
+        if &projection_schema(&input, &exprs_used)? == input.schema() && exprs_used.iter().all(is_expr_trivial) {
             Ok(Some(input))
         } else {
             Projection::try_new(exprs_used, Arc::new(input))

does solve this particular problem, but it then fails to eliminate unneeded projections in some other test cases (notably test_infinite_source_partition_by, which ends up with a bunch of interleaved projections).
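As a rough illustration of the guard in the diff above, here is a self-contained toy version. Everything in it is an assumption made for the sketch: the `Expr` enum is a stand-in, the schema is reduced to column names, and `is_expr_trivial` is assumed to mean "a bare column or a literal", which may not match the exact definition in optimize_projections.rs.

```rust
/// Toy expression type (hypothetical stand-in, not datafusion's Expr).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i32),
    /// Any non-trivial computation, identified only by its output name.
    Computed { name: String },
}

/// Assumed meaning of "trivial": a bare column reference or a literal.
pub fn is_expr_trivial(expr: &Expr) -> bool {
    matches!(expr, Expr::Column(_) | Expr::Literal(_))
}

/// Output column name of an expression in this toy model.
pub fn output_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(n) => n.clone(),
        Expr::Literal(v) => format!("Int32({v})"),
        Expr::Computed { name } => name.clone(),
    }
}

/// Guarded rule: drop the projection only if the schema is unchanged
/// AND no expression performs real work.
pub fn can_drop_projection(exprs: &[Expr], input_schema: &[String]) -> bool {
    let names: Vec<String> = exprs.iter().map(output_name).collect();
    names == input_schema && exprs.iter().all(is_expr_trivial)
}

/// Convenience constructor for a column reference.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
```

With an input schema of id, t, the projection `[col("id"), col("t")]` can still be dropped, while `[col("id"), Expr::Computed { name: "t".to_string() }]` is kept. The toy does not model the interleaved-projection side effect mentioned above.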

gruuya · 2024-01-22T17:09:55Z

Opened #8951 as a potential solution, though I'm not sure that's the best approach here.

sergiimk · 2024-01-22T18:25:16Z

Thank you so much for investigating, @gruuya!

It seems that the offending statement appeared in PR #8340. It was a big refactoring and I can't tell if this line migrated from somewhere else or was introduced.

@mustafasrepo could you kindly take a look at this issue and the proposed solution?

Based on @gruuya's findings it looks like any computation that replaces an existing column without changing the schema will be eliminated by the optimizer, which seems like a major issue (perhaps necessitating a patch release).

alamb · 2024-01-24T15:07:58Z

Proposed PR to fix: #8960

sergiimk added the bug label Jan 22, 2024

gruuya mentioned this issue Jan 22, 2024: Handle nested projection with derived column optimization #8951 (Closed)

mustafasrepo mentioned this issue Jan 23, 2024: Fix optimize projections bug #8960 (Merged)

alamb changed the title from "Logical optimizer causes invalid query result with case expression" to "Regression: Logical optimizer causes invalid query result with case expression" Jan 24, 2024

alamb added the regression label Jan 24, 2024

mustafasrepo closed this as completed in #8960 Jan 25, 2024

alamb mentioned this issue May 7, 2024: Stop copying LogicalPlan and Exprs in OptimizeProjections (2% faster planning) #10405 (Merged)
