Make `CommonSubexprEliminate` top-down like #11683

peter-toth · 2024-07-27T12:04:46Z

Which issue does this PR close?

Part of #11194.

Rationale for this change

This PR contains 2 ideas:

1. In #10835 ("Stop copying LogicalPlan and Exprs in CommonSubexprEliminate (2-3% planning speed improvement)") the CommonSubexprEliminate rule was converted to an ApplyOrder::TopDown rule, but the self.rewrite() call was kept as well.

The main problem with the top-down conversion is that in some cases CommonSubexprEliminate collects adjacent nodes (e.g. Window) into groups and, once it has eliminated subexpressions among those groups, transforms the result back into adjacent nodes. This kind of transformation can't be defined with a simple top-down optimizer rule. (Most likely this is why the rule handled recursion itself before #10835.)

This means that CommonSubexprEliminate should not be an ApplyOrder::TopDown rule, but should handle recursion itself. But if we reverse the top-down conversion then there is another issue:
#10473 ("Improve CommonSubexprEliminate identifier management (10% faster planning)") changed to_arrays() to return a boolean flag indicating whether it makes sense to execute the 2nd, rewriting traversal that does the actual common expression extraction. E.g. if found_common is false, rewrite_expr() is not executed:

datafusion/datafusion/optimizer/src/common_subexpr_eliminate.rs

Lines 609 to 622 in 204e1bc

    
    let mut expr_stats = ExprStats::new();
    let (found_common, id_arrays) =
        self.to_arrays(&expr, &mut expr_stats, ExprMask::Normal)?;
    if found_common {
        let rewritten = self.rewrite_expr(
            // Must clone as Identifiers use references to original expressions so we
            // have to keep the original expressions intact.
            vec![expr.clone()],
            vec![id_arrays],
            input,
            &expr_stats,
            config,
        )?;

The problem is that calling self.rewrite() is currently in rewrite_expr():

datafusion/datafusion/optimizer/src/common_subexpr_eliminate.rs

Lines 278 to 285 in 204e1bc

    
    let new_input = self.rewrite(input, config)?;
    transformed |= new_input.transformed;
    let mut new_input = new_input.data;
    if !common_exprs.is_empty() {
        assert!(transformed);
        new_input = build_common_expr_project_plan(new_input, common_exprs)?;
    }

(I.e. #10473 "Improve CommonSubexprEliminate identifier management (10% faster planning)" broke the rule's own recursion handling.)

So what we need to do is:

Convert the rule back to handle recursion itself.
Move self.rewrite() call out of the found_common check.
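
As a rough illustration of the second point, here is a toy model (not the DataFusion code; `Node`, `has_common` and both function names are made up for this sketch). It assumes the rule is solely responsible for recursing into its input. If the recursion sits inside the "found common expressions" branch, a child below a node without common expressions is never visited:

```rust
/// Toy plan node: `has_common` stands in for "to_arrays() found common
/// sub-expressions in this node's expression list".
struct Node {
    name: &'static str,
    has_common: bool,
    input: Option<Box<Node>>,
}

/// Recursion is only reached when the current node has common expressions,
/// so everything below a node without them is skipped.
fn rewrite_recursion_inside_check(node: &Node, rewritten: &mut Vec<&'static str>) {
    if node.has_common {
        if let Some(input) = &node.input {
            rewrite_recursion_inside_check(input, rewritten);
        }
        rewritten.push(node.name);
    }
}

/// Recursion always happens; only the extraction step is guarded by the flag.
fn rewrite_recursion_outside_check(node: &Node, rewritten: &mut Vec<&'static str>) {
    if let Some(input) = &node.input {
        rewrite_recursion_outside_check(input, rewritten);
    }
    if node.has_common {
        rewritten.push(node.name);
    }
}
```

With a `filter` node (no common expressions) on top of a `projection` node (with common expressions), the first variant rewrites nothing, while the second one rewrites the projection.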

2. The current rule is not optimal, as extracted common expressions are not themselves sub-expression eliminated within the same rule execution. This is because the rule recurses into the child plan nodes with self.rewrite() and only then adds the new projection built from the extracted common expressions.

The issue with this approach is that even the new projection can contain sub-expressions to eliminate.
E.g. a plan like
```
Projection: (test.a + test.b) * (test.a + test.b) AS c1, (test.a + test.b) * (test.a + test.b) AS c2
  ...
```
can be rewritten to:
```
Projection: __common_expr_1 AS c1, __common_expr_1 AS c2
  Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c
    Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c
      ...
```
but the current rule requires 2 rule executions (optimizer cycles) to reach the final plan.
This can be improved by swapping the order of adding the new projection and calling self.rewrite().
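
The effect of the ordering can be sketched with a toy model (an illustration only, not the rule's implementation). Assumptions made here: a plan is just a stack of projection expression lists over a scan, pass-through columns and output aliases are omitted, and the helper `extract_largest_common` simply picks the largest repeated non-column subexpression.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Col(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn size(e: &Expr) -> usize {
    match e {
        Expr::Col(_) => 1,
        Expr::Add(l, r) | Expr::Mul(l, r) => 1 + size(l) + size(r),
    }
}

/// Collects every non-column subexpression (with duplicates).
fn collect<'a>(e: &'a Expr, out: &mut Vec<&'a Expr>) {
    if let Expr::Add(l, r) | Expr::Mul(l, r) = e {
        out.push(e);
        collect(l, out);
        collect(r, out);
    }
}

/// Replaces every occurrence of `target` in `e` with `with`.
fn replace(e: &Expr, target: &Expr, with: &Expr) -> Expr {
    if e == target {
        return with.clone();
    }
    match e {
        Expr::Col(_) => e.clone(),
        Expr::Add(l, r) => Expr::Add(
            Box::new(replace(l, target, with)),
            Box::new(replace(r, target, with)),
        ),
        Expr::Mul(l, r) => Expr::Mul(
            Box::new(replace(l, target, with)),
            Box::new(replace(r, target, with)),
        ),
    }
}

/// Returns the rewritten expression list and the extracted common expression,
/// or `None` if nothing occurs more than once.
fn extract_largest_common(exprs: &[Expr], id: usize) -> Option<(Vec<Expr>, Expr)> {
    let mut subs = Vec::new();
    for e in exprs {
        collect(e, &mut subs);
    }
    let mut best: Option<&Expr> = None;
    for &cand in &subs {
        let occurrences = subs.iter().filter(|&&s| s == cand).count();
        if occurrences > 1 && best.map_or(true, |b| size(cand) > size(b)) {
            best = Some(cand);
        }
    }
    let common = best?.clone();
    let col = Expr::Col(format!("__common_expr_{}", id));
    let rewritten = exprs.iter().map(|e| replace(e, &common, &col)).collect();
    Some((rewritten, common))
}

/// Old order: the input (a plain scan here) is rewritten first, then the new
/// projection is added on top of it and never looked at again.
fn recurse_then_extract(top: Vec<Expr>) -> Vec<Vec<Expr>> {
    match extract_largest_common(&top, 1) {
        Some((rewritten, common)) => vec![rewritten, vec![common]],
        None => vec![top],
    }
}

/// New order: the new projection is added first, then the rule continues into it.
fn extract_then_recurse(mut top: Vec<Expr>) -> Vec<Vec<Expr>> {
    let mut layers = Vec::new();
    let mut id = 1;
    loop {
        match extract_largest_common(&top, id) {
            Some((rewritten, common)) => {
                layers.push(rewritten);
                top = vec![common];
                id += 1;
            }
            None => {
                layers.push(top);
                return layers;
            }
        }
    }
}
```

For the `(a + b) * (a + b)` example, the old order stops at two projection layers, while the new order also extracts `a + b` from the freshly added layer and ends up with three.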

What changes are included in this PR?

This PR:

Reverts apply_order() to return None.
Changes rewrite_expr() into find_common_exprs() to extract common sub-expressions and rewrite an expression list. The step of recursing into child plan nodes is moved out from this method. This way find_common_exprs() can safely leverage the boolean of to_arrays() to skip the 2nd traversal.
Refactors try_unary_plan(), try_optimize_aggregate() and try_optimize_window().

Are these changes tested?

Yes, added new UTs.

Are there any user-facing changes?

Yes, it fixes a possible performance regression.

peter-toth · 2024-07-27T12:48:03Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

+                        Some((common_aggr_exprs, mut aggr_list)) => {
+                            let new_aggr_expr = aggr_list.pop().unwrap();
+
+                            let mut agg_exprs = common_aggr_exprs


This part is basically the same as it was: https://github.com/apache/datafusion/pull/11683/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L522

peter-toth · 2024-07-27T12:50:55Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

+            .map_data(|(new_window_expr_list, new_input, window_expr_list)| {
+                // If there were common expressions extracted, then we need to make sure
+                // we restore the original column names.
+                // TODO: Although `find_common_exprs()` inserts aliases around extracted


I wanted to fix the previous TODO (https://github.com/apache/datafusion/pull/11683/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L563) but realized that preserving names is still required, so I added 2 TODOs where we have that logic.
I will try to get rid of them in a follow-up PR.

peter-toth · 2024-07-27T12:51:26Z

cc @alamb

peter-toth · 2024-07-27T12:52:40Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

@@ -1963,6 +1944,52 @@ mod test {
        Ok(())
    }

+    #[test]
+    fn test_non_top_level_common_expression() -> Result<()> {


New test for the possible perf regression (1.).

I verified that without the code changes in this PR this test fails like this:

thread 'common_subexpr_eliminate::test::test_non_top_level_common_expression' panicked at datafusion/optimizer/src/common_subexpr_eliminate.rs:1215:9:
failed to optimize plan
stack backtrace:
   0: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   2: datafusion_optimizer::common_subexpr_eliminate::test::assert_optimized_plan_eq
             at ./src/common_subexpr_eliminate.rs:1215:9
   3: datafusion_optimizer::common_subexpr_eliminate::test::test_non_top_level_common_expression
             at ./src/common_subexpr_eliminate.rs:1984:9
   4: datafusion_optimizer::common_subexpr_eliminate::test::test_non_top_level_common_expression::{{closure}}
             at ./src/common_subexpr_eliminate.rs:1967:50
   5: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

peter-toth · 2024-07-27T12:53:06Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

+    }
+
+    #[test]
+    fn test_nested_common_expression() -> Result<()> {


New test for the top-down improvement (2.).

I verified this test fails without the changes in this PR:

thread 'common_subexpr_eliminate::test::test_nested_common_expression' panicked at datafusion/optimizer/src/common_subexpr_eliminate.rs:1218:9:
assertion `left == right` failed
  left: "Projection: __common_expr_1 AS c1, __common_expr_1 AS c2\n Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c\n Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c\n TableScan: test"
 right: "Projection: __common_expr_1 AS c1, __common_expr_1 AS c2\n Projection: (test.a + test.b) * (test.a + test.b) AS __common_expr_1, test.a, test.b, test.c\n TableScan: test"
stack backtrace:
   0: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:363:5
   4: datafusion_optimizer::common_subexpr_eliminate::test::assert_optimized_plan_eq
             at ./src/common_subexpr_eliminate.rs:1218:9
   5: datafusion_optimizer::common_subexpr_eliminate::test::test_nested_common_expression
             at ./src/common_subexpr_eliminate.rs:2007:9
   6: datafusion_optimizer::common_subexpr_eliminate::test::test_nested_common_expression::{{closure}}
             at ./src/common_subexpr_eliminate.rs:1990:43
   7: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
   8: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Namely the output projection looks like

Projection: __common_expr_1 AS c1, __common_expr_1 AS c2
  Projection: (test.a + test.b) * (test.a + test.b) AS __common_expr_1, test.a, test.b, test.c
    TableScan: test

Rather than

Projection: __common_expr_1 AS c1, __common_expr_1 AS c2
  Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c
    Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c
      TableScan: test

The 2nd is the better plan, and after this PR we reach it by executing the rule only once. (Before this PR the optimizer had to execute the rule one more time to get from the 1st plan to the 2nd.)

alamb · 2024-07-29T16:53:45Z

Thank you @peter-toth -- this is on my review list

alamb · 2024-07-30T19:34:08Z

Interestingly, I ran the planning benchmarks and this branch actually seems to be ever so slightly slower. I'll rerun to see if I can reproduce the same results.

cargo bench --bench sql_planner

++ critcmp main make-cse-top-down-like
group                                         main                                   make-cse-top-down-like
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1135.6±20.84µs        ? ?/sec    1.01  1149.4±92.43µs        ? ?/sec
logical_plan_tpcds_all                        1.00    163.4±1.30ms        ? ?/sec    1.01    164.9±1.10ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.7±0.18ms        ? ?/sec    1.01     17.9±0.21ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.0±0.13ms        ? ?/sec    1.00     17.9±0.12ms        ? ?/sec
logical_select_one_from_700                   1.00   835.8±30.14µs        ? ?/sec    1.00    837.2±9.13µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   786.5±10.01µs        ? ?/sec    1.00   785.2±21.12µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    767.4±8.46µs        ? ?/sec    1.01   772.1±18.00µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1124.5±5.74ms        ? ?/sec    1.04   1168.5±4.72ms        ? ?/sec
physical_plan_tpch_all                        1.00     72.9±0.62ms        ? ?/sec    1.03     75.2±1.28ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.02ms        ? ?/sec    1.05      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.6±0.03ms        ? ?/sec    1.04      3.7±0.03ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.02ms        ? ?/sec    1.02      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.5±0.02ms        ? ?/sec    1.04      2.6±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1846.8±16.66µs        ? ?/sec    1.04  1912.5±60.02µs        ? ?/sec
physical_plan_tpch_q14                        1.00      2.2±0.02ms        ? ?/sec    1.03      2.2±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.1±0.06ms        ? ?/sec    1.04      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.8±0.02ms        ? ?/sec    1.02      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.3±0.02ms        ? ?/sec    1.03      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q19                        1.00      4.9±0.04ms        ? ?/sec    1.03      5.1±0.06ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.3±0.06ms        ? ?/sec    1.02      6.4±0.23ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.7±0.03ms        ? ?/sec    1.03      3.8±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.0±0.05ms        ? ?/sec    1.02      5.1±0.05ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.8±0.02ms        ? ?/sec    1.03      2.9±0.04ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.6±0.02ms        ? ?/sec    1.03      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.0±0.02ms        ? ?/sec    1.02      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.8±0.04ms        ? ?/sec    1.03      3.9±0.05ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1360.3±23.73µs        ? ?/sec    1.05  1430.4±38.00µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.7±0.06ms        ? ?/sec    1.01      4.8±0.05ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.8±0.06ms        ? ?/sec    1.02      5.9±0.07ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.4±0.03ms        ? ?/sec    1.02      4.5±0.04ms        ? ?/sec
physical_select_all_from_1000                 1.00     44.7±0.20ms        ? ?/sec    1.00     44.6±0.24ms        ? ?/sec
physical_select_one_from_700                  1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec

peter-toth · 2024-07-30T20:22:21Z

Interestingly, I ran the planning benchmarks and this branch actually seems to be ever so slightly slower. I'll rerun to see if I can reproduce the same results.

Yeah, this is basically the trade-off between the time we spend in planning and in execution. These benchmarks only measure planning time, which might be a bit higher with this PR. #10473 could cause a regression because it no longer extracts common expressions from non-top-level nodes, causing longer execution times in some cases. (This side effect was not intentional; I just didn't notice it with the tests available at that time.) This PR restores common expression elimination for non-top-level nodes and so fixes the possible execution time regression, but it comes with a small cost in planning time.

BTW, the TPCH queries are not affected by the regression (they have no non-top-level nodes with common expressions in them), so this PR only makes their planning slower and doesn't improve their execution times. But I still think that the possible execution time regression that #10473 can cause needs to be fixed.

Update: I've elaborated on 1. in the PR description.

alamb · 2024-07-30T20:26:45Z

I reran the planning benchmarks and I see the slowdown again

Details

group                                         main                                   make-cse-top-down-like
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1135.2±13.55µs        ? ?/sec    1.00  1134.3±12.87µs        ? ?/sec
logical_plan_tpcds_all                        1.00    164.2±1.33ms        ? ?/sec    1.00    163.8±1.33ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.8±0.15ms        ? ?/sec    1.00     17.9±0.21ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.0±0.16ms        ? ?/sec    1.01     18.2±0.89ms        ? ?/sec
logical_select_one_from_700                   1.00   840.5±20.94µs        ? ?/sec    1.00    837.6±9.76µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   785.9±17.46µs        ? ?/sec    1.01   793.5±38.70µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    774.2±8.94µs        ? ?/sec    1.00   775.4±13.41µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1130.6±4.40ms        ? ?/sec    1.03   1166.8±6.32ms        ? ?/sec
physical_plan_tpch_all                        1.00     73.4±0.77ms        ? ?/sec    1.03     75.7±1.00ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.03ms        ? ?/sec    1.05      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.7±0.04ms        ? ?/sec    1.04      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.03ms        ? ?/sec    1.01      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.5±0.03ms        ? ?/sec    1.04      2.6±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1865.1±22.15µs        ? ?/sec    1.03  1925.7±21.68µs        ? ?/sec
physical_plan_tpch_q14                        1.00      2.2±0.03ms        ? ?/sec    1.03      2.3±0.03ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.1±0.03ms        ? ?/sec    1.04      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q17                        1.01      2.9±0.19ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.3±0.04ms        ? ?/sec    1.04      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.0±0.07ms        ? ?/sec    1.03      5.2±0.09ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.3±0.05ms        ? ?/sec    1.03      6.5±0.08ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.7±0.04ms        ? ?/sec    1.05      3.8±0.05ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.1±0.06ms        ? ?/sec    1.04      5.3±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.8±0.02ms        ? ?/sec    1.04      2.9±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.6±0.03ms        ? ?/sec    1.03      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.0±0.02ms        ? ?/sec    1.02      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.8±0.04ms        ? ?/sec    1.03      3.9±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1373.4±33.44µs        ? ?/sec    1.04  1429.7±13.05µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.8±0.08ms        ? ?/sec    1.00      4.8±0.07ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.9±0.08ms        ? ?/sec    1.00      5.9±0.11ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.5±0.07ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_select_all_from_1000                 1.00     44.6±0.26ms        ? ?/sec    1.01     44.9±0.23ms        ? ?/sec
physical_select_one_from_700                  1.00      3.4±0.03ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec

alamb · 2024-08-05T20:36:51Z

I am sorry for the delay in reviewing this PR -- my big hesitation is that after all the work we have done to improve planning time, this makes planning time worse.

I understand that there is a tradeoff where the plan execution time should decrease for other plans. However, there are no changes to existing tests, which suggests that this isn't a widely applicable optimization.

So what I am hoping to do (or maybe someone will beat me to it) is figure out how to have my cake and eat it too (aka optimize this code so it doesn't slow down planning but still makes better plans).

peter-toth · 2024-08-06T08:40:38Z

@alamb, I just realized that CommonSubexprEliminate is an ApplyOrder::TopDown rule: https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/common_subexpr_eliminate.rs#L690-L692

I have no idea how I missed that fact. Maybe because I was working on parts of the rule where it does call self.rewrite() explicitly (i.e. organizes recursion itself).
I don't get why this rule became TopDown in #10835, and if it really is TopDown, why the current code can't deal with test_non_top_level_common_expression. Let me convert this PR to draft and look into those...

…ify behavior on plans

peter-toth · 2024-08-06T13:56:42Z

I believe I understood what happened here.
In #10835 the CommonSubexprEliminate rule was converted to an ApplyOrder::TopDown rule, but the self.rewrite() call was kept as well, which caused a weird plan traversal where some of the nodes are traversed 2 times.
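
To make the repeated traversal concrete, here is a toy sketch (not DataFusion's optimizer; `Node`, `chain`, `rule_with_own_recursion` and `drive_top_down` are invented for illustration). It combines a driver that applies a rule top-down with a rule that also recurses on its own. In this simplified model the rule always recurses, so deeper nodes are visited even more than twice; the real rule only recurses on some paths.

```rust
/// Toy plan: a simple chain of nodes.
struct Node {
    input: Option<Box<Node>>,
}

/// Builds a chain with `len` nodes (`None` for `len == 0`).
fn chain(len: usize) -> Option<Box<Node>> {
    (0..len).fold(None, |input, _| Some(Box::new(Node { input })))
}

/// A rule that visits the node and then recurses into its input itself.
fn rule_with_own_recursion(node: &Node, visits: &mut usize) {
    *visits += 1;
    if let Some(input) = &node.input {
        rule_with_own_recursion(input, visits);
    }
}

/// A toy top-down driver: apply the rule to the node, then descend.
fn drive_top_down(node: &Node, rule: fn(&Node, &mut usize), visits: &mut usize) {
    rule(node, visits);
    if let Some(input) = &node.input {
        drive_top_down(input, rule, visits);
    }
}
```

On a chain of 3 nodes, calling `rule_with_own_recursion` directly counts 3 visits, while running it through `drive_top_down` counts 3 + 2 + 1 = 6.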

The main problem with the top-down conversion is that in some cases CommonSubexprEliminate collects adjacent nodes (e.g. Window) into groups and, once it has eliminated subexpressions among those groups, transforms the result back into adjacent nodes. This kind of transformation can't be defined with a simple top-down optimizer rule. Most likely this is why the rule handled recursion itself before #10835.

Then I added the test_non_top_level_common_expression test in this PR and noticed that a non-root projection node was not CSEd. But I made a mistake: the unit tests in common_subexpr_eliminate.rs don't actually use an Optimizer (configured to use CommonSubexprEliminate) but just invoke CommonSubexprEliminate::rewrite() directly. So even though CommonSubexprEliminate was top-down, the test couldn't succeed.

So I think the right fix is to:

Let the rule handle recursion (i.e. change apply_order() to return None) to avoid double recursion.
Change assert_optimized_plan_eq to use an Optimizer to be able to test cases like test_non_top_level_common_expression.
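
The difference between invoking a rule directly and going through a driver can be sketched with a simplified model (an assumption-laden toy, not the actual `Optimizer` or `OptimizerRule` API; `Plan`, `Rule`, `NodeLocalRule`, `SelfRecursingRule` and `optimize` are hypothetical names, and the toy `ApplyOrder` has only one variant):

```rust
enum ApplyOrder {
    TopDown,
}

struct Plan {
    optimized: bool,
    input: Option<Box<Plan>>,
}

trait Rule {
    /// `None` means "the rule handles recursion itself".
    fn apply_order(&self) -> Option<ApplyOrder>;
    fn rewrite(&self, plan: &mut Plan);
}

/// Only touches the node it is given and relies on the driver to descend.
struct NodeLocalRule;

impl Rule for NodeLocalRule {
    fn apply_order(&self) -> Option<ApplyOrder> {
        Some(ApplyOrder::TopDown)
    }
    fn rewrite(&self, plan: &mut Plan) {
        plan.optimized = true;
    }
}

/// Handles the node and then recurses into the input on its own.
struct SelfRecursingRule;

impl Rule for SelfRecursingRule {
    fn apply_order(&self) -> Option<ApplyOrder> {
        None
    }
    fn rewrite(&self, plan: &mut Plan) {
        plan.optimized = true;
        if let Some(input) = plan.input.as_deref_mut() {
            self.rewrite(input);
        }
    }
}

/// Toy driver: honors the rule's `apply_order()`.
fn optimize(rule: &dyn Rule, plan: &mut Plan) {
    match rule.apply_order() {
        None => rule.rewrite(plan),
        Some(ApplyOrder::TopDown) => {
            rule.rewrite(plan);
            if let Some(input) = plan.input.as_deref_mut() {
                optimize(rule, input);
            }
        }
    }
}

fn count_optimized(plan: &Plan) -> usize {
    plan.optimized as usize + plan.input.as_deref().map_or(0, count_optimized)
}
```

In this model a test that calls `NodeLocalRule.rewrite()` directly only ever sees the root node optimized, whereas a test that goes through `optimize()` exercises whatever traversal the rule declares, which is the motivation for routing the test helper through an optimizer.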

I've rebased the previous commit on the latest main and added a 2nd commit with the above 2 points.

I ran some benchmarks and this PR is now better than main:

% critcmp main make-cse-top-down-like
group                                         main                                    make-cse-top-down-like
-----                                         ----                                    ----------------------
logical_aggregate_with_join                   1.01   565.4±14.92µs        ? ?/sec     1.00    561.0±9.20µs        ? ?/sec
logical_plan_tpcds_all                        1.03     77.5±2.96ms        ? ?/sec     1.00     75.2±1.10ms        ? ?/sec
logical_plan_tpch_all                         1.00      7.7±0.09ms        ? ?/sec     1.00      7.7±0.08ms        ? ?/sec
logical_select_all_from_1000                  1.00     15.0±0.26ms        ? ?/sec     1.01     15.2±0.18ms        ? ?/sec
logical_select_one_from_700                   1.00   405.6±10.82µs        ? ?/sec     1.00   404.3±10.31µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.01    402.7±5.89µs        ? ?/sec     1.00    399.7±6.28µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.01    383.7±7.82µs        ? ?/sec     1.00    378.9±5.03µs        ? ?/sec
physical_plan_tpcds_all                       1.09  590.3±107.45ms        ? ?/sec     1.00    543.2±3.93ms        ? ?/sec
physical_plan_tpch_all                        1.01     34.1±0.47ms        ? ?/sec     1.00     33.9±0.58ms        ? ?/sec
physical_plan_tpch_q1                         1.00  1105.2±12.43µs        ? ?/sec     1.00  1110.3±23.70µs        ? ?/sec
physical_plan_tpch_q10                        1.01  1550.2±38.57µs        ? ?/sec     1.00  1537.9±19.03µs        ? ?/sec
physical_plan_tpch_q11                        1.01  1328.8±16.32µs        ? ?/sec     1.00  1313.9±17.56µs        ? ?/sec
physical_plan_tpch_q12                        1.01  1108.5±18.29µs        ? ?/sec     1.00  1096.9±16.36µs        ? ?/sec
physical_plan_tpch_q13                        1.00   755.8±10.98µs        ? ?/sec     1.00   752.3±10.62µs        ? ?/sec
physical_plan_tpch_q14                        1.04   942.4±73.52µs        ? ?/sec     1.00   908.6±11.28µs        ? ?/sec
physical_plan_tpch_q16                        1.01  1345.2±26.82µs        ? ?/sec     1.00  1334.1±34.09µs        ? ?/sec
physical_plan_tpch_q17                        1.01  1200.3±50.67µs        ? ?/sec     1.00  1188.0±22.75µs        ? ?/sec
physical_plan_tpch_q18                        1.02  1433.4±102.83µs        ? ?/sec    1.00  1402.0±15.42µs        ? ?/sec
physical_plan_tpch_q19                        1.02      2.6±0.20ms        ? ?/sec     1.00      2.6±0.17ms        ? ?/sec
physical_plan_tpch_q2                         1.00      2.9±0.04ms        ? ?/sec     1.00      2.9±0.04ms        ? ?/sec
physical_plan_tpch_q20                        1.04  1673.2±79.81µs        ? ?/sec     1.00  1616.4±20.52µs        ? ?/sec
physical_plan_tpch_q21                        1.04      2.4±0.10ms        ? ?/sec     1.00      2.3±0.03ms        ? ?/sec
physical_plan_tpch_q22                        1.02  1165.8±27.17µs        ? ?/sec     1.00  1147.1±35.94µs        ? ?/sec
physical_plan_tpch_q3                         1.00  1102.3±16.77µs        ? ?/sec     1.00  1099.9±12.63µs        ? ?/sec
physical_plan_tpch_q4                         1.00    846.3±8.77µs        ? ?/sec     1.00    844.0±9.90µs        ? ?/sec
physical_plan_tpch_q5                         1.01  1661.4±23.93µs        ? ?/sec     1.00  1641.8±19.28µs        ? ?/sec
physical_plan_tpch_q6                         1.00    584.7±9.64µs        ? ?/sec     1.00   583.2±20.63µs        ? ?/sec
physical_plan_tpch_q7                         1.00      2.1±0.03ms        ? ?/sec     1.00      2.1±0.21ms        ? ?/sec
physical_plan_tpch_q8                         1.01      2.7±0.04ms        ? ?/sec     1.00      2.6±0.04ms        ? ?/sec
physical_plan_tpch_q9                         1.01  1932.5±23.29µs        ? ?/sec     1.00  1906.6±21.08µs        ? ?/sec
physical_select_all_from_1000                 1.00     35.2±1.20ms        ? ?/sec     1.01     35.5±0.48ms        ? ?/sec
physical_select_one_from_700                  1.00  1781.3±59.97µs        ? ?/sec     1.01  1805.4±43.25µs        ? ?/sec

peter-toth · 2024-08-06T14:38:52Z

I've updated the PR description and the PR is ready for review again.

alamb · 2024-08-06T20:02:55Z

Thanks @peter-toth

I also reran the benchmarks and no longer see any regression. I will get this PR reviewed carefully shortly (hopefully later today or tomorrow)

Details

++ critcmp main make-cse-top-down-like
group                                         main                                   make-cse-top-down-like
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1356.5±10.54µs        ? ?/sec    1.01  1366.6±61.02µs        ? ?/sec
logical_plan_tpcds_all                        1.00    186.6±1.00ms        ? ?/sec    1.00    187.3±1.09ms        ? ?/sec
logical_plan_tpch_all                         1.00     22.9±0.17ms        ? ?/sec    1.00     22.8±0.18ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.0±0.11ms        ? ?/sec    1.00     18.0±0.11ms        ? ?/sec
logical_select_one_from_700                   1.00  1096.5±13.51µs        ? ?/sec    1.00  1093.5±12.52µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.02  1045.5±15.11µs        ? ?/sec    1.00  1029.4±14.83µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.01  1028.2±16.71µs        ? ?/sec    1.00  1018.4±12.42µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1110.8±3.48ms        ? ?/sec    1.00   1110.1±3.55ms        ? ?/sec
physical_plan_tpch_all                        1.00     76.9±0.42ms        ? ?/sec    1.00     76.6±0.40ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.8±0.02ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.3±0.02ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.3±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.1±0.02ms        ? ?/sec    1.01      5.1±0.03ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.4±0.03ms        ? ?/sec    1.00      6.4±0.03ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.9±0.02ms        ? ?/sec    1.01      3.9±0.02ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.2±0.03ms        ? ?/sec    1.00      5.2±0.02ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.0±0.02ms        ? ?/sec    1.01      3.0±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.8±0.02ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.01      2.2±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.9±0.02ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1577.6±12.49µs        ? ?/sec    1.01  1588.0±15.26µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.9±0.03ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.9±0.02ms        ? ?/sec    1.00      5.9±0.02ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.6±0.03ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_select_all_from_1000                 1.00     43.6±0.11ms        ? ?/sec    1.00     43.5±0.19ms        ? ?/sec
physical_select_one_from_700                  1.00      3.5±0.02ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec

alamb · 2024-08-06T20:03:33Z

CI failure seemed infra related https://github.com/apache/datafusion/actions/runs/10267531872/job/28408513630?pr=11683 so I restarted it on this PR

alamb · 2024-08-07T21:59:24Z

I am sorry -- something came up at work today and I am super behind on reviews. This is very high on my list for tomorrow.

peter-toth · 2024-08-08T07:34:04Z

I am sorry -- something came up at work today and I am super behind on reviews. This is very high on my list for tomorrow

No worries @alamb, this PR can wait.

alamb

Thank you @peter-toth

I reviewed this PR quite carefully and I think it is very nice 👌

I had a suggestion on readability (use a named struct) in peter-toth#4 but we could also merge that as a follow on PR (or never)

alamb · 2024-08-08T14:30:47Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

@@ -688,7 +702,10 @@ impl OptimizerRule for CommonSubexprEliminate {
    }

    fn apply_order(&self) -> Option<ApplyOrder> {
-        Some(ApplyOrder::TopDown)
+        // This rule handles recursion itself in a `ApplyOrder::TopDown` like manner.
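
A rough sketch of what "handles recursion itself in a top-down like manner" can look like, using toy types rather than DataFusion's own. `Plan`, `ApplyOrder`, `SelfRecursingRule` and the `log` parameter are hypothetical stand-ins invented for this illustration, and the sketch leaves expressions untouched instead of eliminating anything. It only shows the traversal shape: visit a node, group adjacent `Window` nodes, recurse below the group, then turn the group back into adjacent nodes.

```rust
// Toy stand-ins for illustration only; these are NOT DataFusion's
// `LogicalPlan`, `ApplyOrder`, or `OptimizerRule`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Projection { exprs: Vec<String>, input: Box<Plan> },
    Window { exprs: Vec<String>, input: Box<Plan> },
}

#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum ApplyOrder {
    TopDown,
    BottomUp,
}

struct SelfRecursingRule;

impl SelfRecursingRule {
    /// Returning `None` tells the (hypothetical) driver not to traverse the
    /// plan on the rule's behalf.
    fn apply_order(&self) -> Option<ApplyOrder> {
        None
    }

    /// Visits the current node first and only then recurses into its input,
    /// appending one entry to `log` per visit.
    fn rewrite(&self, plan: Plan, log: &mut Vec<String>) -> Plan {
        match plan {
            Plan::Scan(name) => {
                log.push(format!("Scan({name})"));
                Plan::Scan(name)
            }
            Plan::Projection { exprs, input } => {
                log.push("Projection".to_string());
                let input = self.rewrite(*input, log);
                Plan::Projection { exprs, input: Box::new(input) }
            }
            Plan::Window { exprs, input } => {
                // Collect the whole chain of adjacent Window nodes into one group.
                let mut group = vec![exprs];
                let mut current = *input;
                while let Plan::Window { exprs, input } = current {
                    group.push(exprs);
                    current = *input;
                }
                log.push(format!("Window group of {}", group.len()));
                // A real rule would eliminate common subexpressions across
                // `group` here; this sketch leaves the expressions untouched.
                let mut rebuilt = self.rewrite(current, log);
                // Turn the group back into adjacent nodes, innermost first.
                for exprs in group.into_iter().rev() {
                    rebuilt = Plan::Window { exprs, input: Box::new(rebuilt) };
                }
                rebuilt
            }
        }
    }
}
```

Because the group spans several nodes, a driver that hands the rule one node at a time cannot express this step, which is the motivation given in the PR description for not returning `ApplyOrder::TopDown`.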


alamb · 2024-08-08T14:32:32Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

-    fn rewrite_expr(
+    /// 2. An optional tuple that contains the extracted common sub-expressions and the
+    ///    original `exprs_list`.
+    fn find_common_exprs(


This is a very nice refactor / rename -- calling it find_common_exprs is 💯 and I think makes it much easier to follow

alamb · 2024-08-08T15:03:22Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

-    /// Rewrites the expression in `exprs_list` with common sub-expressions
-    /// replaced with a new column and adds a ProjectionExec on top of `input`
-    /// which computes any replaced common sub-expressions.
+    /// Extracts common sub-expressions and rewrites `exprs_list`.
    ///
    /// Returns a tuple of:
    /// 1. The rewritten expressions


Suggested change

-    /// 1. The rewritten expressions
+    /// 1. The (potentially) rewritten expressions

alamb · 2024-08-08T15:08:45Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

-                            Expr::Column(Column::from_name(expr_alias)).alias(out_name),
-                        );
+        let input = unwrap_arc(input);
+        // Extract common sub-expressions from the aggregate and grouping expressions.


this new structure is very nice

alamb · 2024-08-08T15:22:08Z

datafusion/optimizer/src/common_subexpr_eliminate.rs

-        )?;
-        transformed |= rewrite_exprs.transformed;
+        expr_mask: ExprMask,
+    ) -> Result<Transformed<(Vec<Vec<Expr>>, FindCommonExprResult)>> {


I found reasoning about this return type (Vec<Vec<Expr>>, FindCommonExprResult) hard (I couldn't keep track in my head of the three potential fields that were returned)

I made a PR here with a proposal to move it into a named enum: peter-toth#4

(I can also make that a PR to main if we would prefer to merge this PR as is)
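
To illustrate the readability point, here is a minimal sketch of replacing a tuple return value with a named type. Everything in it is hypothetical: `Expr` is a plain `String` alias, `FoundCommon` and `find_common` are invented names, and the "common expression" detection only compares whole strings, so it is not what the rule in this PR does and may not match the shape that was merged.

```rust
use std::collections::HashMap;

// Stand-in for a real expression type; hypothetical, for illustration only.
type Expr = String;

/// Named result type: each field says what it holds, unlike the positions
/// of a tuple such as `(Vec<Vec<Expr>>, Option<(Vec<Expr>, Vec<Vec<Expr>>)>)`.
#[derive(Debug, PartialEq)]
enum FoundCommon {
    /// Nothing was shared; the caller gets its input back unchanged.
    No { original_exprs_list: Vec<Vec<Expr>> },
    /// Shared expressions were found and replaced by placeholder names.
    Yes {
        common_exprs: Vec<Expr>,
        new_exprs_list: Vec<Vec<Expr>>,
        original_exprs_list: Vec<Vec<Expr>>,
    },
}

/// Toy detection: an expression is "common" if the same string occurs more
/// than once across all lists.
fn find_common(exprs_list: Vec<Vec<Expr>>) -> FoundCommon {
    let mut counts: HashMap<&Expr, usize> = HashMap::new();
    for expr in exprs_list.iter().flatten() {
        *counts.entry(expr).or_insert(0) += 1;
    }
    let mut common_exprs: Vec<Expr> = counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(expr, _)| expr.clone())
        .collect();
    // Sort so that placeholder numbering is deterministic.
    common_exprs.sort();

    if common_exprs.is_empty() {
        return FoundCommon::No { original_exprs_list: exprs_list };
    }

    // Replace every occurrence of a common expression with a placeholder.
    let new_exprs_list: Vec<Vec<Expr>> = exprs_list
        .iter()
        .map(|exprs| {
            exprs
                .iter()
                .map(|expr| match common_exprs.iter().position(|c| c == expr) {
                    Some(i) => format!("__common_{i}"),
                    None => expr.clone(),
                })
                .collect::<Vec<Expr>>()
        })
        .collect();

    FoundCommon::Yes { common_exprs, new_exprs_list, original_exprs_list: exprs_list }
}
```

A caller then matches on `FoundCommon::Yes { common_exprs, new_exprs_list, .. }` and does not need to remember which tuple position holds which list.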

Thanks for the PR, I've just merged it.

alamb · 2024-08-08T16:48:45Z

Thanks @peter-toth -- I plan to merge this PR tomorrow (after the #11476 RC is cut)

alamb · 2024-08-09T13:43:17Z

Thanks again @peter-toth -- the code continues to look nicer and nicer

github-actions bot added the optimizer Optimizer rules label Jul 27, 2024

This was referenced Aug 1, 2024

DataFusion weekly project plan (Andrew Lamb) - July 29, 2024 #11710

Closed

DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 #11826

Closed

peter-toth marked this pull request as draft August 6, 2024 08:40

peter-toth added 2 commits August 6, 2024 14:14

Make CommonSubexprEliminate top-down like

7ee369f

fix top-down recursion, fix unit tests to use real a Optimizer to verify behavior on plans

6a62811

peter-toth force-pushed the make-cse-top-down-like branch from 1b67311 to 6a62811 Compare August 6, 2024 13:30

peter-toth marked this pull request as ready for review August 6, 2024 14:37

alamb mentioned this pull request Aug 8, 2024

Extract result of find_common_exprs into a struct peter-toth/datafusion#4

Merged

alamb approved these changes Aug 8, 2024

Extract result of find_common_exprs into a struct (#4)

5fa5457

* Extract the result of find_common_exprs into a struct * Make naming consistent

Merge branch 'main' into make-cse-top-down-like

0d3c81b

# Conflicts: # datafusion/optimizer/src/common_subexpr_eliminate.rs

alamb merged commit b5d7931 into apache:main Aug 9, 2024
24 checks passed

alamb mentioned this pull request Aug 14, 2024

DataFusion weekly project plan (Andrew Lamb) - Aug 12, 2024 #11986

Closed
