Make CommonSubexprEliminate top-down like #11683

Merged 4 commits on Aug 9, 2024


peter-toth
Contributor

@peter-toth peter-toth commented Jul 27, 2024

Which issue does this PR close?

Part of #11194.

Rationale for this change

This PR contains 2 ideas:

  1. In #10835 (Stop copying LogicalPlan and Exprs in CommonSubexprEliminate (2-3% planning speed improvement)) the CommonSubexprEliminate rule was converted to an ApplyOrder::TopDown rule, but the self.rewrite() call was also kept.

    The main problem with the top-down conversion is that in some cases CommonSubexprEliminate collects adjacent nodes (e.g. Window) into groups, and once we have eliminated subexpressions among those groups we transform the result back into adjacent nodes. This kind of transformation can't be defined with a simple top-down optimizer rule. (Most likely this is why the rule handled recursion itself before #10835.)

    This means that CommonSubexprEliminate should not be an ApplyOrder::TopDown rule; it should handle recursion itself. But if we reverse the top-down conversion, there is another issue:
    #10473 (Improve CommonSubexprEliminate identifier management (10% faster planning)) changed to_arrays() to return a boolean flag that tells whether it makes sense to execute the 2nd rewriting traversal, which does the actual common expression extraction. E.g. if found_common is false, rewrite_expr() is not executed:

    let mut expr_stats = ExprStats::new();
    let (found_common, id_arrays) =
        self.to_arrays(&expr, &mut expr_stats, ExprMask::Normal)?;
    if found_common {
        let rewritten = self.rewrite_expr(
            // Must clone as Identifiers use references to original expressions so we
            // have to keep the original expressions intact.
            vec![expr.clone()],
            vec![id_arrays],
            input,
            &expr_stats,
            config,
        )?;

    The problem is that calling self.rewrite() is currently in rewrite_expr():

    let new_input = self.rewrite(input, config)?;
    transformed |= new_input.transformed;
    let mut new_input = new_input.data;
    if !common_exprs.is_empty() {
        assert!(transformed);
        new_input = build_common_expr_project_plan(new_input, common_exprs)?;
    }

    (I.e. #10473 (Improve CommonSubexprEliminate identifier management (10% faster planning)) broke the rule's own recursion handling.)

    So what we need to do is:

    • Convert the rule back to handle recursion itself.
    • Move self.rewrite() call out of the found_common check.
  2. The current rule is not optimal, as extracted common expressions are not themselves sub-expression eliminated in the same rule execution. This is because the rule first recurses into the child plan nodes with self.rewrite() and only then adds the new projection built from the extracted common expressions (see the rewrite_expr() snippet above).

    The issue with this approach is that even the new projection can contain sub-expressions to eliminate.
    E.g. a plan like

    Projection: (test.a + test.b) * (test.a + test.b) AS c1, (test.a + test.b) * (test.a + test.b) AS c2
      ...
    

    can be rewritten to:

    Projection: __common_expr_1 AS c1, __common_expr_1 AS c2
      Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c
        Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c
          ...
    

    but the current rule requires 2 rule executions (optimizer cycles) to reach the final plan.
    This can be improved by swapping the order of adding the new projection and calling self.rewrite().
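
A minimal sketch of idea 2, using toy stand-ins rather than DataFusion's real types: `Expr`, `Plan`, `count`, `replace` and `optimize` below are hypothetical names, and pass-through columns such as test.a are left out. The sketch extracts the top-most repeated sub-expressions of a projection into a new projection underneath and only then recurses, so the new projection is itself examined in the same pass.

```rust
use std::collections::HashMap;

// Toy stand-ins for DataFusion's `Expr` and `LogicalPlan` (assumption: these
// are NOT the real types, only an illustration of the traversal order).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Col(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum Plan {
    Scan,
    // Each projected expression carries its output name.
    Projection(Vec<(Expr, String)>, Box<Plan>),
}

// First traversal: count every non-leaf sub-expression.
fn count(e: &Expr, counts: &mut HashMap<Expr, usize>) {
    if let Expr::Add(l, r) | Expr::Mul(l, r) = e {
        *counts.entry(e.clone()).or_insert(0) += 1;
        count(l, counts);
        count(r, counts);
    }
}

// Second traversal: replace the top-most repeated sub-expressions with column
// references and record what each new column stands for.
fn replace(
    e: &Expr,
    counts: &HashMap<Expr, usize>,
    common: &mut Vec<(Expr, String)>,
    next_id: &mut usize,
) -> Expr {
    match e {
        Expr::Col(_) => e.clone(),
        _ if counts[e] > 1 => {
            let existing = common.iter().find(|(c, _)| c == e).map(|(_, n)| n.clone());
            let name = existing.unwrap_or_else(|| {
                *next_id += 1;
                let n = format!("__common_expr_{}", *next_id);
                common.push((e.clone(), n.clone()));
                n
            });
            Expr::Col(name)
        }
        Expr::Add(l, r) => Expr::Add(
            Box::new(replace(l, counts, common, next_id)),
            Box::new(replace(r, counts, common, next_id)),
        ),
        Expr::Mul(l, r) => Expr::Mul(
            Box::new(replace(l, counts, common, next_id)),
            Box::new(replace(r, counts, common, next_id)),
        ),
    }
}

// The rule handles recursion itself: first add the new projection below the
// current node, then recurse into it (the swapped order from idea 2).
fn optimize(plan: Plan, next_id: &mut usize) -> Plan {
    match plan {
        Plan::Scan => Plan::Scan,
        Plan::Projection(exprs, input) => {
            let mut counts = HashMap::new();
            exprs.iter().for_each(|(e, _)| count(e, &mut counts));
            let mut common = Vec::new();
            let rewritten: Vec<(Expr, String)> = exprs
                .iter()
                .map(|(e, n)| (replace(e, &counts, &mut common, next_id), n.clone()))
                .collect();
            let child = if common.is_empty() {
                *input
            } else {
                Plan::Projection(common, input)
            };
            Plan::Projection(rewritten, Box::new(optimize(child, next_id)))
        }
    }
}
```

For the example plan above (two copies of (a + b) * (a + b)), a single call to `optimize` produces three stacked projections, with `__common_expr_2` standing for `a + b`. Recursing before adding the projection would stop at two.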

What changes are included in this PR?

This PR:

  • Reverts apply_order() to return None.
  • Changes rewrite_expr() into find_common_exprs() to extract common sub-expressions and rewrite an expression list. The step of recursing into child plan nodes is moved out from this method. This way find_common_exprs() can safely leverage the boolean of to_arrays() to skip the 2nd traversal.
  • Refactors try_unary_plan(), try_optimize_aggregate() and try_optimize_window().
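
The second bullet separates two concerns that were previously entangled: deciding whether a second traversal is worthwhile, and recursing into child nodes. A toy sketch of why that matters follows. All names in it (`Node`, `found_common`, `dedup`, `rewrite_gated`, `rewrite_always`) are hypothetical, and de-duplicating a list of strings merely stands in for real common sub-expression extraction.

```rust
use std::collections::HashSet;

// Hypothetical toy plan: each node lists the sub-expressions it evaluates.
struct Node {
    exprs: Vec<&'static str>,
    input: Option<Box<Node>>,
}

// Stand-in for the first traversal (`to_arrays()` in the PR): reports whether
// any sub-expression occurs more than once.
fn found_common(exprs: &[&'static str]) -> bool {
    let mut seen = HashSet::new();
    exprs.iter().any(|e| !seen.insert(*e))
}

// Stand-in for the second traversal: keep only the first occurrence.
fn dedup(exprs: &mut Vec<&'static str>) {
    let mut seen = HashSet::new();
    exprs.retain(|e| seen.insert(*e));
}

// Shape of the regression: recursion into the child is nested under the flag,
// so a node without common expressions hides everything below it.
fn rewrite_gated(node: &mut Node) {
    if found_common(&node.exprs) {
        dedup(&mut node.exprs);
        if let Some(child) = node.input.as_mut() {
            rewrite_gated(child);
        }
    }
}

// Shape of the fix: the flag only guards the second traversal; the recursion
// into the child always happens.
fn rewrite_always(node: &mut Node) {
    if found_common(&node.exprs) {
        dedup(&mut node.exprs);
    }
    if let Some(child) = node.input.as_mut() {
        rewrite_always(child);
    }
}
```

With a parent that has no repeated expressions and a child that does, `rewrite_gated` leaves the child untouched while `rewrite_always` rewrites it.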

Are these changes tested?

Yes, new unit tests were added.

Are there any user-facing changes?

Yes, it fixes a possible performance regression.

@github-actions github-actions bot added the optimizer Optimizer rules label Jul 27, 2024
Some((common_aggr_exprs, mut aggr_list)) => {
let new_aggr_expr = aggr_list.pop().unwrap();

let mut agg_exprs = common_aggr_exprs
Contributor Author

.map_data(|(new_window_expr_list, new_input, window_expr_list)| {
// If there were common expressions extracted, then we need to make sure
// we restore the original column names.
// TODO: Although `find_common_exprs()` inserts aliases around extracted
Contributor Author

I wanted to fix the previous TODO (https://github.com/apache/datafusion/pull/11683/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L563) but realized that preserving names is still required, so I added 2 TODOs where we have that logic.
I will try to get rid of them in a follow-up PR.

@peter-toth
Contributor Author

cc @alamb

@@ -1963,6 +1944,52 @@ mod test {
Ok(())
}

#[test]
fn test_non_top_level_common_expression() -> Result<()> {
Contributor Author

New test for the possible perf regression (1.).

Contributor

I verified that without the code changes in this PR this test fails like this:


failed to optimize plan
thread 'common_subexpr_eliminate::test::test_non_top_level_common_expression' panicked at datafusion/optimizer/src/common_subexpr_eliminate.rs:1215:9:
failed to optimize plan
stack backtrace:
   0: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   2: datafusion_optimizer::common_subexpr_eliminate::test::assert_optimized_plan_eq
             at ./src/common_subexpr_eliminate.rs:1215:9
   3: datafusion_optimizer::common_subexpr_eliminate::test::test_non_top_level_common_expression
             at ./src/common_subexpr_eliminate.rs:1984:9
   4: datafusion_optimizer::common_subexpr_eliminate::test::test_non_top_level_common_expression::{{closure}}
             at ./src/common_subexpr_eliminate.rs:1967:50
   5: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

}

#[test]
fn test_nested_common_expression() -> Result<()> {
Contributor Author

New test for the top-down improvement (2.).

Contributor

I verified this test fails without the changes in this PR:


assertion `left == right` failed
  left: "Projection: __common_expr_1 AS c1, __common_expr_1 AS c2\n  Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c\n    Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c\n      TableScan: test"
 right: "Projection: __common_expr_1 AS c1, __common_expr_1 AS c2\n  Projection: (test.a + test.b) * (test.a + test.b) AS __common_expr_1, test.a, test.b, test.c\n    TableScan: test"


thread 'common_subexpr_eliminate::test::test_nested_common_expression' panicked at datafusion/optimizer/src/common_subexpr_eliminate.rs:1218:9:
assertion `left == right` failed
  left: "Projection: __common_expr_1 AS c1, __common_expr_1 AS c2\n  Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c\n    Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c\n      TableScan: test"
 right: "Projection: __common_expr_1 AS c1, __common_expr_1 AS c2\n  Projection: (test.a + test.b) * (test.a + test.b) AS __common_expr_1, test.a, test.b, test.c\n    TableScan: test"
stack backtrace:
   0: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:363:5
   4: datafusion_optimizer::common_subexpr_eliminate::test::assert_optimized_plan_eq
             at ./src/common_subexpr_eliminate.rs:1218:9
   5: datafusion_optimizer::common_subexpr_eliminate::test::test_nested_common_expression
             at ./src/common_subexpr_eliminate.rs:2007:9
   6: datafusion_optimizer::common_subexpr_eliminate::test::test_nested_common_expression::{{closure}}
             at ./src/common_subexpr_eliminate.rs:1990:43
   7: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
   8: core::ops::function::FnOnce::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Namely the output projection looks like

Projection: __common_expr_1 AS c1, __common_expr_1 AS c2
  Projection: (test.a + test.b) * (test.a + test.b) AS __common_expr_1, test.a, test.b, test.c
    TableScan: test

Rather than

Projection: __common_expr_1 AS c1, __common_expr_1 AS c2
  Projection: __common_expr_2 * __common_expr_2 AS __common_expr_1, test.a, test.b, test.c
    Projection: test.a + test.b AS __common_expr_2, test.a, test.b, test.c
      TableScan: test

Contributor Author

Yes, the 2nd is the better plan, and after this PR we reach it by executing the rule only once. (Before this PR the optimizer had to execute the rule one more time to reach the 2nd plan from the 1st.)

@alamb
Contributor

alamb commented Jul 29, 2024

Thank you @peter-toth -- this is on my review list

@alamb
Contributor

alamb commented Jul 30, 2024

Interestingly, I ran the planning benchmarks and this branch actually seems to be ever so slightly slower. I'll rerun to see if I can reproduce the same results.

cargo bench --bench sql_planner
++ critcmp main make-cse-top-down-like
group                                         main                                   make-cse-top-down-like
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1135.6±20.84µs        ? ?/sec    1.01  1149.4±92.43µs        ? ?/sec
logical_plan_tpcds_all                        1.00    163.4±1.30ms        ? ?/sec    1.01    164.9±1.10ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.7±0.18ms        ? ?/sec    1.01     17.9±0.21ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.0±0.13ms        ? ?/sec    1.00     17.9±0.12ms        ? ?/sec
logical_select_one_from_700                   1.00   835.8±30.14µs        ? ?/sec    1.00    837.2±9.13µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   786.5±10.01µs        ? ?/sec    1.00   785.2±21.12µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    767.4±8.46µs        ? ?/sec    1.01   772.1±18.00µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1124.5±5.74ms        ? ?/sec    1.04   1168.5±4.72ms        ? ?/sec
physical_plan_tpch_all                        1.00     72.9±0.62ms        ? ?/sec    1.03     75.2±1.28ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.02ms        ? ?/sec    1.05      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.6±0.03ms        ? ?/sec    1.04      3.7±0.03ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.02ms        ? ?/sec    1.02      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.5±0.02ms        ? ?/sec    1.04      2.6±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1846.8±16.66µs        ? ?/sec    1.04  1912.5±60.02µs        ? ?/sec
physical_plan_tpch_q14                        1.00      2.2±0.02ms        ? ?/sec    1.03      2.2±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.1±0.06ms        ? ?/sec    1.04      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      2.8±0.02ms        ? ?/sec    1.02      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.3±0.02ms        ? ?/sec    1.03      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q19                        1.00      4.9±0.04ms        ? ?/sec    1.03      5.1±0.06ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.3±0.06ms        ? ?/sec    1.02      6.4±0.23ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.7±0.03ms        ? ?/sec    1.03      3.8±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.0±0.05ms        ? ?/sec    1.02      5.1±0.05ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.8±0.02ms        ? ?/sec    1.03      2.9±0.04ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.6±0.02ms        ? ?/sec    1.03      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.0±0.02ms        ? ?/sec    1.02      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.8±0.04ms        ? ?/sec    1.03      3.9±0.05ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1360.3±23.73µs        ? ?/sec    1.05  1430.4±38.00µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.7±0.06ms        ? ?/sec    1.01      4.8±0.05ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.8±0.06ms        ? ?/sec    1.02      5.9±0.07ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.4±0.03ms        ? ?/sec    1.02      4.5±0.04ms        ? ?/sec
physical_select_all_from_1000                 1.00     44.7±0.20ms        ? ?/sec    1.00     44.6±0.24ms        ? ?/sec
physical_select_one_from_700                  1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
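
A note on reading these tables, assuming critcmp's usual convention: the number in front of each timing is that run's time relative to the fastest run in the same row, so 1.00 marks the faster branch. A small sketch that reproduces the ratios from the mean timings (the helper name `relative` is made up for illustration):

```rust
// Ratio of each timing to the fastest timing in the row (hypothetical helper,
// not part of critcmp).
fn relative(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}
```

For the physical_plan_tpcds_all row, `relative(&[1124.5, 1168.5])` gives 1.0 and roughly 1.039, which rounds to the 1.00 and 1.04 shown above.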

@peter-toth
Contributor Author

peter-toth commented Jul 30, 2024

> Interestingly, I ran the planning benchmarks and this branch actually seems to be ever so slightly slower. I'll rerun to see if I can reproduce the same results.

Yeah, this is basically the trade-off between the time we spend in planning and execution. These benchmarks only measure planning time, and that might be a bit higher with this PR. #10473 could cause a regression because it no longer extracts common expressions from non-top-level nodes, which leads to longer execution times in some cases. (This side effect was not intentional; I just didn't notice it with the tests available at that time.) This PR restores common expression elimination for non-top-level nodes and so fixes the possible execution time regression, but it comes with a small cost in planning time.

BTW, the TPCH queries are not affected by the regression (there are no non-top-level nodes with common expressions in them), so this PR only makes their planning slower but doesn't improve their execution times. But I still think that the possible execution time regression that #10473 can cause needs to be fixed.

Update: I've elaborated on 1. in the PR description.

@alamb
Contributor

alamb commented Jul 30, 2024

I reran the planning benchmarks and I see the slowdown again


group                                         main                                   make-cse-top-down-like
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1135.2±13.55µs        ? ?/sec    1.00  1134.3±12.87µs        ? ?/sec
logical_plan_tpcds_all                        1.00    164.2±1.33ms        ? ?/sec    1.00    163.8±1.33ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.8±0.15ms        ? ?/sec    1.00     17.9±0.21ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.0±0.16ms        ? ?/sec    1.01     18.2±0.89ms        ? ?/sec
logical_select_one_from_700                   1.00   840.5±20.94µs        ? ?/sec    1.00    837.6±9.76µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   785.9±17.46µs        ? ?/sec    1.01   793.5±38.70µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    774.2±8.94µs        ? ?/sec    1.00   775.4±13.41µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1130.6±4.40ms        ? ?/sec    1.03   1166.8±6.32ms        ? ?/sec
physical_plan_tpch_all                        1.00     73.4±0.77ms        ? ?/sec    1.03     75.7±1.00ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.6±0.03ms        ? ?/sec    1.05      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.7±0.04ms        ? ?/sec    1.04      3.8±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.1±0.03ms        ? ?/sec    1.01      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.5±0.03ms        ? ?/sec    1.04      2.6±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.00  1865.1±22.15µs        ? ?/sec    1.03  1925.7±21.68µs        ? ?/sec
physical_plan_tpch_q14                        1.00      2.2±0.03ms        ? ?/sec    1.03      2.3±0.03ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.1±0.03ms        ? ?/sec    1.04      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q17                        1.01      2.9±0.19ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.3±0.04ms        ? ?/sec    1.04      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.0±0.07ms        ? ?/sec    1.03      5.2±0.09ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.3±0.05ms        ? ?/sec    1.03      6.5±0.08ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.7±0.04ms        ? ?/sec    1.05      3.8±0.05ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.1±0.06ms        ? ?/sec    1.04      5.3±0.09ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.8±0.02ms        ? ?/sec    1.04      2.9±0.05ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.6±0.03ms        ? ?/sec    1.03      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.0±0.02ms        ? ?/sec    1.02      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.8±0.04ms        ? ?/sec    1.03      3.9±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1373.4±33.44µs        ? ?/sec    1.04  1429.7±13.05µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.8±0.08ms        ? ?/sec    1.00      4.8±0.07ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.9±0.08ms        ? ?/sec    1.00      5.9±0.11ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.5±0.07ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_select_all_from_1000                 1.00     44.6±0.26ms        ? ?/sec    1.01     44.9±0.23ms        ? ?/sec
physical_select_one_from_700                  1.00      3.4±0.03ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec

@alamb
Contributor

alamb commented Aug 5, 2024

I am sorry for the delay in reviewing this PR -- my big hesitation is that after all the work we have done to improve planning time, this makes planning time worse.

I understand that there is a tradeoff where the plan execution time should decrease for other plans. However, there are no changes to existing tests, which suggests that this isn't a widely applicable optimization.

So what I am hoping to do (or maybe someone will beat me to it) is figure out how to have my cake and eat it too (aka optimize this code so it doesn't slow down planning but still makes better plans)

@peter-toth
Contributor Author

peter-toth commented Aug 6, 2024

@alamb, I just realized that CommonSubexprEliminate is an ApplyOrder::TopDown rule: https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/common_subexpr_eliminate.rs#L690-L692

I have no idea how I missed that fact. Maybe because I was working on parts of the rule where it calls self.rewrite() explicitly (i.e. organizes recursion itself).
I don't get why this rule became TopDown in #10835 and, if it really is TopDown, why the current code can't deal with test_non_top_level_common_expression. Let me convert this PR to draft and look into those...

@peter-toth peter-toth marked this pull request as draft August 6, 2024 08:40
@peter-toth
Contributor Author

peter-toth commented Aug 6, 2024

I believe I understood what happened here.
In #10835 the CommonSubexprEliminate rule was converted to an ApplyOrder::TopDown rule, but the self.rewrite() call was also kept, which caused a weird plan traversal where some of the nodes are traversed 2 times.

The main problem with the top-down conversion is that in some cases CommonSubexprEliminate collects adjacent nodes (e.g. Window) into groups, and once we have eliminated subexpressions among those groups we transform the result back into adjacent nodes. This kind of transformation can't be defined with a simple top-down optimizer rule. Most likely this is why the rule handled recursion itself before #10835.

Then I added the test_non_top_level_common_expression test in this PR and noticed that a non-root projection node was not CSEd. But I made a mistake: the unit tests in common_subexpr_eliminate.rs don't actually use an Optimizer (configured to use CommonSubexprEliminate) but just invoke CommonSubexprEliminate::rewrite() directly. So even though CommonSubexprEliminate was top-down, the test couldn't succeed.

So I think the right fix is to:

  • Let the rule handle recursion (i.e. change apply_order() to return None) to avoid double recursion.
  • Change assert_optimized_plan_eq to use an Optimizer to be able to test cases like test_non_top_level_common_expression.

I've rebased the previous commit on the latest main and added a 2nd commit with the above 2 points.
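
The double recursion described above can be illustrated with a toy driver. This is not DataFusion's `Optimizer`; `Node`, `Order`, `rule` and `drive` are hypothetical, and the sketch exaggerates the effect because here the rule always recurses, whereas the inner self.rewrite() call in the real rule only ran in some cases.

```rust
// Hypothetical miniature plan node that records how often the rule saw it.
struct Node {
    visits: u32,
    input: Option<Box<Node>>,
}

// Models `apply_order()` returning `Some(ApplyOrder::TopDown)` versus `None`.
#[derive(Clone, Copy)]
enum Order {
    TopDown,
    RuleHandlesRecursion,
}

// A rule that recurses into its input by itself, like `self.rewrite()`.
fn rule(node: &mut Node) {
    node.visits += 1;
    if let Some(child) = node.input.as_mut() {
        rule(child);
    }
}

// The driver applies the rule to the current node; in top-down mode it then
// also descends and applies the rule to every child again.
fn drive(node: &mut Node, order: Order) {
    rule(node);
    if let Order::TopDown = order {
        if let Some(child) = node.input.as_mut() {
            drive(child, order);
        }
    }
}
```

On a three-node chain, `Order::TopDown` visits the nodes 1, 2 and 3 times from root to leaf, while `Order::RuleHandlesRecursion` visits each node exactly once.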

I ran some benchmarks and this PR is now better than main:

% critcmp main make-cse-top-down-like
group                                         main                                    make-cse-top-down-like
-----                                         ----                                    ----------------------
logical_aggregate_with_join                   1.01   565.4±14.92µs        ? ?/sec     1.00    561.0±9.20µs        ? ?/sec
logical_plan_tpcds_all                        1.03     77.5±2.96ms        ? ?/sec     1.00     75.2±1.10ms        ? ?/sec
logical_plan_tpch_all                         1.00      7.7±0.09ms        ? ?/sec     1.00      7.7±0.08ms        ? ?/sec
logical_select_all_from_1000                  1.00     15.0±0.26ms        ? ?/sec     1.01     15.2±0.18ms        ? ?/sec
logical_select_one_from_700                   1.00   405.6±10.82µs        ? ?/sec     1.00   404.3±10.31µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.01    402.7±5.89µs        ? ?/sec     1.00    399.7±6.28µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.01    383.7±7.82µs        ? ?/sec     1.00    378.9±5.03µs        ? ?/sec
physical_plan_tpcds_all                       1.09  590.3±107.45ms        ? ?/sec     1.00    543.2±3.93ms        ? ?/sec
physical_plan_tpch_all                        1.01     34.1±0.47ms        ? ?/sec     1.00     33.9±0.58ms        ? ?/sec
physical_plan_tpch_q1                         1.00  1105.2±12.43µs        ? ?/sec     1.00  1110.3±23.70µs        ? ?/sec
physical_plan_tpch_q10                        1.01  1550.2±38.57µs        ? ?/sec     1.00  1537.9±19.03µs        ? ?/sec
physical_plan_tpch_q11                        1.01  1328.8±16.32µs        ? ?/sec     1.00  1313.9±17.56µs        ? ?/sec
physical_plan_tpch_q12                        1.01  1108.5±18.29µs        ? ?/sec     1.00  1096.9±16.36µs        ? ?/sec
physical_plan_tpch_q13                        1.00   755.8±10.98µs        ? ?/sec     1.00   752.3±10.62µs        ? ?/sec
physical_plan_tpch_q14                        1.04   942.4±73.52µs        ? ?/sec     1.00   908.6±11.28µs        ? ?/sec
physical_plan_tpch_q16                        1.01  1345.2±26.82µs        ? ?/sec     1.00  1334.1±34.09µs        ? ?/sec
physical_plan_tpch_q17                        1.01  1200.3±50.67µs        ? ?/sec     1.00  1188.0±22.75µs        ? ?/sec
physical_plan_tpch_q18                        1.02  1433.4±102.83µs        ? ?/sec    1.00  1402.0±15.42µs        ? ?/sec
physical_plan_tpch_q19                        1.02      2.6±0.20ms        ? ?/sec     1.00      2.6±0.17ms        ? ?/sec
physical_plan_tpch_q2                         1.00      2.9±0.04ms        ? ?/sec     1.00      2.9±0.04ms        ? ?/sec
physical_plan_tpch_q20                        1.04  1673.2±79.81µs        ? ?/sec     1.00  1616.4±20.52µs        ? ?/sec
physical_plan_tpch_q21                        1.04      2.4±0.10ms        ? ?/sec     1.00      2.3±0.03ms        ? ?/sec
physical_plan_tpch_q22                        1.02  1165.8±27.17µs        ? ?/sec     1.00  1147.1±35.94µs        ? ?/sec
physical_plan_tpch_q3                         1.00  1102.3±16.77µs        ? ?/sec     1.00  1099.9±12.63µs        ? ?/sec
physical_plan_tpch_q4                         1.00    846.3±8.77µs        ? ?/sec     1.00    844.0±9.90µs        ? ?/sec
physical_plan_tpch_q5                         1.01  1661.4±23.93µs        ? ?/sec     1.00  1641.8±19.28µs        ? ?/sec
physical_plan_tpch_q6                         1.00    584.7±9.64µs        ? ?/sec     1.00   583.2±20.63µs        ? ?/sec
physical_plan_tpch_q7                         1.00      2.1±0.03ms        ? ?/sec     1.00      2.1±0.21ms        ? ?/sec
physical_plan_tpch_q8                         1.01      2.7±0.04ms        ? ?/sec     1.00      2.6±0.04ms        ? ?/sec
physical_plan_tpch_q9                         1.01  1932.5±23.29µs        ? ?/sec     1.00  1906.6±21.08µs        ? ?/sec
physical_select_all_from_1000                 1.00     35.2±1.20ms        ? ?/sec     1.01     35.5±0.48ms        ? ?/sec
physical_select_one_from_700                  1.00  1781.3±59.97µs        ? ?/sec     1.01  1805.4±43.25µs        ? ?/sec

@peter-toth peter-toth marked this pull request as ready for review August 6, 2024 14:37
@peter-toth
Contributor Author

peter-toth commented Aug 6, 2024

I've updated the PR description and the PR is ready for review again.

@alamb
Contributor

alamb commented Aug 6, 2024

Thanks @peter-toth

I also reran the benchmarks and no longer see any regression. I will get this PR reviewed carefully shortly (hopefully later today or tomorrow)


++ critcmp main make-cse-top-down-like
group                                         main                                   make-cse-top-down-like
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1356.5±10.54µs        ? ?/sec    1.01  1366.6±61.02µs        ? ?/sec
logical_plan_tpcds_all                        1.00    186.6±1.00ms        ? ?/sec    1.00    187.3±1.09ms        ? ?/sec
logical_plan_tpch_all                         1.00     22.9±0.17ms        ? ?/sec    1.00     22.8±0.18ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.0±0.11ms        ? ?/sec    1.00     18.0±0.11ms        ? ?/sec
logical_select_one_from_700                   1.00  1096.5±13.51µs        ? ?/sec    1.00  1093.5±12.52µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.02  1045.5±15.11µs        ? ?/sec    1.00  1029.4±14.83µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.01  1028.2±16.71µs        ? ?/sec    1.00  1018.4±12.42µs        ? ?/sec
physical_plan_tpcds_all                       1.00   1110.8±3.48ms        ? ?/sec    1.00   1110.1±3.55ms        ? ?/sec
physical_plan_tpch_all                        1.00     76.9±0.42ms        ? ?/sec    1.00     76.6±0.40ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      3.8±0.02ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.3±0.02ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.3±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q18                        1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.1±0.02ms        ? ?/sec    1.01      5.1±0.03ms        ? ?/sec
physical_plan_tpch_q2                         1.00      6.4±0.03ms        ? ?/sec    1.00      6.4±0.03ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.9±0.02ms        ? ?/sec    1.01      3.9±0.02ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.2±0.03ms        ? ?/sec    1.00      5.2±0.02ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.0±0.02ms        ? ?/sec    1.01      3.0±0.07ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.8±0.02ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.01      2.2±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q5                         1.00      3.9±0.02ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1577.6±12.49µs        ? ?/sec    1.01  1588.0±15.26µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.9±0.03ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.00      5.9±0.02ms        ? ?/sec    1.00      5.9±0.02ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.6±0.03ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_select_all_from_1000                 1.00     43.6±0.11ms        ? ?/sec    1.00     43.5±0.19ms        ? ?/sec
physical_select_one_from_700                  1.00      3.5±0.02ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec

Contributor

alamb commented Aug 6, 2024

CI failure seemed infra related https://github.com/apache/datafusion/actions/runs/10267531872/job/28408513630?pr=11683 so I restarted it on this PR

Contributor

alamb commented Aug 7, 2024

I am sorry -- something came up at work today and I am super behind on reviews. This is very high on my list for tomorrow

Contributor Author

peter-toth commented Aug 8, 2024

I am sorry -- something came up at work today and I am super behind on reviews. This is very high on my list for tomorrow

No worries @alamb, this PR can wait.

Contributor

@alamb alamb left a comment

Thank you @peter-toth

I reviewed this PR quite carefully and I think it is very nice 👌

I had a suggestion on readability (use a named struct) in peter-toth#4 but we could also merge that as a follow on PR (or never)

@@ -688,7 +702,10 @@ impl OptimizerRule for CommonSubexprEliminate {
     }

     fn apply_order(&self) -> Option<ApplyOrder> {
-        Some(ApplyOrder::TopDown)
+        // This rule handles recursion itself in a `ApplyOrder::TopDown` like manner.
Contributor

💯
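
To illustrate why a rule might opt out of the optimizer-driven traversal and recurse on its own, here is a minimal sketch. It assumes a toy `Plan` enum and string expressions; `Plan`, `collect_windows`, `mark_common` and `rewrite` are hypothetical stand-ins, not DataFusion's types or functions. The point is that a run of adjacent `Window` nodes is handled as one group, after which recursion continues below the whole run.

```rust
use std::collections::HashMap;

// Hypothetical, simplified plan tree; not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
    Window { exprs: Vec<String>, input: Box<Plan> },
}

/// Peels a run of adjacent `Window` nodes off the top of `plan`.
/// Returns each node's expressions (outermost first) and the plan below the run.
fn collect_windows(mut plan: Plan) -> (Vec<Vec<String>>, Plan) {
    let mut groups = Vec::new();
    loop {
        match plan {
            Plan::Window { exprs, input } => {
                groups.push(exprs);
                plan = *input;
            }
            other => return (groups, other),
        }
    }
}

/// Toy "elimination": wraps every expression that occurs more than once
/// anywhere in the group in `cse(...)`. Returns whether anything changed.
fn mark_common(groups: Vec<Vec<String>>) -> (Vec<Vec<String>>, bool) {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for e in groups.iter().flatten() {
        *counts.entry(e.clone()).or_insert(0) += 1;
    }
    let mut changed = false;
    let mut rewritten = Vec::with_capacity(groups.len());
    for exprs in groups {
        let mut out = Vec::with_capacity(exprs.len());
        for e in exprs {
            if counts[&e] > 1 {
                changed = true;
                out.push(format!("cse({e})"));
            } else {
                out.push(e);
            }
        }
        rewritten.push(out);
    }
    (rewritten, changed)
}

/// Rewrites `plan` parent-first (top-down like), driving the recursion itself.
fn rewrite(plan: Plan) -> (Plan, bool) {
    match plan {
        plan @ Plan::Window { .. } => {
            // Handle the whole run of adjacent Window nodes as one group.
            let (groups, below) = collect_windows(plan);
            let (groups, changed_here) = mark_common(groups);
            // Recurse below the run, skipping the Window nodes already handled.
            let (mut rebuilt, changed_below) = rewrite(below);
            // Turn the group back into adjacent nodes, innermost first.
            for exprs in groups.into_iter().rev() {
                rebuilt = Plan::Window { exprs, input: Box::new(rebuilt) };
            }
            (rebuilt, changed_here || changed_below)
        }
        Plan::Filter { predicate, input } => {
            let (input, changed) = rewrite(*input);
            (Plan::Filter { predicate, input: Box::new(input) }, changed)
        }
        leaf @ Plan::Scan(_) => (leaf, false),
    }
}
```

In this sketch, a traversal that visited each `Window` node on its own would see an expression shared between two adjacent nodes only once per node and would find nothing to extract, which is the kind of case that node-at-a-time application cannot express.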

-    fn rewrite_expr(
+    /// 2. An optional tuple that contains the extracted common sub-expressions and the
+    /// original `exprs_list`.
+    fn find_common_exprs(
Contributor

This is a very nice refactor / rename -- calling it find_common_exprs is 💯 and I think makes it much easier to follow

-    /// Rewrites the expression in `exprs_list` with common sub-expressions
-    /// replaced with a new column and adds a ProjectionExec on top of `input`
-    /// which computes any replaced common sub-expressions.
+    /// Extracts common sub-expressions and rewrites `exprs_list`.
+    ///
+    /// Returns a tuple of:
+    /// 1. The rewritten expressions
Contributor

Suggested change
-    /// 1. The rewritten expressions
+    /// 1. The (potentially) rewritten expressions

Expr::Column(Column::from_name(expr_alias)).alias(out_name),
);
let input = unwrap_arc(input);
// Extract common sub-expressions from the aggregate and grouping expressions.
Contributor

this new structure is very nice

)?;
transformed |= rewrite_exprs.transformed;
expr_mask: ExprMask,
) -> Result<Transformed<(Vec<Vec<Expr>>, FindCommonExprResult)>> {
Contributor

I found reasoning about this return type (Vec<Vec<Expr>>, FindCommonExprResult) hard (I couldn't keep track in my head of the three potential fields that were returned)

I made a PR here with a proposal to move it into a named enum: peter-toth#4

(I can also make that a PR to main if we would prefer to merge this PR as is)
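
For readers who want to see the readability argument concretely, the sketch below shows one possible shape for such a named type. It is an assumption, not the code from peter-toth#4: `CseOutcome`, its variants and field names, `find_common`, `describe`, and the `__common_N` alias scheme are all hypothetical, and `Expr` is just a `String` here.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an expression type.
type Expr = String;

/// Named alternative to returning `(rewritten, Option<(common, original)>)`.
/// Each of the three pieces of data gets a name at the use site.
#[derive(Debug, PartialEq)]
enum CseOutcome {
    /// Nothing was extracted; the input is handed back untouched.
    NoCommon { exprs_list: Vec<Vec<Expr>> },
    /// Common sub-expressions were extracted.
    Common {
        /// Extracted expressions paired with the alias that replaces them.
        common_exprs: Vec<(Expr, String)>,
        /// Input expressions with common ones replaced by their alias.
        new_exprs_list: Vec<Vec<Expr>>,
        /// The input as it was before rewriting.
        original_exprs_list: Vec<Vec<Expr>>,
    },
}

/// Toy search: an expression is "common" if it occurs more than once.
fn find_common(exprs_list: Vec<Vec<Expr>>) -> CseOutcome {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for e in exprs_list.iter().flatten() {
        *counts.entry(e.clone()).or_insert(0) += 1;
    }
    // Assign aliases in first-seen order so the result is deterministic.
    let mut common_exprs: Vec<(Expr, String)> = Vec::new();
    for e in exprs_list.iter().flatten() {
        if counts[e] > 1 && !common_exprs.iter().any(|(c, _)| c == e) {
            let alias = format!("__common_{}", common_exprs.len() + 1);
            common_exprs.push((e.clone(), alias));
        }
    }
    if common_exprs.is_empty() {
        return CseOutcome::NoCommon { exprs_list };
    }
    let mut new_exprs_list = Vec::with_capacity(exprs_list.len());
    for exprs in &exprs_list {
        let mut out = Vec::with_capacity(exprs.len());
        for e in exprs {
            match common_exprs.iter().find(|(c, _)| c == e) {
                Some((_, alias)) => out.push(alias.clone()),
                None => out.push(e.clone()),
            }
        }
        new_exprs_list.push(out);
    }
    CseOutcome::Common { common_exprs, new_exprs_list, original_exprs_list: exprs_list }
}

/// Callers match on named variants instead of unpacking tuple positions.
fn describe(outcome: &CseOutcome) -> String {
    match outcome {
        CseOutcome::NoCommon { exprs_list } => {
            format!("no common sub-expressions in {} list(s)", exprs_list.len())
        }
        CseOutcome::Common { common_exprs, .. } => {
            format!("{} common sub-expression(s)", common_exprs.len())
        }
    }
}
```

With a tuple, a caller has to remember which position holds the rewritten list and which holds the original; with named fields, the `match` arms document that themselves.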

Contributor Author

Thanks for the PR, I've just merged it.

* Extract the result of find_common_exprs into a struct

* Make naming consistent
Contributor

alamb commented Aug 8, 2024

Thanks @peter-toth -- I plan to merge this PR tomorrow (after the #11476 RC is cut)

# Conflicts:
#	datafusion/optimizer/src/common_subexpr_eliminate.rs
@alamb alamb merged commit b5d7931 into apache:main Aug 9, 2024
24 checks passed
Contributor

alamb commented Aug 9, 2024

Thanks again @peter-toth -- the code continues to look nicer and nicer

Labels
optimizer Optimizer rules