Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(optimizer): divide logical optimizer into one for batch and one for streaming. #8192

Merged
merged 8 commits into from
Feb 28, 2023

Conversation

chenzl25
Copy link
Contributor

@chenzl25 chenzl25 commented Feb 27, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Move logical optimization from PlanRoot to LogicalOptimizer.
  • Use static rules instead of creating new ones for queries every time.
  • Separate logical optimization into one for batch and one for streaming.

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

#8137

@codecov
Copy link

codecov bot commented Feb 27, 2023

Codecov Report

Merging #8192 (bfb4adf) into main (de37916) will decrease coverage by 0.02%.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##             main    #8192      +/-   ##
==========================================
- Coverage   71.69%   71.68%   -0.02%     
==========================================
  Files        1131     1132       +1     
  Lines      182332   182320      -12     
==========================================
- Hits       130723   130694      -29     
- Misses      51609    51626      +17     
Flag Coverage Δ
rust 71.68% <90.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/optimizer/logical_optimization.rs 89.56% <89.56%> (ø)
src/frontend/planner_test/src/lib.rs 74.94% <100.00%> (ø)
src/frontend/src/optimizer/heuristic_optimizer.rs 100.00% <100.00%> (ø)
src/frontend/src/optimizer/mod.rs 94.92% <100.00%> (-1.04%) ⬇️
...rontend/src/optimizer/rule/translate_apply_rule.rs 97.45% <100.00%> (+0.63%) ⬆️
src/meta/src/hummock/mock_hummock_meta_client.rs 64.94% <0.00%> (-0.52%) ⬇️
src/batch/src/task/task_execution.rs 51.62% <0.00%> (-0.51%) ⬇️
src/common/src/field_generator/mod.rs 75.00% <0.00%> (-0.50%) ⬇️
...frontend/src/scheduler/hummock_snapshot_manager.rs 60.00% <0.00%> (-0.48%) ⬇️
src/storage/src/hummock/compactor/mod.rs 80.19% <0.00%> (-0.39%) ⬇️
... and 5 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PATL @st1page

src/frontend/src/optimizer/logical_optimization.rs Outdated Show resolved Hide resolved
src/frontend/src/optimizer/logical_optimization.rs Outdated Show resolved Hide resolved
Ok(plan)
}

pub fn gen_optimized_logical_plan_for_batch(mut plan: PlanRef) -> Result<PlanRef> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. So it comes to another problem - lots of duplication code between the for_batch and for_stream. I am not sure whether it's better than before.

Copy link
Contributor

@jon-chuang jon-chuang Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat agree. I think it's easier to understand in a single block and branching on batch or stream. It makes it clearer which optimizations are shared, and which apply respectively only to batch or stream.

But this assumes that the general ordering of optimization passes are shared (so far, it's been true, but I can't tell if this PR also changes that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat agree. I think it's easier to understand in a single block and branching on batch or stream. It makes it clearer which optimizations are shared, and which apply respectively only to batch or stream.

But this assumes that the general ordering of optimization passes are shared (so far, it's been true, but I can't tell if this PR also changes that)

In the beginning, it sounds good to only have a single block and branching on batch or stream, however, It contains more and more branches gradually. At least for now, I can give about 5 optimizations that needed to be applied to the batch and stream separately. In addition, I hope this separation can help developers to think of whether their optimization can only be applied to the batch or the streaming, or both.

src/frontend/Cargo.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@st1page st1page left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, We used to think the optimization for streaming and batch have a common phase and it save our test and coding for a while. But currently, we find that the difference between the two optimizations could be flexible. And after this pr change, we can plug in the different optimization for the two execution more flexibly.

Hmmm. So it comes to another problem - lots of duplication code between the for_batch and for_stream. I am not sure whether it's better than before.

I think to reuse the duplication code, we should extract the common and meaningful functions such as fn predicate_push_down(mut plan) ->Result<PlanRef>(in fact this piece is duplicated even in one specific execution's optimizer) or fn simple_push_down. The emphasis of the "meaningful" here is because we do not think the batch and streaming shared a common optimization logical in nature.

In summary, I think the changes are just like that we use the Combination to reuse the code instead of Inheritance here.

@chenzl25 chenzl25 added this pull request to the merge queue Feb 28, 2023
Merged via the queue into main with commit ac39f03 Feb 28, 2023
@chenzl25 chenzl25 deleted the dylan/divide_logical_optimizer_into_two branch February 28, 2023 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants