fix(batch): ensure `BatchSeqScan` runs on compute node #7240
src/frontend/src/optimizer/mod.rs
Outdated
// We remark that since the `to_local_with_order_required` does not enforce single
// distribution, we enforce at the root if needed.
let insert_exchange = match plan.distribution() {
    Distribution::Single => Self::require_additional_exchange_on_root(plan.clone()),
    _ => true,
};
if insert_exchange {
    plan =
        BatchExchange::new(plan, self.required_order.clone(), Distribution::Single).into()
}
This logic seems to be used to handle DML. I think we should keep it.
src/frontend/src/optimizer/mod.rs
Outdated
// Ensure there is exchange before all seq scan.
plan = Self::enforce_exchange_above_table_scan(plan, &self.required_order);
IMHO, the enforcement here is trying to fix some unexpected plans at the end, while I think it would be more proper to handle it in the `to_local()` method of `BatchSeqScan` and `BatchSource`. We can try to provide `SomeShard` for the `BatchSeqScan` in local mode, so that we can always enforce an exchange on top of the table scan. By the way, we can then also utilize optimizations such as pushing filter/project through the exchange.
> IMHO, the enforcement here is trying to fix some unexpected plans at the end, while I think it could be more proper to handle it at the to_local() method of the BatchSeqScan and BatchSource.

Good suggestion, I think that makes more sense.

> We can try to provide SomeShard for the BatchSeqScan in local mode, so that we can always enforce an exchange on the top of table scan

Hmm, I don't quite understand this. Is `SomeShard` required to enforce an exchange on top of the table scan? Why is that so?

> by the way we can utilize the optimization such as push filter/ project through the exchange.

Yes, I think this is a good suggestion. Thanks.
For local batch plans, almost all operators require their input to have `Singleton` distribution, while the distribution of some `TableScan`s actually is `Singleton`. Our enforcement then concludes that, since the distribution is already satisfied, it can skip placing an exchange operator there. But as we know, in local execution we need an exchange operator to keep the table scan running on the CN. So I think we can hack the distribution for the table scan when we call the `to_local()` method and return a new cloned table scan with `SomeShard` distribution. The enforcement of the exchange will then always work, because we require a `Singleton` distribution while the table scan can only provide the `SomeShard` distribution now.
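The reasoning above can be sketched with a small self-contained model. This is only an illustration under stated assumptions: the `Plan` enum, `enforce`, and `scan_to_local` are hypothetical names made up for this sketch and are not RisingWave's actual types or APIs. It shows why a scan that reports `Single` gets no exchange, while a scan that reports `SomeShard` always gets one.

```rust
// Simplified, hypothetical model of distribution enforcement.
// None of these types are RisingWave's real plan nodes.
#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    Single,
    SomeShard,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A table scan together with the distribution it reports.
    SeqScan { dist: Distribution },
    /// An exchange that redistributes its input to `dist`.
    Exchange { input: Box<Plan>, dist: Distribution },
}

impl Plan {
    /// The distribution this node provides to its parent.
    fn distribution(&self) -> &Distribution {
        match self {
            Plan::SeqScan { dist } => dist,
            Plan::Exchange { dist, .. } => dist,
        }
    }
}

/// Insert an exchange only when the provided distribution does not
/// already satisfy the required one.
fn enforce(plan: Plan, required: Distribution) -> Plan {
    if *plan.distribution() == required {
        plan
    } else {
        Plan::Exchange { input: Box::new(plan), dist: required }
    }
}

/// The proposed hack (assumed shape): in local mode the scan always
/// reports `SomeShard`, so a `Single` requirement is never satisfied.
fn scan_to_local() -> Plan {
    Plan::SeqScan { dist: Distribution::SomeShard }
}
```

In this model, `enforce(scan_to_local(), Distribution::Single)` wraps the scan in an exchange, whereas enforcing `Single` on a scan that already reports `Single` returns it unchanged, which is the case the comment wants to avoid.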
3376277 to 41aa016
Codecov Report
@@            Coverage Diff             @@
##             main    #7240      +/-   ##
==========================================
+ Coverage   73.06%   73.08%   +0.02%
==========================================
  Files        1067     1067
  Lines      170734   170760      +26
==========================================
+ Hits       124750   124806      +56
+ Misses      45984    45954      -30
Generally LGTM, just need to fix the tests.
e2e_test/batch/join/issue_7115.slt
Outdated
create materialized view v as select count(*) cnt from t;

statement ok
SET QUERY_MODE TO local;
Just rename the suffix of this file to `.slt.part`, and it will run in both local mode and distributed mode.
I haven't fixed `distributed_mode` yet, unfortunately. That's why I use this as a workaround for now. Only `local_mode` is fixed.
Oh just got what you mean after thinking about it. Shall change it.
Please add some data to verify it.
LGTM!!
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
Batch scans should run on compute nodes. This means every table scan should have an exchange before it, so that no scan ends up in the root node.
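The invariant stated above can be illustrated with a minimal sketch. It assumes a toy plan tree: the `Node` enum, `has_scan_without_exchange`, and `enforce_root_exchange` are hypothetical names for this example only and do not reflect how the PR or RisingWave's optimizer implements the check.

```rust
// Hypothetical plan tree for illustration; not RisingWave's plan nodes.
#[derive(Debug)]
enum Node {
    Scan,
    Exchange(Box<Node>),
    Filter(Box<Node>),
    Join(Box<Node>, Box<Node>),
}

/// Returns true if some scan can be reached from `node` without crossing
/// an exchange, i.e. that scan would run in the same fragment as `node`.
fn has_scan_without_exchange(node: &Node) -> bool {
    match node {
        Node::Scan => true,
        // Everything below an exchange runs in a different fragment.
        Node::Exchange(_) => false,
        Node::Filter(input) => has_scan_without_exchange(input),
        Node::Join(left, right) => {
            has_scan_without_exchange(left) || has_scan_without_exchange(right)
        }
    }
}

/// Wrap the root in an exchange only when a scan would otherwise end up
/// in the root fragment.
fn enforce_root_exchange(root: Node) -> Node {
    if has_scan_without_exchange(&root) {
        Node::Exchange(Box::new(root))
    } else {
        root
    }
}
```

With this model, a `Filter` directly over a `Scan` gets an exchange added at the root, while a `Join` whose inputs are both already behind exchanges is left unchanged.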
- Local Execution Mode: Change `Distribution` to `SomeShard`, such that an `Exchange` will be inserted when enforcing the distribution.
- Distributed Execution Mode: Ensure all table scans have an exchange before them, by fixing `require_additional_exchange_on_root`.
Refer to a related PR or issue link (optional)
Closes #7115