-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(optimizer): split required and provided distribution property #2726
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea LGTM
Do we also need to add
|
|
@@ -1863,19 +1863,18 @@ | |||
BatchHashAgg { group_keys: [$0], aggs: [count, sum($1)] } | |||
BatchExchange { order: [], dist: HashShard([0]) } | |||
BatchProject { exprs: [Substr($0, 1:Int32, 2:Int32), $1] } | |||
BatchExchange { order: [], dist: AnyShard } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe there is a bug. if our batch execution can handle exchange with AnyShard
? @liurenjie1024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently no.
Codecov Report
@@ Coverage Diff @@
## main #2726 +/- ##
=======================================
Coverage 72.47% 72.48%
=======================================
Files 697 697
Lines 90329 90358 +29
=======================================
+ Hits 65468 65498 +30
+ Misses 24861 24860 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
@@ -40,8 +90,8 @@ impl Distribution { | |||
Distribution::Single => DistributionMode::Single, | |||
Distribution::Broadcast => DistributionMode::Broadcast, | |||
Distribution::HashShard(_) => DistributionMode::Hash, | |||
// TODO: Should panic if AnyShard or Any | |||
_ => DistributionMode::Single, | |||
// TODO: add round robin DistributionMode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is it possible to have round-robin? 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To increase parallelism of processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Round robin will cause a lot of problems. For example, in two-phase agg, it will cause row_count to be negative. I think long time ago we've decided to use hash distribution of the values for the local phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's okay to have it in batch, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be used carefully.
What's changed and what's your intention?
rethinking the distribution bugs(#2447 #1926 #1653) and strange code and think the current implementation of distribution property mix the required distribution and provided distribution up.
split required and provided distribution property.
see the comments in src/frontend/src/optimizer/property/distribution.rs for more information.
the planner test's diff is due to
RequiredDist::ShardByKey
with the order-irrelevance bit-set, the exchange shuffle key's order changesChecklist
Refer to a related PR or issue link (optional)