-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ShuffleReaderExec now supports multiple locations per partition #541
Conversation
self.stream.size_hint() | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to ballista-core utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a cool abstraction -- i have had need of something similar elsewhere -- perhaps it would be good to move to datafusion itself eventually
@edrevo fyi |
Box::pin(futures::stream::iter(result).flatten()), | ||
Arc::new(self.schema.as_ref().clone()), | ||
); | ||
Ok(Box::pin(result)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
let x = self.partition[partition].clone(); | ||
let result = future::join_all(x.into_iter().map(fetch_partition)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; if you change fetch_partition
to take a refernce, you can avoid the .clone
:
let x = self.partition[partition].clone(); | |
let result = future::join_all(x.into_iter().map(fetch_partition)) | |
let partition_locations = &self.partition[partition]; | |
let result = future::join_all(partition_locations.iter().map(fetch_partition)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (though I am not a ballista expert)
self.stream.size_hint() | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a cool abstraction -- i have had need of something similar elsewhere -- perhaps it would be good to move to datafusion itself eventually
.collect::<Result<Vec<_>>>()?; | ||
|
||
let result = WrappedStream::new( | ||
Box::pin(futures::stream::iter(result).flatten()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a clever way of flattening the streams (though I wonder if it will serialize them all (aka not start reading from the second until the first is entirely consumed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it will serialize them all (aka not start reading from the second until the first is entirely consumed)?
Yes, that's exactly what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I guess I figured I would point it out (that the different partitions wouldn't be producing in parallel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the basic shuffle mechanism is implemented, there will be a lot of optimization work to follow
Codecov Report
@@ Coverage Diff @@
## master #541 +/- ##
=======================================
Coverage 76.08% 76.09%
=======================================
Files 156 156
Lines 27035 27048 +13
=======================================
+ Hits 20570 20581 +11
- Misses 6465 6467 +2
Continue to review full report at Codecov.
|
Which issue does this PR close?
Closes #540 .
Rationale for this change
This is a step towards supporting true shuffle.
What changes are included in this PR?
WrappedStream
intoballista-core
crate and adds constructorShuffleReadExec
to acceptVec<Vec<PartitionLocation>>
instead ofVec<PartitionLocation>
Are there any user-facing changes?