Introduce ObjectStore trait to replace ObjectClient in mountpoint-s3 #592
Conversation
Introduce a new `ObjectStore` trait that replaces `ObjectClient` in the mountpoint-s3 crate. In addition to most of `ObjectClient` methods, `ObjectStore` also declares a new `prefetch` method returning a `PrefetchGetObject` which allows callers to read the object content. `PrefetchGetObject` is where `ObjectStore` implementations can add object data caching.

This change also reworks the `Prefetcher` so that `ObjectStore` implementations can delegate `prefetch` to it. The main changes to `Prefetcher` are:

* it is now generic on the `ObjectPartStream` (previously `ObjectPartFeed`), rather than using dynamic dispatch;
* the logic to spawn a new task for each `GetObject` request and handle the object body parts returned was moved into `ObjectPartStream`.

Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
Note that this PR supersedes the first part of #590, which I'll close shortly.
mountpoint-s3/src/prefetch.rs (outdated)

```diff
     &mut self,
     offset: u64,
     length: usize,
-) -> Result<ChecksummedBytes, PrefetchReadError<TaskError<Client>>> {
+) -> ObjectClientResult<ChecksummedBytes, PrefetchReadError, Self::ClientError> {
```
See discussion about error type here: #590 (comment)
```rust
/// Similar to [ObjectClient], but provides a [ObjectStore::prefetch] method instead
/// of [ObjectClient::get_object].
#[async_trait]
pub trait ObjectStore: Clone {
```
See discussion on extending `ObjectClient`: #590 (comment)
I think extending `ObjectClient`, or otherwise exposing the `ObjectClient` trait directly, is still the right thing to do here.
Actually, thinking more broadly: right now this is a pretty heavyweight abstraction for what's essentially one method (`prefetch`) with two implementations (cached or not). Seems like we could have a narrower interface that just focuses on "where object bytes come from". For example, I don't see how we'd use this abstraction to handle cached writes in the future.
Basically: can we just start with a smaller abstraction and expand it once we need it?
An alternative approach could be to introduce a `Prefetcher` trait, roughly something like:
```rust
trait Prefetcher {
    type ClientError: std::error::Error + Send + Sync + 'static;
    type PrefetchGetObject: PrefetchGetObject<ClientError = Self::ClientError>;

    fn prefetch(&self, bucket: &str, key: &str, size: u64, etag: ETag) -> Self::PrefetchGetObject;
}
```
Comparing with `ObjectStore`:
Pros:
- focus on "where object bytes come from"
- no impact on write path, readdir, etc.
Cons:
- the prefetcher type would still need to be propagated together with the client in many places, e.g. `struct S3Filesystem<Client, Prefetcher>`;
- if we wanted to introduce cached writes in the future, we would have to carry even more types, e.g. `struct S3Filesystem<Client, Prefetcher, Uploader>`;
- building a filesystem may become a bit complex and error prone, e.g.:

```rust
let client = Arc::new(client); // Annoyingly, ObjectClient is not Clone
let prefetcher = SomePrefetcher::new(client.clone());
let filesystem = S3Filesystem::new(client, ...);
```

Although maybe solvable if we had `Prefetcher::prefetch` take `Client` as an argument (or rather `Arc<Client>`).
On balance, I would still lean towards `ObjectStore`.
Another idea could be something like:
```rust
trait ObjectStore {
    // ...
    type Client: ObjectClient + ...;

    fn client(&self) -> &Self::Client;
    fn prefetch(...) -> ...;
    // in the future:
    fn write(...);
}
```
Not too sure if it would qualify as a smaller abstraction, but it would make the relationship with `ObjectClient` clear.
I don't think "too many type parameters" is much of a concern — it's annoying to get right one time, and then shouldn't really matter. I like the idea of having small, separate abstractions for this until we know for sure that we need a shared one (for e.g. reads and writes).
I'll start reworking with something like the `Prefetcher` above. When I'm done, shall I update this PR, or open a new one?
Opened #595
```rust
    self.offset + self.size as u64
}

pub fn trim_start(&self, start_offset: u64) -> Self {
```
Is it intentional that `start_offset > self.end()` is valid and just silently becomes a zero-length range? (Similar thing for `trim_end`.)
Yes. Rustdoc added to clarify.
Closing and replacing with #595
Description of change

Introduce a new `ObjectStore` trait that replaces `ObjectClient` in the mountpoint-s3 crate. In addition to most of `ObjectClient` methods, `ObjectStore` also declares a new `prefetch` method returning a `PrefetchGetObject` which allows callers to read the object content. `PrefetchGetObject` is where `ObjectStore` implementations can add object data caching.

This change also reworks the `Prefetcher` so that `ObjectStore` implementations can delegate `prefetch` to it. The main changes to `Prefetcher` are:

* it is now generic on the `ObjectPartStream` (previously `ObjectPartFeed`), rather than using dynamic dispatch;
* the logic to spawn a new task for each `GetObject` request and handle the object body parts returned was moved into `ObjectPartStream`.

Relevant issues: #255
Does this change impact existing behavior?
No changes in behavior.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).