Introduce ObjectStore trait to replace ObjectClient in mountpoint-s3 #592

passaro · 2023-11-01T16:19:23Z

Description of change

Introduce a new ObjectStore trait that replaces ObjectClient in the mountpoint-s3 crate. In addition to most of the ObjectClient methods, ObjectStore declares a new prefetch method that returns a PrefetchGetObject, which callers use to read the object content. PrefetchGetObject is where ObjectStore implementations can add object data caching.

This change also reworks the Prefetcher so that ObjectStore implementations can delegate prefetch to it. The main changes to Prefetcher are:

* it is now generic on the ObjectPartStream (previously ObjectPartFeed), rather than using dynamic dispatch.
* the logic to spawn a new task for each GetObject request and handle the object body parts returned was moved into ObjectPartStream.
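For readers unfamiliar with the distinction, the move from dynamic dispatch to a generic parameter can be sketched roughly as below. This is not code from the PR: the `get_parts` method, `InMemoryStream`, and `DynPrefetcher` are hypothetical, and the real `ObjectPartStream` is asynchronous and task-based rather than a simple synchronous call.

```rust
use std::ops::Range;

/// Hypothetical, synchronous stand-in for the PR's `ObjectPartStream`:
/// something that produces the bytes of an object for a requested range.
trait ObjectPartStream {
    fn get_parts(&self, key: &str, range: Range<u64>) -> Vec<u8>;
}

/// Toy stream that serves bytes from memory instead of issuing GetObject requests.
struct InMemoryStream {
    data: Vec<u8>,
}

impl ObjectPartStream for InMemoryStream {
    fn get_parts(&self, _key: &str, range: Range<u64>) -> Vec<u8> {
        // Clamp the requested range to the available data.
        let end = (range.end as usize).min(self.data.len());
        let start = (range.start as usize).min(end);
        self.data[start..end].to_vec()
    }
}

/// Generic over the stream type: the concrete stream is known at compile time
/// (static dispatch), so no `Box<dyn ...>` is needed.
struct Prefetcher<Stream: ObjectPartStream> {
    part_stream: Stream,
}

impl<Stream: ObjectPartStream> Prefetcher<Stream> {
    fn new(part_stream: Stream) -> Self {
        Self { part_stream }
    }

    fn read(&self, key: &str, offset: u64, length: usize) -> Vec<u8> {
        self.part_stream.get_parts(key, offset..offset + length as u64)
    }
}

/// For comparison, the dynamic-dispatch shape: the stream is a trait object
/// and every call goes through a vtable.
struct DynPrefetcher {
    part_stream: Box<dyn ObjectPartStream>,
}

impl DynPrefetcher {
    fn read(&self, key: &str, offset: u64, length: usize) -> Vec<u8> {
        self.part_stream.get_parts(key, offset..offset + length as u64)
    }
}
```

With the generic form, the stream type becomes part of the `Prefetcher` type, which is what lets a store implementation pick its stream (for example, a caching one) without boxing.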

Relevant issues: #255

Does this change impact existing behavior?

No changes in behavior.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).


passaro · 2023-11-01T16:23:47Z

Note that this PR supersedes the first part of #590, which I'll close shortly.

passaro · 2023-11-01T16:24:50Z

mountpoint-s3/src/prefetch.rs

        &mut self,
        offset: u64,
        length: usize,
-    ) -> Result<ChecksummedBytes, PrefetchReadError<TaskError<Client>>> {
+    ) -> ObjectClientResult<ChecksummedBytes, PrefetchReadError, Self::ClientError> {


See discussion about error type here: #590 (comment)
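To illustrate the new return type, here is a rough sketch of how a two-parameter error result separates prefetch-level failures from client failures. The `ObjectClientError` and `ObjectClientResult` definitions below are local stand-ins assumed to mirror the shape used in the client crate; the `OutOfRange` variant, `FakeClientError`, and the `read` function are hypothetical and exist only for this example.

```rust
/// Local stand-in: an error that is either a typed "service" error `S`
/// or an underlying client error `C`.
#[derive(Debug)]
enum ObjectClientError<S, C> {
    ServiceError(S),
    ClientError(C),
}

type ObjectClientResult<T, S, C> = Result<T, ObjectClientError<S, C>>;

/// Prefetch-level errors no longer wrap the client error as a type parameter.
/// The variant here is hypothetical.
#[derive(Debug, PartialEq)]
enum PrefetchReadError {
    OutOfRange { offset: u64, size: u64 },
}

/// Hypothetical client error type.
#[derive(Debug, PartialEq)]
struct FakeClientError(String);

/// Toy read over an in-memory object; `client_up` simulates a client failure.
fn read(
    object: &[u8],
    offset: u64,
    length: usize,
    client_up: bool,
) -> ObjectClientResult<Vec<u8>, PrefetchReadError, FakeClientError> {
    if !client_up {
        let err = FakeClientError("connection reset".to_string());
        return Err(ObjectClientError::ClientError(err));
    }
    let size = object.len() as u64;
    if offset > size {
        let err = PrefetchReadError::OutOfRange { offset, size };
        return Err(ObjectClientError::ServiceError(err));
    }
    let start = offset as usize;
    let end = start.saturating_add(length).min(object.len());
    Ok(object[start..end].to_vec())
}
```

Callers can then match on the outer enum to tell a prefetcher-specific failure from a transport-level one, without `PrefetchReadError` itself being generic over the client.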

passaro · 2023-11-01T16:25:44Z

mountpoint-s3/src/store.rs

+/// Similar to [ObjectClient], but provides a [ObjectStore::prefetch] method instead
+/// of [ObjectClient::get_object].
+#[async_trait]
+pub trait ObjectStore: Clone {


See discussion on extending ObjectClient: #590 (comment)

I think extending ObjectClient, or otherwise exposing the ObjectClient trait directly, is still the right thing to do here.

Actually, thinking more broadly: right now this is a pretty heavyweight abstraction for what's essentially one method (prefetch) with two implementations (cached or not). Seems like we could have a narrower interface that just focuses on "where object bytes come from". For example, I don't see how we'd use this abstraction to handle cached writes in the future.

Basically: can we just start with a smaller abstraction and expand it once we need it?

An alternative approach could be to introduce a Prefetcher trait, roughly something like:

    trait Prefetcher {
        type ClientError: std::error::Error + Send + Sync + 'static;
        type PrefetchGetObject: PrefetchGetObject<ClientError = Self::ClientError>;

        fn prefetch(&self, bucket: &str, key: &str, size: u64, etag: ETag) -> Self::PrefetchGetObject;
    }
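A self-contained sketch of how such a trait might be implemented and consumed follows. It is illustrative only: `ETag` here is a local stand-in, `PrefetchGetObject::read` is simplified to a synchronous method, and `DirectPrefetcher`, `DirectGetObject`, and `read_prefix` are hypothetical names that do not appear in the PR.

```rust
use std::collections::HashMap;
use std::convert::Infallible;

/// Local stand-in for the real `ETag` type.
#[derive(Clone, Debug, PartialEq)]
struct ETag(String);

/// Simplified, synchronous stand-in for `PrefetchGetObject`.
trait PrefetchGetObject {
    type ClientError: std::error::Error + Send + Sync + 'static;
    fn read(&mut self, offset: u64, length: usize) -> Result<Vec<u8>, Self::ClientError>;
}

trait Prefetcher {
    type ClientError: std::error::Error + Send + Sync + 'static;
    type PrefetchGetObject: PrefetchGetObject<ClientError = Self::ClientError>;

    fn prefetch(&self, bucket: &str, key: &str, size: u64, etag: ETag) -> Self::PrefetchGetObject;
}

/// Hypothetical prefetcher backed by an in-memory map of (bucket, key) to bytes.
struct DirectPrefetcher {
    objects: HashMap<(String, String), Vec<u8>>,
}

/// Handle returned by `DirectPrefetcher::prefetch`; owns a copy of the object bytes.
struct DirectGetObject {
    data: Vec<u8>,
}

impl PrefetchGetObject for DirectGetObject {
    type ClientError = Infallible;

    fn read(&mut self, offset: u64, length: usize) -> Result<Vec<u8>, Infallible> {
        // Reads past the end of the object return fewer (possibly zero) bytes.
        let start = (offset as usize).min(self.data.len());
        let end = start.saturating_add(length).min(self.data.len());
        Ok(self.data[start..end].to_vec())
    }
}

impl Prefetcher for DirectPrefetcher {
    type ClientError = Infallible;
    type PrefetchGetObject = DirectGetObject;

    fn prefetch(&self, bucket: &str, key: &str, size: u64, _etag: ETag) -> DirectGetObject {
        // A missing object is modelled as empty data to keep the sketch short.
        let lookup = (bucket.to_string(), key.to_string());
        let mut data = self.objects.get(&lookup).cloned().unwrap_or_default();
        data.truncate(size as usize);
        DirectGetObject { data }
    }
}

/// Code that only needs "where object bytes come from" can be generic over the prefetcher.
fn read_prefix<P: Prefetcher>(
    prefetcher: &P,
    bucket: &str,
    key: &str,
    size: u64,
    n: usize,
) -> Result<Vec<u8>, P::ClientError> {
    let mut request = prefetcher.prefetch(bucket, key, size, ETag("example-etag".to_string()));
    request.read(0, n)
}
```

A caching variant would be a second type implementing the same trait, which is the "one method, two implementations" shape discussed above.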

Comparing with ObjectStore:

Pros:

* focuses on "where object bytes come from"
* no impact on the write path, readdir, etc.

Cons:

* the prefetcher type would still need to be propagated together with the client in many places, e.g. struct S3Filesystem<Client, Prefetcher>. If we wanted to introduce cached writes in the future, we would have to carry even more types, e.g. struct S3Filesystem<Client, Prefetcher, Uploader>.
* building a filesystem may become a bit complex and error-prone, e.g.:

      let client = Arc::new(client); // Annoyingly, ObjectClient is not Clone
      let prefetcher = SomePrefetcher::new(client.clone());
      let filesystem = S3Filesystem::new(client, ...);

  Although this may be solvable if we had Prefetcher::prefetch take Client as an argument (or rather Arc<Client>).

On balance, I would still lean towards ObjectStore.
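The variant where prefetch takes the client as an argument could look roughly like this. It is a sketch under assumptions: the `ObjectClient` trait here is a tiny synchronous stand-in, `StaticClient` and `DefaultPrefetcher` are hypothetical, and the second type parameter is named `P` only to avoid clashing with the trait name.

```rust
use std::sync::Arc;

/// Tiny synchronous stand-in; the real `ObjectClient` is async and much larger.
trait ObjectClient {
    fn get_object(&self, key: &str) -> Vec<u8>;
}

/// Hypothetical client that fabricates object contents from the key.
struct StaticClient;

impl ObjectClient for StaticClient {
    fn get_object(&self, key: &str) -> Vec<u8> {
        format!("contents of {}", key).into_bytes()
    }
}

/// The prefetcher receives the client on each call instead of owning a clone of it.
trait Prefetcher<Client: ObjectClient> {
    fn prefetch(&self, client: Arc<Client>, key: &str) -> Vec<u8>;
}

/// Hypothetical pass-through prefetcher with no caching.
struct DefaultPrefetcher;

impl<Client: ObjectClient> Prefetcher<Client> for DefaultPrefetcher {
    fn prefetch(&self, client: Arc<Client>, key: &str) -> Vec<u8> {
        client.get_object(key)
    }
}

/// The filesystem still carries both type parameters, but construction
/// no longer requires sharing the client with the prefetcher up front.
struct S3Filesystem<Client, P> {
    client: Arc<Client>,
    prefetcher: P,
}

impl<Client: ObjectClient, P: Prefetcher<Client>> S3Filesystem<Client, P> {
    fn new(client: Client, prefetcher: P) -> Self {
        Self { client: Arc::new(client), prefetcher }
    }

    fn read(&self, key: &str) -> Vec<u8> {
        self.prefetcher.prefetch(Arc::clone(&self.client), key)
    }
}
```

The `Arc` wrapping happens once inside `S3Filesystem::new`, so the caller does not have to wire the same client into two places.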

Another idea could be something like:

    trait ObjectStore {
        ...
        type Client: ObjectClient + ...
        fn client(&self) -> &Self::Client;
        fn prefetch(...) -> ...
        // in the future
        fn write(...)
    }

Not too sure if it would qualify as a smaller abstraction, but it would make the relationship with ObjectClient clear.
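Filling in the elided parts with placeholders, this idea might be sketched as below. Everything beyond the trait outline is an assumption for illustration: the two-method `ObjectClient`, `InMemoryClient`, `PassthroughStore`, and `object_size` are hypothetical, and the methods are synchronous for brevity.

```rust
use std::collections::HashMap;

/// Tiny synchronous stand-in for the real `ObjectClient`.
trait ObjectClient {
    fn head_object(&self, key: &str) -> Option<u64>;
    fn get_object(&self, key: &str) -> Option<Vec<u8>>;
}

/// Hypothetical client that serves objects from memory.
struct InMemoryClient {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectClient for InMemoryClient {
    fn head_object(&self, key: &str) -> Option<u64> {
        self.objects.get(key).map(|data| data.len() as u64)
    }

    fn get_object(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// The store exposes its client directly and adds only the read path on top.
trait ObjectStore {
    type Client: ObjectClient;

    fn client(&self) -> &Self::Client;
    fn prefetch(&self, key: &str) -> Option<Vec<u8>>;
}

/// Hypothetical store with no caching: prefetch goes straight to the client.
struct PassthroughStore<C> {
    client: C,
}

impl<C: ObjectClient> ObjectStore for PassthroughStore<C> {
    type Client = C;

    fn client(&self) -> &C {
        &self.client
    }

    fn prefetch(&self, key: &str) -> Option<Vec<u8>> {
        self.client.get_object(key)
    }
}

/// Non-read operations reach the client through the accessor rather than
/// through methods re-declared on the store.
fn object_size<S: ObjectStore>(store: &S, key: &str) -> Option<u64> {
    store.client().head_object(key)
}
```

Compared with re-declaring most of the client's methods on the store, the accessor keeps the store's own surface limited to the operations it actually changes.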

I don't think "too many type parameters" is much of a concern — it's annoying to get right one time, and then shouldn't really matter. I like the idea of having small, separate abstractions for this until we know for sure that we need a shared one (for e.g. reads and writes).

I'll start reworking with something like the Prefetcher above. When I'm done, shall I update this PR, or open a new one?

Opened #595

jamesbornholt · 2023-11-02T01:47:04Z

mountpoint-s3/src/store.rs

+/// Similar to [ObjectClient], but provides a [ObjectStore::prefetch] method instead
+/// of [ObjectClient::get_object].
+#[async_trait]
+pub trait ObjectStore: Clone {


I think extending ObjectClient, or otherwise exposing the ObjectClient trait directly, is still the right thing to do here.

Actually, thinking more broadly: right now this is a pretty heavyweight abstraction for what's essentially one method (prefetch) with two implementations (cached or not). Seems like we could have a narrower interface that just focuses on "where object bytes come from". For example, I don't see how we'd use this abstraction to handle cached writes in the future.

Basically: can we just start with a smaller abstraction and expand it once we need it?

mountpoint-s3/src/prefetch/part_queue.rs

jamesbornholt · 2023-11-02T01:53:18Z

mountpoint-s3/src/prefetch/part_stream.rs

+        self.offset + self.size as u64
+    }
+
+    pub fn trim_start(&self, start_offset: u64) -> Self {


Is it intentional that start_offset > self.end() is valid and just silently becomes a zero-length range? (Similar thing for trim_end).

Yes. Rustdoc added to clarify.
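For reference, the behavior confirmed here can be sketched as follows. This is not the PR's code: the `ByteRange` name and the method bodies are assumptions, and only the `end` expression and the `trim_start` signature are taken from the diff above.

```rust
/// Hypothetical range type with the same fields as the one in the diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ByteRange {
    offset: u64,
    size: usize,
}

impl ByteRange {
    fn end(&self) -> u64 {
        self.offset + self.size as u64
    }

    /// Trim the start of the range to `start_offset`.
    /// If `start_offset` is past the end, the result is a zero-length range.
    fn trim_start(&self, start_offset: u64) -> Self {
        let offset = start_offset.max(self.offset);
        let size = self.end().saturating_sub(offset) as usize;
        Self { offset, size }
    }

    /// Trim the end of the range to `end_offset`.
    /// If `end_offset` is before the start, the result is a zero-length range.
    fn trim_end(&self, end_offset: u64) -> Self {
        let end = end_offset.min(self.end());
        let size = end.saturating_sub(self.offset) as usize;
        Self { offset: self.offset, size }
    }
}
```

In this sketch, `saturating_sub` is what turns an out-of-bounds trim into an empty range instead of an underflow panic.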

Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>

passaro · 2023-11-03T15:33:14Z

Closing and replacing with #595

passaro temporarily deployed to PR integration tests November 1, 2023 16:19 — with GitHub Actions Inactive

passaro mentioned this pull request Nov 1, 2023

Add caching ObjectStore implementation #590

Closed

passaro requested a review from jamesbornholt November 1, 2023 17:08

jamesbornholt reviewed Nov 2, 2023

passaro added 2 commits November 2, 2023 11:27

Invert PrefetchReadError

e268669

Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>

Clarify trim methods behavior

e49b8b1

Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>

passaro temporarily deployed to PR integration tests November 2, 2023 11:50 — with GitHub Actions Inactive

passaro closed this Nov 3, 2023

passaro deleted the object-store branch December 11, 2023 20:07
