Introduce ObjectStore trait to replace ObjectClient in mountpoint-s3 #592

Closed
passaro wants to merge 3 commits from the object-store branch

passaro (Contributor) commented Nov 1, 2023

Description of change

Introduce a new ObjectStore trait that replaces ObjectClient in the mountpoint-s3 crate. In addition to most of the ObjectClient methods, ObjectStore declares a new prefetch method that returns a PrefetchGetObject, which callers use to read the object content. PrefetchGetObject is where ObjectStore implementations can add object data caching.

This change also reworks the Prefetcher so that ObjectStore implementations can delegate prefetch to it. The main changes to Prefetcher are:

  • it is now generic on the ObjectPartStream (previously ObjectPartFeed), rather than using dynamic dispatch.
  • the logic to spawn a new task for each GetObject request and handle the object body parts returned was moved into ObjectPartStream.
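The shape described above can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the code in this PR: the method signatures, `InMemoryStream`, `PrefetchRequest`, and `UncachedStore` are assumptions made for the example, and the real traits are async and carry error types.

```rust
use std::collections::HashMap;

/// Hypothetical handle returned by `prefetch`; callers read object bytes from it.
pub trait PrefetchGetObject {
    fn read(&mut self, offset: u64, length: usize) -> Vec<u8>;
}

/// Simplified stand-in for the proposed trait; only `prefetch` is shown.
pub trait ObjectStore: Clone {
    type PrefetchGetObject: PrefetchGetObject;
    fn prefetch(&self, bucket: &str, key: &str, size: u64) -> Self::PrefetchGetObject;
}

/// Hypothetical source of object parts that the prefetcher is generic over.
pub trait ObjectPartStream: Clone {
    fn get_part(&self, bucket: &str, key: &str, offset: u64, length: usize) -> Vec<u8>;
}

/// In-memory part stream, for illustration only.
#[derive(Clone, Default)]
pub struct InMemoryStream {
    objects: HashMap<(String, String), Vec<u8>>,
}

impl InMemoryStream {
    pub fn put(&mut self, bucket: &str, key: &str, data: &[u8]) {
        self.objects.insert((bucket.to_owned(), key.to_owned()), data.to_vec());
    }
}

impl ObjectPartStream for InMemoryStream {
    fn get_part(&self, bucket: &str, key: &str, offset: u64, length: usize) -> Vec<u8> {
        let data = match self.objects.get(&(bucket.to_owned(), key.to_owned())) {
            Some(data) => data,
            None => return Vec::new(),
        };
        // Clamp both ends to the object length so slicing cannot panic.
        let start = offset.min(data.len() as u64) as usize;
        let end = start.saturating_add(length).min(data.len());
        data[start..end].to_vec()
    }
}

/// Prefetcher with static dispatch: `S` is a type parameter, not a `Box<dyn ...>`.
#[derive(Clone)]
pub struct Prefetcher<S: ObjectPartStream> {
    stream: S,
}

/// One prefetch request for a single object.
pub struct PrefetchRequest<S: ObjectPartStream> {
    stream: S,
    bucket: String,
    key: String,
    size: u64,
}

impl<S: ObjectPartStream> Prefetcher<S> {
    pub fn new(stream: S) -> Self {
        Self { stream }
    }

    pub fn prefetch(&self, bucket: &str, key: &str, size: u64) -> PrefetchRequest<S> {
        PrefetchRequest {
            stream: self.stream.clone(),
            bucket: bucket.to_owned(),
            key: key.to_owned(),
            size,
        }
    }
}

impl<S: ObjectPartStream> PrefetchGetObject for PrefetchRequest<S> {
    fn read(&mut self, offset: u64, length: usize) -> Vec<u8> {
        // Never read past the object size recorded at prefetch time.
        let remaining = self.size.saturating_sub(offset);
        let length = (length as u64).min(remaining) as usize;
        self.stream.get_part(&self.bucket, &self.key, offset, length)
    }
}

/// A store without caching simply delegates `prefetch` to the prefetcher.
#[derive(Clone)]
pub struct UncachedStore<S: ObjectPartStream> {
    prefetcher: Prefetcher<S>,
}

impl<S: ObjectPartStream> UncachedStore<S> {
    pub fn new(stream: S) -> Self {
        Self { prefetcher: Prefetcher::new(stream) }
    }
}

impl<S: ObjectPartStream> ObjectStore for UncachedStore<S> {
    type PrefetchGetObject = PrefetchRequest<S>;

    fn prefetch(&self, bucket: &str, key: &str, size: u64) -> Self::PrefetchGetObject {
        self.prefetcher.prefetch(bucket, key, size)
    }
}
```

A caching store would implement the same `ObjectStore` trait but return a different `PrefetchGetObject` type that consults its cache before reading from the stream.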

Relevant issues: #255

Does this change impact existing behavior?

No changes in behavior.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
@passaro passaro temporarily deployed to PR integration tests November 1, 2023 16:19 — with GitHub Actions Inactive

passaro commented Nov 1, 2023

Note that this PR supersedes the first part of #590, which I'll close shortly.

  &mut self,
  offset: u64,
  length: usize,
- ) -> Result<ChecksummedBytes, PrefetchReadError<TaskError<Client>>> {
+ ) -> ObjectClientResult<ChecksummedBytes, PrefetchReadError, Self::ClientError> {
Contributor Author

See discussion about error type here: #590 (comment)
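To illustrate the new return type, here is a minimal sketch of how a result with separate service and client error parameters might be used by a `read` method. The `ObjectClientError` and `ObjectClientResult` definitions below are local stand-ins written for this example, and the `OffsetOutOfRange` variant, `TransportError`, and `InMemoryRequest` are hypothetical; `Vec<u8>` stands in for `ChecksummedBytes`.

```rust
/// Local stand-in: an error is either a service error `S` or a client error `C`.
#[derive(Debug, PartialEq)]
pub enum ObjectClientError<S, C> {
    ServiceError(S),
    ClientError(C),
}

/// Local stand-in for the result alias used in the new signature.
pub type ObjectClientResult<T, S, C> = Result<T, ObjectClientError<S, C>>;

/// Hypothetical read-level error; the variant is illustrative only.
#[derive(Debug, PartialEq)]
pub enum PrefetchReadError {
    OffsetOutOfRange { offset: u64, size: u64 },
}

/// Hypothetical client-level error.
#[derive(Debug, PartialEq)]
pub struct TransportError(pub String);

pub trait PrefetchGetObject {
    /// The client error is an associated type, so the read error itself
    /// no longer needs to be generic over the client.
    type ClientError: std::fmt::Debug;

    fn read(
        &mut self,
        offset: u64,
        length: usize,
    ) -> ObjectClientResult<Vec<u8>, PrefetchReadError, Self::ClientError>;
}

/// Toy request backed by a buffer, with a flag to simulate a client failure.
pub struct InMemoryRequest {
    pub data: Vec<u8>,
    pub connected: bool,
}

impl PrefetchGetObject for InMemoryRequest {
    type ClientError = TransportError;

    fn read(
        &mut self,
        offset: u64,
        length: usize,
    ) -> ObjectClientResult<Vec<u8>, PrefetchReadError, Self::ClientError> {
        if !self.connected {
            let error = TransportError("connection lost".to_owned());
            return Err(ObjectClientError::ClientError(error));
        }
        let size = self.data.len() as u64;
        if offset > size {
            let error = PrefetchReadError::OffsetOutOfRange { offset, size };
            return Err(ObjectClientError::ServiceError(error));
        }
        let start = offset as usize;
        let end = start.saturating_add(length).min(self.data.len());
        Ok(self.data[start..end].to_vec())
    }
}
```

Callers can then match on `ServiceError` and `ClientError` separately instead of unwrapping a client error nested inside the read error.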

/// Similar to [ObjectClient], but provides a [ObjectStore::prefetch] method instead
/// of [ObjectClient::get_object].
#[async_trait]
pub trait ObjectStore: Clone {
Contributor Author

See discussion on extending ObjectClient: #590 (comment)

Member

I think extending ObjectClient, or otherwise exposing the ObjectClient trait directly, is still the right thing to do here.

Actually, thinking more broadly: right now this is a pretty heavyweight abstraction for what's essentially one method (prefetch) with two implementations (cached or not). Seems like we could have a narrower interface that just focuses on "where object bytes come from". For example, I don't see how we'd use this abstraction to handle cached writes in the future.

Basically: can we just start with a smaller abstraction and expand it once we need it?

Contributor Author

An alternative approach could be to introduce a Prefetcher trait, roughly something like:

trait Prefetcher {
  type ClientError: std::error::Error + Send + Sync + 'static;
  type PrefetchGetObject: PrefetchGetObject<ClientError = Self::ClientError>;
  fn prefetch(&self, bucket: &str, key: &str, size: u64, etag: ETag) -> Self::PrefetchGetObject;
}
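As a rough illustration of "one method with two implementations (cached or not)", the sketch below implements a simplified version of this trait twice. It is an assumption-laden toy: the `ClientError` associated type is dropped, `ETag` is a local newtype, and `Backend`, `BufferedRequest`, `DirectPrefetcher`, and `CachingPrefetcher` are hypothetical names that fetch whole objects synchronously.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Local stand-in for the real ETag type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ETag(pub String);

pub trait PrefetchGetObject {
    fn read(&mut self, offset: u64, length: usize) -> Vec<u8>;
}

/// Simplified version of the trait above, without the error type.
pub trait Prefetcher {
    type PrefetchGetObject: PrefetchGetObject;
    fn prefetch(&self, bucket: &str, key: &str, size: u64, etag: ETag) -> Self::PrefetchGetObject;
}

/// Hypothetical backend that counts how often an object is fetched.
#[derive(Default)]
pub struct Backend {
    objects: HashMap<(String, String), Vec<u8>>,
    fetches: AtomicUsize,
}

impl Backend {
    pub fn put(&mut self, bucket: &str, key: &str, data: &[u8]) {
        self.objects.insert((bucket.to_owned(), key.to_owned()), data.to_vec());
    }

    pub fn fetch_count(&self) -> usize {
        self.fetches.load(Ordering::Relaxed)
    }

    fn fetch(&self, bucket: &str, key: &str) -> Vec<u8> {
        self.fetches.fetch_add(1, Ordering::Relaxed);
        let id = (bucket.to_owned(), key.to_owned());
        self.objects.get(&id).cloned().unwrap_or_default()
    }
}

/// Request that serves reads from an already fetched buffer.
pub struct BufferedRequest {
    data: Vec<u8>,
}

impl PrefetchGetObject for BufferedRequest {
    fn read(&mut self, offset: u64, length: usize) -> Vec<u8> {
        let start = offset.min(self.data.len() as u64) as usize;
        let end = start.saturating_add(length).min(self.data.len());
        self.data[start..end].to_vec()
    }
}

/// Uncached implementation: every prefetch goes to the backend.
pub struct DirectPrefetcher {
    pub backend: Arc<Backend>,
}

impl Prefetcher for DirectPrefetcher {
    type PrefetchGetObject = BufferedRequest;

    fn prefetch(&self, bucket: &str, key: &str, _size: u64, _etag: ETag) -> BufferedRequest {
        BufferedRequest { data: self.backend.fetch(bucket, key) }
    }
}

/// Cached implementation: object data is kept per (bucket, key, etag).
pub struct CachingPrefetcher {
    pub backend: Arc<Backend>,
    pub cache: Mutex<HashMap<(String, String, ETag), Vec<u8>>>,
}

impl Prefetcher for CachingPrefetcher {
    type PrefetchGetObject = BufferedRequest;

    fn prefetch(&self, bucket: &str, key: &str, _size: u64, etag: ETag) -> BufferedRequest {
        let cache_key = (bucket.to_owned(), key.to_owned(), etag);
        let mut cache = self.cache.lock().unwrap();
        // Only hit the backend when this object version is not cached yet.
        let data = cache
            .entry(cache_key)
            .or_insert_with(|| self.backend.fetch(bucket, key))
            .clone();
        BufferedRequest { data }
    }
}
```

Keying the cache on the ETag means a changed object is fetched again rather than served from stale data.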

Comparing with ObjectStore:

Pros:

  • focus on "where object bytes come from"
  • no impact on write path, readdir, etc.

Cons:

  • the prefetcher type would still need to be propagated together with the client in many places, e.g.:
    struct S3Filesystem<Client, Prefetcher>.
  • If we wanted to introduce cached writes in the future, we would have to carry even more types, e.g.: struct S3Filesystem<Client, Prefetcher, Uploader>.
  • building a filesystem may become a bit complex and error prone, e.g.:
    let client = Arc::new(client); // Annoyingly, ObjectClient is not Clone
    let prefetcher = SomePrefetcher::new(client.clone());
    let filesystem = S3Filesystem::new(client, ...);
    Although this may be solvable if we had Prefetcher::prefetch take Client as an argument (or rather Arc<Client>).
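A minimal sketch of that last idea, where the filesystem owns the client and passes it to the prefetcher on each call. The traits are heavily simplified and synchronous, and `MapClient`, `DefaultPrefetcher`, and the `read` method are hypothetical names introduced for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Heavily simplified stand-in for the real client trait.
pub trait ObjectClient {
    fn get_object(&self, key: &str) -> Option<Vec<u8>>;
}

/// Variant of the prefetcher trait that receives the client as an argument,
/// so it does not need to be constructed with its own copy.
pub trait Prefetcher<Client: ObjectClient> {
    fn prefetch(&self, client: Arc<Client>, key: &str) -> Option<Vec<u8>>;
}

/// The filesystem carries both type parameters but wraps the client itself.
pub struct S3Filesystem<Client: ObjectClient, P: Prefetcher<Client>> {
    client: Arc<Client>,
    prefetcher: P,
}

impl<Client: ObjectClient, P: Prefetcher<Client>> S3Filesystem<Client, P> {
    pub fn new(client: Client, prefetcher: P) -> Self {
        // The caller no longer has to wire the same client into both values.
        Self { client: Arc::new(client), prefetcher }
    }

    pub fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.prefetcher.prefetch(self.client.clone(), key)
    }
}

/// Hypothetical in-memory client.
#[derive(Default)]
pub struct MapClient {
    pub objects: HashMap<String, Vec<u8>>,
}

impl ObjectClient for MapClient {
    fn get_object(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// Hypothetical prefetcher that forwards straight to the client.
pub struct DefaultPrefetcher;

impl<Client: ObjectClient> Prefetcher<Client> for DefaultPrefetcher {
    fn prefetch(&self, client: Arc<Client>, key: &str) -> Option<Vec<u8>> {
        client.get_object(key)
    }
}
```

With this shape, building a filesystem is a single call such as `S3Filesystem::new(client, DefaultPrefetcher)`, at the cost of the prefetcher trait being generic over the client.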

On balance, I would still lean towards ObjectStore.

Contributor Author

Another idea could be something like:

trait ObjectStore {
  ...
  type Client: ObjectClient + ...
  
  fn client(&self) -> &Self::Client;

  fn prefetch(...) -> ...

  // in the future 
  fn write(...)
}

Not too sure if it would qualify as a smaller abstraction, but it would make the relationship with ObjectClient clear.
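A compilable version of this idea might look like the sketch below. It assumes a drastically reduced `ObjectClient` with two synchronous methods; `MapClient`, `PassthroughStore`, and the `prefetch` signature are hypothetical and only meant to show how callers would reach the client through the store.

```rust
use std::collections::HashMap;

/// Drastically reduced stand-in for the real client trait.
pub trait ObjectClient {
    fn head_object(&self, key: &str) -> Option<u64>;
    fn get_object(&self, key: &str) -> Option<Vec<u8>>;
}

/// The store exposes its client directly and adds `prefetch` on top.
pub trait ObjectStore {
    type Client: ObjectClient;

    /// Metadata operations keep going through the client.
    fn client(&self) -> &Self::Client;

    /// Object bytes come from the store, which may add caching.
    fn prefetch(&self, key: &str) -> Option<Vec<u8>>;
}

/// Hypothetical in-memory client.
#[derive(Default)]
pub struct MapClient {
    pub objects: HashMap<String, Vec<u8>>,
}

impl ObjectClient for MapClient {
    fn head_object(&self, key: &str) -> Option<u64> {
        self.objects.get(key).map(|data| data.len() as u64)
    }

    fn get_object(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// Hypothetical store that adds nothing on top of the client.
pub struct PassthroughStore<C: ObjectClient> {
    pub client: C,
}

impl<C: ObjectClient> ObjectStore for PassthroughStore<C> {
    type Client = C;

    fn client(&self) -> &C {
        &self.client
    }

    fn prefetch(&self, key: &str) -> Option<Vec<u8>> {
        self.client.get_object(key)
    }
}
```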

Member

I don't think "too many type parameters" is much of a concern — it's annoying to get right one time, and then shouldn't really matter. I like the idea of having small, separate abstractions for this until we know for sure that we need a shared one (for e.g. reads and writes).

Contributor Author

I'll start reworking with something like the Prefetcher above. When I'm done, shall I update this PR, or open a new one?

Contributor Author

Opened #595

mountpoint-s3/src/prefetch/part_queue.rs (outdated, resolved)
self.offset + self.size as u64
}

pub fn trim_start(&self, start_offset: u64) -> Self {
Member

Is it intentional that start_offset > self.end() is valid and just silently becomes a zero-length range? (Similar thing for trim_end).

Contributor Author

Yes. Rustdoc added to clarify.
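For reference, the behavior being discussed can be sketched as follows. The type name `RequestRange`, its fields, and the exact arithmetic are assumptions reconstructed from the snippet above, not the code in the PR; the point is only that trimming past either end saturates to an empty range instead of panicking.

```rust
/// Hypothetical byte range within an object.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RequestRange {
    offset: u64,
    size: usize,
}

impl RequestRange {
    pub fn new(offset: u64, size: usize) -> Self {
        Self { offset, size }
    }

    pub fn start(&self) -> u64 {
        self.offset
    }

    pub fn end(&self) -> u64 {
        self.offset + self.size as u64
    }

    pub fn len(&self) -> usize {
        self.size
    }

    /// Trim the start of the range at `start_offset`.
    /// An offset before the range leaves it unchanged; an offset past
    /// `self.end()` yields a zero-length range starting at `start_offset`.
    pub fn trim_start(&self, start_offset: u64) -> Self {
        let offset = start_offset.max(self.offset);
        let size = self.end().saturating_sub(offset) as usize;
        Self { offset, size }
    }

    /// Trim the end of the range at `end_offset`.
    /// An offset past the range leaves it unchanged; an offset before
    /// `self.start()` yields a zero-length range at the original start.
    pub fn trim_end(&self, end_offset: u64) -> Self {
        let end = end_offset.min(self.end());
        let size = end.saturating_sub(self.offset) as usize;
        Self { offset: self.offset, size }
    }
}
```

Using `saturating_sub` keeps both methods total, which is what makes the out-of-range cases valid inputs rather than errors.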

Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
@passaro passaro temporarily deployed to PR integration tests November 2, 2023 11:50 — with GitHub Actions Inactive

passaro commented Nov 3, 2023

Closing and replacing with #595

@passaro passaro closed this Nov 3, 2023
@passaro passaro deleted the object-store branch December 11, 2023 20:07
2 participants