Introduce new abstraction between the prefetcher and GetObject calls #552
Conversation
    config: PrefetcherConfig,
    runtime: Runtime,
}

impl<Client, Runtime> Debug for PrefetcherInner<Client, Runtime> {
derive(Debug) did not seem to work with part_feed: Arc<dyn ObjectPartFeed<Client>...>. Not completely sure why (compared to client: Arc<Client>, which works) or whether there is a better solution.
It's because the Debug impl for part_feed: Arc<dyn ObjectPartFeed<Client>...> has to work for any ObjectPartFeed (because the concrete type isn't known), which is only possible if the ObjectPartFeed trait itself guarantees Debug. On the other hand, the Arc<Client> version only has to work for the concrete Client type(s) that we instantiate it with, which all happen to implement Debug.
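As a concrete illustration of the workaround (type and trait names below are hypothetical stand-ins, not the crate's actual definitions), a manual Debug impl can print a placeholder for the trait object, so the trait itself never needs to guarantee Debug:

```rust
use std::fmt::{self, Debug};
use std::sync::Arc;

// Hypothetical stand-in for ObjectPartFeed<Client>; it does not require Debug,
// so #[derive(Debug)] on a struct holding `Arc<dyn Feed>` cannot compile: the
// derive would need `dyn Feed: Debug` to hold for every possible implementor.
trait Feed {}

struct Inner {
    part_feed: Arc<dyn Feed + Send + Sync>,
}

// Manual impl: emit a placeholder for the trait object instead of requiring
// the trait to guarantee Debug.
impl Debug for Inner {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Inner")
            .field("part_feed", &"dyn Feed")
            .finish()
    }
}
```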
@@ -45,8 +43,6 @@ pub struct PrefetcherConfig {
     pub sequential_prefetch_multiplier: usize,
     /// Timeout to wait for a part to become available
     pub read_timeout: Duration,
-    /// The size of the parts that the prefetcher is trying to align with
-    pub part_alignment: usize,
Now we retrieve this directly from the client
            .unwrap_or_default()
            .clamp(MIN_PART_SIZE, MAX_PART_SIZE);

        let get_object_result = match self.client.get_object(bucket, key, Some(range), Some(if_match)).await {
This is moved from PrefetchGetObject::spawn_next_request
    fn get_aligned_request_size(&self, offset: u64, preferred_length: usize) -> usize {
        // If the request size is bigger than a part size we will try to align it to part boundaries.
        let part_alignment = self.client.part_size().unwrap_or(8 * 1024 * 1024);
        let offset_in_part = (offset % part_alignment as u64) as usize;
Moved from PrefetchGetObject::get_next_request_size
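For readers skimming the diff, here is a standalone sketch of what this kind of part-boundary alignment does. It is illustrative only: the boundary handling is not copied from the PR, and the part size here would come from `client.part_size()` in the real type.

```rust
// Illustrative sketch of part-boundary alignment, not the PR's exact code.
fn align_request_size(offset: u64, preferred_length: usize, part_alignment: usize) -> usize {
    let offset_in_part = (offset % part_alignment as u64) as usize;
    if offset_in_part != 0 {
        // Unaligned start: stop at the current part boundary so that the
        // next request begins aligned.
        let remaining_in_part = part_alignment - offset_in_part;
        preferred_length.min(remaining_in_part)
    } else if preferred_length > part_alignment {
        // Aligned start and a large request: round down to whole parts.
        preferred_length - (preferred_length % part_alignment)
    } else {
        preferred_length
    }
}
```

For example, with 8 MiB parts, a 5 MiB read starting 6 MiB into a part would be trimmed to 2 MiB so that the following request starts on a part boundary.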
looking good
mountpoint-s3/src/prefetch.rs (outdated)
    // Tracks the size of read requests
    max_read_request_size: usize,
Suggested change:
-    // Tracks the size of read requests
-    max_read_request_size: usize,
+    /// Tracks the maximum size of read request submitted so far
+    max_read_request_size: usize,
Force-pushed from 7b730cf to d6d950e
struct PrefetcherInner<Client, Runtime> {
    client: Arc<Client>,
    part_feed: Arc<dyn ObjectPartFeed<Client> + Send + Sync>,
Any thoughts on whether this should be a generic type rather than dynamic dispatch? I guess we can get away with it here (unlike Client) because ObjectPartFeed is object-safe.
It is a pragmatic choice which we can revert later on. The downside of a new generic type here is that it would propagate up to the root S3FuseFilesystem and end up touching most of the code in this crate.

In fact, I had started working on an alternative option where I replaced ObjectClient with a new ObjectStore trait, with get_object_parts instead of get_object (and allowing for further divergence in the future), but that was a much larger change and would have slowed down work on the caching layer.

I still think something like that would be a preferable solution in the long term, but we can review the approach while we work on the cache or soon after.
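To make the trade-off concrete, here is a minimal sketch of the two shapes being discussed (trait body omitted and names simplified; this is not the crate's actual code):

```rust
use std::sync::Arc;

// Simplified stand-in for the real trait.
trait ObjectPartFeed<Client> {}

// Option taken in this PR: dynamic dispatch. The feed type is erased, so
// PrefetcherInner (and everything that embeds it, up to the filesystem root)
// keeps its existing type parameters.
struct PrefetcherDyn<Client, Runtime> {
    client: Arc<Client>,
    part_feed: Arc<dyn ObjectPartFeed<Client> + Send + Sync>,
    runtime: Runtime,
}

// Alternative: a third generic parameter. Every type that contains the
// prefetcher would have to grow a `Feed` parameter as well.
struct PrefetcherGeneric<Client, Feed, Runtime>
where
    Feed: ObjectPartFeed<Client>,
{
    client: Arc<Client>,
    part_feed: Feed,
    runtime: Runtime,
}
```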
Signed-off-by: Alessandro Passaro <alexpax@amazon.co.uk>
Force-pushed from d6d950e to 47c225d
/// [ObjectPartFeed] implementation which delegates retrieving object data to a [Client].
#[derive(Debug)]
pub struct ClientPartFeed<Client> {
    client: Arc<Client>,
}
Where do we plan to put the cache-based implementation of this? In this file also?
I'm wondering if there's a good way to break down this file.
The old test was hiding a bug because it used a hard coded part size of 8MB regardless of what the client used. awslabs#552 changed that and now this test runs out of memory a lot because it degrades to doing 1 byte requests. I don't think it's worth playing with the logic because it requires a weird config to get there, so just fix the test. Signed-off-by: James Bornholt <bornholt@amazon.com>
* Allow seeking forwards within the prefetch stream

  Right now we reset the prefetcher any time it seeks forwards, even if the distance it's seeking could be handled by inflight requests (in the worst case, the bytes are already in our buffers, and we just throw them away). That's expensive and slow! This change allows us to seek forwards a limited distance into the prefetch stream. When we see a seek of an acceptable distance, we fast-forward through the stream to the desired target offset, dropping the skipped bytes on the floor. We enforce a maximum seek distance, which is a trade-off between streaming a lot of unnecessary bytes versus an extra request's latency. I haven't put any careful thought into the number. This commit also sets us up to support backwards seeking, which will come in the future.

  Signed-off-by: James Bornholt <bornholt@amazon.com>

* Allow seeking backwards within a prefetch stream

  Linux asynchronous readahead confuses our prefetcher by sometimes making the stream appear to go backwards, even though the customer is actually just reading sequentially (#488). The problem is that with parallel FUSE threads, the two asynchronous read operations can arrive to the prefetcher out of order. This change allows us to tolerate a little bit of backwards seeking in a prefetch stream. We keep around a little bit of previously read data and can reload it in the event that a seek goes backwards. We do this by creating a fake new request containing the rewound bytes, so that the existing read logic will pick them up. I chose an arbitrary max for the backwards seek buffer, big enough to handle Linux readahead. This should fix the readahead issue: in my testing, I no longer saw slow sequential reads, and the logs confirmed this seeking logic was being triggered in both directions (forwards and backwards), consistent with the readahead requests sometimes arriving out of order.

  Signed-off-by: James Bornholt <bornholt@amazon.com>

* Fix Shuttle tests with new request size logic

  The old test was hiding a bug because it used a hard coded part size of 8MB regardless of what the client used. #552 changed that and now this test runs out of memory a lot because it degrades to doing 1 byte requests. I don't think it's worth playing with the logic because it requires a weird config to get there, so just fix the test.

  Signed-off-by: James Bornholt <bornholt@amazon.com>

---------

Signed-off-by: James Bornholt <bornholt@amazon.com>
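A rough sketch of the forward-seek idea described in the commit message above (a hypothetical helper, not the crate's code): if the target offset is within a bounded distance ahead of the current position, drain and discard bytes from the inflight stream instead of resetting the prefetcher.

```rust
// Arbitrary bound: trades streaming unnecessary bytes against the latency of
// issuing a fresh request.
const MAX_FORWARD_SEEK_DISTANCE: u64 = 16 * 1024 * 1024;

/// Returns the new offset if the seek was absorbed by the inflight stream, or
/// None if the prefetcher should be reset instead. `read_chunk` stands in for
/// pulling at most `n` buffered bytes from the inflight request.
fn try_seek_forward(
    current_offset: u64,
    target_offset: u64,
    mut read_chunk: impl FnMut(usize) -> Vec<u8>,
) -> Option<u64> {
    let distance = target_offset.checked_sub(current_offset)?;
    if distance > MAX_FORWARD_SEEK_DISTANCE {
        return None; // too far ahead: cheaper to start a new request
    }
    let mut remaining = distance;
    while remaining > 0 {
        // Drop the skipped bytes on the floor.
        let chunk = read_chunk(remaining.min(1 << 20) as usize);
        if chunk.is_empty() {
            return None; // stream ended before reaching the target
        }
        remaining = remaining.saturating_sub(chunk.len() as u64);
    }
    Some(target_offset)
}
```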
Description of change
Preliminary work that will allow, in the future, plugging in a caching layer between the prefetching logic and the GetObject calls on the client.

This change introduces an ObjectPartFeed trait that will be used by the prefetcher to obtain the chunks of object data and to adjust the size of requests so that they are "optimally" aligned. Currently the only implementation delegates retrieving object data to a GetObject call on the client and aligns requests to part_size boundaries, replicating the logic previously in the prefetcher.

Relevant issues: #255
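For readers who haven't opened the diff, a much-simplified sketch of the shape of such a trait (hypothetical signatures; the real ObjectPartFeed in this PR is async and feeds parts to the prefetcher rather than returning them directly):

```rust
use std::ops::Range;

// Simplified, hypothetical sketch; see the PR diff for the actual definition.
trait ObjectPartFeed<Client> {
    /// Fetch the given byte range of an object and deliver it as parts.
    fn get_object_parts(&self, bucket: &str, key: &str, if_match: &str, range: Range<u64>);

    /// Adjust a preferred request size so it lines up with "optimal" part
    /// boundaries (the client-backed implementation aligns to part_size).
    fn get_aligned_request_size(&self, offset: u64, preferred_size: usize) -> usize;
}
```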
Does this change impact existing behavior?
No functional change.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).