Add new DataCache trait and InMemoryDataCache implementation #557

Merged: 8 commits, Oct 23, 2023

dannycjones
Contributor

Description of change

Part of #255, this contains some early thinking on what our trait should look like for the data cache itself. It also contains an in-memory implementation that can be used for test cases, without setting up a full data cache (which doesn't exist yet).
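As a rough illustration of the shape described above, here is a minimal sketch of a cache trait with an in-memory implementation. Only the `get_block` signature comes from this PR's discussion; `put_block`, the `DataCacheError` variant, and the use of `Vec<u8>` as a stand-in for `ChecksummedBytes` are assumptions made for the example, not the merged code.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Index of a block within a cached object (assumed to be a plain integer).
pub type BlockIndex = u64;

/// Stand-in for `ChecksummedBytes`: raw bytes with no checksum (assumption).
pub type Block = Vec<u8>;

/// Hypothetical error type; the real `DataCacheResult` error may differ.
#[derive(Debug, PartialEq)]
pub enum DataCacheError {
    Io(String),
}

pub type DataCacheResult<T> = Result<T, DataCacheError>;

pub trait DataCache<Key> {
    /// Returns the block if cached, or `None` if it is not available.
    fn get_block(&self, cache_key: &Key, block_idx: BlockIndex) -> DataCacheResult<Option<&Block>>;

    /// Stores a block, replacing any existing entry (assumed method).
    fn put_block(&mut self, cache_key: Key, block_idx: BlockIndex, bytes: Block) -> DataCacheResult<()>;
}

/// Simple in-memory cache intended for tests: a map of maps, never evicts.
pub struct InMemoryDataCache<Key> {
    data: HashMap<Key, HashMap<BlockIndex, Block>>,
}

impl<Key: Eq + Hash> InMemoryDataCache<Key> {
    pub fn new() -> Self {
        Self { data: HashMap::new() }
    }
}

impl<Key: Eq + Hash> DataCache<Key> for InMemoryDataCache<Key> {
    fn get_block(&self, cache_key: &Key, block_idx: BlockIndex) -> DataCacheResult<Option<&Block>> {
        // Look up the object first, then the block within it.
        let block = self.data.get(cache_key).and_then(|blocks| blocks.get(&block_idx));
        Ok(block)
    }

    fn put_block(&mut self, cache_key: Key, block_idx: BlockIndex, bytes: Block) -> DataCacheResult<()> {
        self.data.entry(cache_key).or_default().insert(block_idx, bytes);
        Ok(())
    }
}
```

Because nothing here touches disk or the network, a test can construct the cache with `InMemoryDataCache::new()`, put a few blocks, and assert on what `get_block` returns.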

Relevant issues: #255

Does this change impact existing behavior?

No.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>
@dannycjones dannycjones temporarily deployed to PR integration tests October 16, 2023 12:10 — with GitHub Actions Inactive
@dannycjones
Contributor Author

There will be a small conflict with #556. I'll rebase this once that's merged.

/// Get block of data from the cache for the given [Key] and [BlockIndex], if available.
///
/// Operation may fail due to errors, or return [None] if the block was not available in the cache.
fn get_block(&self, cache_key: &Key, block_idx: BlockIndex) -> DataCacheResult<Option<&ChecksummedBytes>>;
Member

Two thoughts here:

  • Does the client decide how block indexing works (and the implementation treats them as opaque), or do the implementation and client need to agree on how that works somehow?
  • If it's the former, can they just be part of the generic Key type?
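To make the second option concrete, the sketch below folds the block index into the key, so the cache sees only an opaque key and knows nothing about blocks. `BlockKey`, `OpaqueCache`, and their fields are hypothetical names invented for this illustration; they do not appear in the PR.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical composite key: object identifier plus block index.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct BlockKey {
    pub object_id: String,
    pub block_idx: u64,
}

/// A cache that treats its key as fully opaque; block layout is the caller's concern.
pub struct OpaqueCache<K> {
    entries: HashMap<K, Vec<u8>>,
}

impl<K: Eq + Hash> OpaqueCache<K> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the cached bytes for this exact key, if present.
    pub fn get(&self, key: &K) -> Option<&Vec<u8>> {
        self.entries.get(key)
    }

    /// Stores bytes under the key, replacing any previous value.
    pub fn put(&mut self, key: K, bytes: Vec<u8>) {
        self.entries.insert(key, bytes);
    }
}
```

The trade-off is that a cache built this way cannot reason about block size or neighbouring blocks of the same object, which matters if the implementation and client need to agree on indexing.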

Contributor Author

> Does the client decide how block indexing works (and the implementation treats them as opaque), or do the implementation and client need to agree on how that works somehow?

I hadn't fully thought this through yet. I think for now we will move towards the implementation knowing about "blocks" and surfacing things like the cache's block size in the trait. The client will need to use that information to calculate things such as:

  • If I want to fetch some data, what ranges should I use for full blocks?
  • If I read from blocks, what do I need to trim at the start and end?

I'm hoping we can move that logic into the DataCache trait later, but for now I'm not sure exactly what it will look like.
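The two client-side calculations listed above could be sketched roughly as follows, assuming fixed-size blocks numbered from zero and a block size exposed by the cache. `BlockAlignedRange` and `align_to_blocks` are illustrative names, and the sketch ignores the object's total size, so a short final block would need extra handling.

```rust
use std::ops::Range;

/// Block-aligned view of a requested byte range (illustrative type).
#[derive(Debug, PartialEq)]
pub struct BlockAlignedRange {
    /// Indices of the blocks that cover the request.
    pub blocks: Range<u64>,
    /// Byte range to fetch so that only whole blocks are transferred.
    pub fetch: Range<u64>,
    /// Bytes to drop from the start of the first block.
    pub trim_start: u64,
    /// Bytes to drop from the end of the last block.
    pub trim_end: u64,
}

/// Expands `request` outwards to block boundaries.
/// Returns `None` for an empty request or a zero block size.
pub fn align_to_blocks(request: Range<u64>, block_size: u64) -> Option<BlockAlignedRange> {
    if block_size == 0 || request.start >= request.end {
        return None;
    }
    // First and last blocks touched by the request (the end is exclusive).
    let first_block = request.start / block_size;
    let last_block = (request.end - 1) / block_size;
    let fetch_start = first_block * block_size;
    let fetch_end = (last_block + 1) * block_size;
    Some(BlockAlignedRange {
        blocks: first_block..last_block + 1,
        fetch: fetch_start..fetch_end,
        trim_start: request.start - fetch_start,
        trim_end: fetch_end - request.end,
    })
}
```

For example, a request for bytes 10..25 with a block size of 8 touches blocks 1 to 3, fetches bytes 8..32, and trims 2 bytes from the start and 7 from the end.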

Contributor

@passaro passaro left a comment

Good first stab at the trait. Left a few comments.

mountpoint-s3/src/prefetch/checksummed_bytes.rs (Outdated)
mountpoint-s3/src/data_cache/mod.rs (Outdated)
mountpoint-s3/src/data_cache/mod.rs (Outdated)
mountpoint-s3/src/data_cache/mod.rs (Outdated)
@dannycjones dannycjones temporarily deployed to PR integration tests October 16, 2023 16:46 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests October 16, 2023 16:55 — with GitHub Actions Inactive
… to caller

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>
…rt_eq_checksummed_bytes macro

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>
@dannycjones dannycjones requested a review from passaro October 17, 2023 16:51
@dannycjones dannycjones marked this pull request as ready for review October 17, 2023 16:51
@dannycjones dannycjones temporarily deployed to PR integration tests October 18, 2023 13:29 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests October 23, 2023 10:32 — with GitHub Actions Inactive
@dannycjones dannycjones requested a review from passaro October 23, 2023 10:38
@dannycjones dannycjones enabled auto-merge October 23, 2023 10:59
@dannycjones dannycjones added this pull request to the merge queue Oct 23, 2023
Merged via the queue into awslabs:main with commit cb0d26b Oct 23, 2023
18 checks passed
@dannycjones dannycjones deleted the data-cache-in-memory branch October 23, 2023 11:25