-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(puffin): complete dir support #4240
Conversation
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Note Reviews pausedUse the following commands to manage reviews:
WalkthroughThe changes to the Puffin repository revolve around enhancing the PuffinManager by introducing directory support and data compression mechanisms. This includes updating reader and writer structures, implementing a cache manager, and adding comprehensive test cases. These modifications ensure efficient file handling, caching, and compression capabilities within the Puffin system. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CachedPuffinManager
participant Cache
participant FsPuffinWriter
participant FsPuffinReader
User->>CachedPuffinManager: Put directory
CachedPuffinManager->>FsPuffinWriter: Write data with compression
FsPuffinWriter->>Cache: Store compressed data
User->>CachedPuffinManager: Get directory
CachedPuffinManager->>Cache: Retrieve data
Cache->>FsPuffinReader: Fetch compressed data
FsPuffinReader->>cachedPuffinManager: Read data
CachedPuffinManager->>User: Return data
Assessment against linked issues
Poem
Tip Early access features: enabledWe are currently testing the following features in early access:
Note:
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
Files selected for processing (11)
- src/puffin/Cargo.toml (1 hunks)
- src/puffin/src/blob_metadata.rs (2 hunks)
- src/puffin/src/file_format/reader/file.rs (1 hunks)
- src/puffin/src/file_format/writer/file.rs (4 hunks)
- src/puffin/src/puffin_manager.rs (2 hunks)
- src/puffin/src/puffin_manager/cache_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cache_manager/moka_cache_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager/reader.rs (1 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs (5 hunks)
- src/puffin/src/puffin_manager/tests.rs (1 hunks)
Files skipped from review due to trivial changes (4)
- src/puffin/Cargo.toml
- src/puffin/src/puffin_manager/cache_manager.rs
- src/puffin/src/puffin_manager/cache_manager/moka_cache_manager.rs
- src/puffin/src/puffin_manager/cached_puffin_manager/reader.rs
Additional comments not posted (10)
src/puffin/src/puffin_manager/cached_puffin_manager.rs (3)
31-37
: Struct definition looks good.The
CachedPuffinManager
struct is well-defined with appropriate fields.
39-49
: Constructor method looks good.The
new
method correctly initializes the struct fields.
51-77
: Trait implementation looks good.The
reader
andwriter
methods are well-defined and follow async conventions.src/puffin/src/puffin_manager.rs (1)
95-95
: Trait modification looks good.The addition of the
Unpin
trait to theBlobGuard
trait's associated typeReader
is appropriate for async operations.src/puffin/src/file_format/reader/file.rs (1)
Line range hint
73-76
: Document the removal of decompression support.The
blob_reader
method no longer supports decompression. Consider documenting this change to inform users.src/puffin/src/file_format/writer/file.rs (1)
60-67
: Method modification looks good.The addition of the
compression_codec
parameter to thecreate_blob_metadata
method is appropriate and well-implemented.src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs (2)
173-201
: LGTM!The
handle_compress
function correctly handles different compression codecs and falls back to no compression if none is specified.
82-84
: LGTM! But verify the method usage in the codebase.The code changes are approved.
However, ensure that all function calls to
handle_compress
are correct and match the new signature.Verification successful
Let's examine the
handle_compress
function definition to confirm its parameters and ensure that the calls on lines 82 and 100 match the new signature.
Let's use
rg
to search for thehandle_compress
function definition and verify its parameters to ensure the calls match the new signature.
The method usage for
handle_compress
is correct.Both calls to
handle_compress
match the new function signature:
key.to_string()
,file_key.clone()
->key: String
raw_data
,reader
->raw_data: impl AsyncRead + Send
options.compression
->compression: Option<CompressionCodec>
No issues found with the method usage.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all method calls to `handle_compress` match the new signature. # Test: Search for the method usage. Expect: Only occurrences of the new signature. rg --type rust $'handle_compress'Length of output: 405
Script:
#!/bin/bash # Extract the `handle_compress` function definition to verify its parameters ast-grep --lang rust --pattern $'async fn handle_compress($_, $_, $_) { $$$ }'Length of output: 81
Script:
#!/bin/bash # Search for the `handle_compress` function definition to verify its parameters rg 'async fn handle_compress' src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs -A 5Length of output: 279
src/puffin/src/blob_metadata.rs (1)
59-59
: LGTM! But verify the field usage in the codebase.The code changes are approved.
However, ensure that all instances of
compression_codec
are handled correctly in the codebase.src/puffin/src/puffin_manager/tests.rs (1)
1-361
: LGTM!The new tests are comprehensive and cover various scenarios for putting and getting files and directories with different compression codecs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/puffin/src/puffin_manager/cached_puffin_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/tests.rs (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/puffin/src/puffin_manager/cached_puffin_manager.rs
Additional comments not posted (1)
src/puffin/src/puffin_manager/tests.rs (1)
15-31
: Consider removing redundant imports.The import
futures::AsyncReadExt as _;
seems redundant astokio::io::AsyncReadExt as _;
is already imported.- use futures::AsyncReadExt as _;
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
456ae37
to
9ad076c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4240 +/- ##
==========================================
- Coverage 84.84% 84.73% -0.11%
==========================================
Files 1049 1051 +2
Lines 186267 186627 +360
==========================================
+ Hits 158029 158134 +105
- Misses 28238 28493 +255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- src/puffin/src/file_format/reader.rs (2 hunks)
- src/puffin/src/puffin_manager/cache_manager/moka_cache_manager.rs (2 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs (5 hunks)
- src/puffin/src/puffin_manager/tests.rs (1 hunks)
Files skipped from review due to trivial changes (1)
- src/puffin/src/file_format/reader.rs
Files skipped from review as they are similar to previous changes (2)
- src/puffin/src/puffin_manager/cache_manager/moka_cache_manager.rs
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 19
Outside diff range and nitpick comments (3)
src/puffin/src/puffin_manager/fs_puffin_manager/writer.rs (1)
Line range hint
67-84
: Handle potential errors input_blob
.The
put_blob
function should handle potential errors fromhandle_compress
instead of unwrapping.let written_bytes = self .handle_compress(key.to_string(), raw_data, options.compression) .await?;src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (2)
Line range hint
101-105
: Consider adding error handling forinit_blob_to_cache
.The function
init_blob_to_cache
should handle potential errors more gracefully and provide meaningful error messages.fn init_blob_to_cache( puffin_file_name: String, key: String, mut writer: BoxWriter, accessor: PuffinFileAccessorRef<AR, AW>, ) -> BoxFuture<'static, Result<u64>> { Box::pin(async move { let reader = accessor.reader(&puffin_file_name).await?; let mut file = PuffinFileReader::new(reader); let metadata = file.metadata().await?; let blob_metadata = metadata .blobs .iter() .find(|m| m.blob_type == key.as_str()) .context(BlobNotFoundSnafu { blob: key })?; let reader = file.blob_reader(blob_metadata)?; let compression = blob_metadata.compression_codec; let size = Self::handle_decompress(reader, &mut writer, compression).await?; Ok(size) }) }
Line range hint
137-146
: Consider adding error handling forinit_dir_to_cache
.The function
init_dir_to_cache
should handle potential errors more gracefully and provide meaningful error messages.fn init_dir_to_cache( puffin_file_name: String, key: String, writer_provider: DirWriterProviderRef, accessor: PuffinFileAccessorRef<AR, AW>, ) -> BoxFuture<'static, Result<u64>> { Box::pin(async move { let reader = accessor.reader(&puffin_file_name).await?; let mut file = PuffinFileReader::new(reader); let puffin_metadata = file.metadata().await?; let blob_metadata = puffin_metadata .blobs .iter() .find(|m| m.blob_type == key.as_str()) .context(BlobNotFoundSnafu { blob: key })?; let mut reader = file.blob_reader(blob_metadata)?; let mut buf = vec![]; reader.read_to_end(&mut buf).await.context(ReadSnafu)?; let dir_meta: DirMetadata = serde_json::from_slice(buf.as_slice()).context(DeserializeJsonSnafu)?; let mut size = 0; for file_meta in dir_meta.files { let blob_meta = puffin_metadata.blobs.get(file_meta.blob_index).context( BlobIndexOutOfBoundSnafu { index: file_meta.blob_index, max_index: puffin_metadata.blobs.len(), }, )?; ensure!( blob_meta.blob_type == file_meta.key, FileKeyNotMatchSnafu { expected: file_meta.key, actual: &blob_meta.blob_type, } ); let reader = file.blob_reader(blob_meta)?; let writer = writer_provider.writer(&file_meta.relative_path).await?; let compression = blob_meta.compression_codec; size += Self::handle_decompress(reader, writer, compression).await?; } Ok(size) }) }
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- src/puffin/src/puffin_manager.rs (2 hunks)
- src/puffin/src/puffin_manager/fs_puffin_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (4 hunks)
- src/puffin/src/puffin_manager/fs_puffin_manager/writer.rs (7 hunks)
- src/puffin/src/puffin_manager/tests.rs (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/puffin/src/puffin_manager.rs
Additional comments not posted (5)
src/puffin/src/puffin_manager/fs_puffin_manager.rs (1)
29-36
: Ensure fields are documented.The fields
cache_manager
andpuffin_file_accessor
should have more detailed documentation to explain their roles within theFsPuffinManager
.pub struct FsPuffinManager<B, D, AR, AW> { /// The cache manager for managing cached operations. cache_manager: CacheManagerRef<B, D>, /// The puffin file accessor for file operations. puffin_file_accessor: PuffinFileAccessorRef<AR, AW>, }src/puffin/src/puffin_manager/fs_puffin_manager/writer.rs (2)
Line range hint
36-50
: Ensure fields are documented.The fields
puffin_file_name
,cache_manager
,puffin_file_writer
, andblob_keys
should have more detailed documentation to explain their roles within theFsPuffinWriter
.pub struct FsPuffinWriter<B, D, W> { /// The name of the puffin file. puffin_file_name: String, /// The cache manager for managing cached operations. cache_manager: CacheManagerRef<B, D>, /// The underlying Puffin file writer. puffin_file_writer: PuffinFileWriter<W>, /// Set of written blob keys to avoid duplicates. blob_keys: HashSet<String>, }
127-131
: Ensure cross-platform compatibility.The code correctly handles path separators for different platforms. Ensure that this logic is thoroughly tested on both Windows and Unix-based systems.
let unified_rel_path = if cfg!(windows) { relative_path.to_string_lossy().replace('\\', "/") } else { relative_path.to_string_lossy().to_string() };src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (1)
Line range hint
33-44
: Ensure fields are documented.The fields
puffin_file_name
,cache_manager
, andpuffin_file_accessor
should have more detailed documentation to explain their roles within theFsPuffinReader
.pub struct FsPuffinReader<B, G, AR, AW> { /// The name of the puffin file. puffin_file_name: String, /// The cache manager for managing cached operations. cache_manager: CacheManagerRef<B, G>, /// The puffin file accessor for file operations. puffin_file_accessor: PuffinFileAccessorRef<AR, AW>, }src/puffin/src/puffin_manager/tests.rs (1)
34-40
: Handle potential errors fromcreate_temp_dir
andMokaCacheManager::new
.The function should handle potential errors instead of unwrapping them.
let cache_dir = create_temp_dir(prefix).expect("Failed to create temp directory"); Arc::new(MokaCacheManager::new(path, cache_size).await.expect("Failed to create MokaCacheManager")),Likely invalid or redundant comment.
https://github.com/GreptimeTeam/greptimedb/actions/runs/9755006232/job/26922900846?pr=4240 Is this related to #4243? @WenyXu
|
No, it seems something else. Please trigger test again, and I’ll check the failed action ASAP. |
@coderabbitai pause |
Actions performedReviews paused. |
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
* feat(puffin): implement CachedPuffinReader Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: next PR to introduce CachedPuffinManager Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: rename Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * feat(puffin): implement MokaCacheManager Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: polish Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: clippy Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: +1s Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * feat(puffin): implement CachedPuffinManager and add tests Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: corner case to get a blob Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: keep dir in used Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: add more tests Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: add doc comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: toml format Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: rename unreleased_dirs Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: refine some comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: handle more cornor cases Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: refine Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * refactor: simplify Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: more explanation Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: polish Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: comment compressed Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: fmt Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * chore: address comment Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * refactor: Cached* -> Fs* Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * refactor: CacheManager -> Stager Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * refactor: rename Puffin(A)sync* -> (A)sync* Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * fix: fmt Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
close #4193
What's changed and what's your intention?
as @coderabbitai
Checklist
Summary by CodeRabbit
New Features
CachedPuffinManager
for managing cached readers and writers.FsPuffinManager
for handling puffin data in the filesystem.CachedPuffinManager
with various compression codecs.Bug Fixes
Refactor
CachedPuffinReader
toFsPuffinReader
.CachedPuffinWriter
toFsPuffinWriter
.Documentation
Chores
DELETE_QUEUE_SIZE
constant.