feat(puffin): implement CachedPuffinWriter #4203
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Walkthrough
The changes involve significant updates across various files in the codebase, such as altering method return types.
Sequence Diagram(s)
sequenceDiagram
participant User
participant PuffinFileWriter
participant Blob
participant CacheManager
User->>PuffinFileWriter: add_blob()
PuffinFileWriter-->>Blob: Write compressed_data
PuffinFileWriter-->>User: Return u64 (byte count)
User->>PuffinFileWriter: finish()
PuffinFileWriter-->>CacheManager: Cache directory
PuffinFileWriter-->>User: Return u64 (total bytes)
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
- Cargo.lock is excluded by !**/*.lock
Files selected for processing (16)
- src/mito2/src/sst/index.rs (1 hunks)
- src/mito2/src/sst/index/applier.rs (2 hunks)
- src/mito2/src/sst/index/creator.rs (3 hunks)
- src/mito2/src/sst/index/creator/statistics.rs (3 hunks)
- src/mito2/src/sst/parquet/writer.rs (1 hunks)
- src/puffin/Cargo.toml (2 hunks)
- src/puffin/src/blob_metadata.rs (1 hunks)
- src/puffin/src/error.rs (4 hunks)
- src/puffin/src/file_format/writer.rs (4 hunks)
- src/puffin/src/file_format/writer/file.rs (2 hunks)
- src/puffin/src/lib.rs (1 hunks)
- src/puffin/src/puffin_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cache_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs (1 hunks)
- src/puffin/src/tests.rs (2 hunks)
Files skipped from review due to trivial changes (2)
- src/mito2/src/sst/parquet/writer.rs
- src/puffin/src/lib.rs
Additional comments not posted (34)
src/puffin/Cargo.toml (3)
11-11: Added dependency: `async-compression`.
Adding `async-compression` version 0.4.11 is consistent with the PR's focus on enhancing compression features. Ensure the chosen version is compatible with other project dependencies to avoid conflicts.
13-13: Added dependency: `async-walkdir`.
The addition of `async-walkdir` version 2.0.0 supports asynchronous directory walking, which is likely used in the new caching functionalities. Similar to the previous comment, verify compatibility with existing dependencies.
26-26: Dependency moved: `uuid`.
Moving `uuid` from `[dev-dependencies]` to `[dependencies]` suggests it's now used in the production code, not just during development. This change should be double-checked to ensure it's intended and that `uuid` is indeed utilized in the production paths.
Verification successful
Dependency moved: `uuid`.
The `uuid` dependency is indeed used in production code paths, such as in `src/servers/src/mysql/handler.rs` and `src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs`. Moving `uuid` to `[dependencies]` is justified based on its usage in the production code.
- src/servers/src/mysql/handler.rs
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of `uuid` in production code paths.
# Test: Search for `uuid` usage in non-test Rust files. Expect: At least one occurrence.
rg --type rust --glob '!*_test.rs' 'uuid'
Length of output: 8843
src/puffin/src/puffin_manager/cached_puffin_manager.rs (3)
15-15: Module import for writer.
The import of the `writer` module is necessary for utilizing `CachedPuffinWriter` within this file. This is a straightforward change that supports the PR's functionality.
18-18: Export of `CachedPuffinWriter`.
Exporting `CachedPuffinWriter` makes it available for use elsewhere in the project. This is crucial for integrating the new writer functionality provided by this PR.
20-24: Introduction of metadata structures (`DirMetadata` and `DirFileMetadata`).
The new structures for directory and file metadata are well-defined and include serialization capabilities, which are essential for the caching functionality. Ensure that all necessary fields are included and appropriately documented.
Also applies to: 26-38
src/puffin/src/file_format/writer.rs (2)
22-22: Updated `Blob` structure with compression support.
Adding `compression_codec` to the `Blob` structure aligns with the PR's focus on enhanced compression capabilities. This change is well-integrated with the existing structure and should facilitate more efficient data handling.
Also applies to: 34-37
52-52: Updated return types for writer methods.
Changing the return type of the `add_blob` and `finish` methods from `Result<()>` to `Result<u64>` in both synchronous and asynchronous contexts is a significant improvement. This allows tracking the number of bytes written, which is crucial for managing storage efficiently. Ensure that all usages of these methods are updated to handle the new return type.
Also applies to: 55-55, 68-68, 71-71
Verification successful
Updated return types for writer methods.
The return types for the `add_blob` and `finish` methods have been updated to `Result<u64>`, allowing for tracking the number of bytes written. The provided context indicates that these changes have been reflected across the codebase and proper handling of the new return type has been implemented.
- Verified Usages:
- src/puffin/src/file_format/writer.rs
- src/puffin/src/file_format/writer/file.rs
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
- src/puffin/src/tests.rs
- src/mito2/src/sst/index/applier.rs
- src/mito2/src/sst/index/creator.rs
The usages in these files correctly handle the new return type by either unwrapping the result or properly propagating errors.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all usages of `add_blob` and `finish` methods handle the new return type.
# Test: Search for usages of these methods in Rust files. Expect: Proper handling of `Result<u64>`.
rg --type rust 'add_blob|finish'
Length of output: 50113
Script:
#!/bin/bash
# Description: Extract lines surrounding the instances of `add_blob` and `finish` methods to verify handling of the new return type.
# Extract surrounding lines for `add_blob`
rg --type rust 'add_blob' -C 3
# Extract surrounding lines for `finish`
rg --type rust 'finish' -C 3
Length of output: 208460
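To make the return-type change discussed in this comment concrete, here is a minimal sketch of a writer whose `add_blob` and `finish` report byte counts as `u64`. It is not the code from this PR: the `BlobWriter` trait, the `CountingWriter` type, the single-field `Blob`, and the four-byte placeholder footer are all hypothetical stand-ins, and `std::io::Result` is used in place of the crate's own `Result` alias.

```rust
use std::io::{self, Write};

/// Hypothetical, trimmed-down blob: only the payload is kept here.
pub struct Blob {
    pub compressed_data: Vec<u8>,
}

/// Hypothetical trait with the shape described in the review:
/// both methods return the number of bytes involved as `u64`.
pub trait BlobWriter {
    /// Returns the number of bytes written for this blob.
    fn add_blob(&mut self, blob: Blob) -> io::Result<u64>;
    /// Returns the total number of bytes written, footer included.
    fn finish(&mut self) -> io::Result<u64>;
}

/// Wraps any `Write` sink and keeps a running byte total.
pub struct CountingWriter<W: Write> {
    inner: W,
    written: u64,
}

impl<W: Write> CountingWriter<W> {
    pub fn new(inner: W) -> Self {
        Self { inner, written: 0 }
    }
}

impl<W: Write> BlobWriter for CountingWriter<W> {
    fn add_blob(&mut self, blob: Blob) -> io::Result<u64> {
        self.inner.write_all(&blob.compressed_data)?;
        let n = blob.compressed_data.len() as u64;
        self.written += n;
        Ok(n)
    }

    fn finish(&mut self) -> io::Result<u64> {
        // Placeholder footer for illustration only; not the real Puffin footer.
        const FOOTER: &[u8] = b"END!";
        self.inner.write_all(FOOTER)?;
        self.inner.flush()?;
        self.written += FOOTER.len() as u64;
        Ok(self.written)
    }
}

/// Caller side: propagate errors with `?` and use the returned counts.
pub fn write_two_blobs() -> io::Result<(u64, u64)> {
    let mut writer = CountingWriter::new(Vec::new());
    let a = writer.add_blob(Blob { compressed_data: vec![1, 2, 3] })?;
    let b = writer.add_blob(Blob { compressed_data: vec![4, 5] })?;
    let total = writer.finish()?;
    Ok((a + b, total))
}
```

With `Result<()>` the caller would have had to measure the output separately; returning the count lets statistics be accumulated directly from each call.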
src/puffin/src/puffin_manager.rs (1)
15-16: Introduction of new modules for cache management.
The introduction of `cache_manager` and `cached_puffin_manager` modules is crucial for the new caching functionality. This structural change supports the PR's objectives and enhances the modularity of the code.
src/puffin/src/puffin_manager/cache_manager.rs (1)
24-28: Comprehensive introduction of cache management functionalities.
The introduction of `DirWriterProvider`, `CacheManager`, and associated types and functions is well-executed. These changes are essential for managing the cache effectively and are aligned with the PR's objectives. Ensure that the implementations of these traits are robust and handle edge cases properly.
Also applies to: 30-79
src/mito2/src/sst/index/creator/statistics.rs (3)
38-38: Type conversion from usize to u64 for byte_count
The change in type from `usize` to `u64` for `byte_count` is consistent with the need to handle larger data sizes, which could exceed the limits of `usize` on 32-bit systems.
66-66: Updated return type for byte_count method
The method `byte_count` now returns `u64`, aligning with the updated type of the `byte_count` field. This ensures type consistency across the interface and the data structure.
115-115: Updated parameter type for inc_byte_count method
The parameter type for `inc_byte_count` has been updated to `u64` to match the field's type, ensuring consistency and avoiding potential type conversion issues or data loss on large values.
src/puffin/src/file_format/writer/file.rs (4)
78-87: Updated add_blob method to return u64 and handle compressed data
The `add_blob` method now correctly returns the size of the written data as `u64`, which is a necessary change to support large file operations. Additionally, handling of the `compressed_data` attribute of the `Blob` struct is properly implemented, ensuring that data compression is accounted for during the write operation.
94-99: Updated finish method to return total written bytes as u64
The `finish` method now returns the total number of bytes written as `u64`. This is an appropriate change for consistency with the return type of `add_blob` and to support large data sizes.
109-120: Asynchronous handling of add_blob with compressed data
The asynchronous version of `add_blob` has been updated similarly to its synchronous counterpart, handling compressed data and returning the size as `u64`. This ensures that the API is consistent and capable of handling large data operations asynchronously.
127-133: Asynchronous finish method correctly handles footer and returns total bytes
The asynchronous `finish` method, like its synchronous version, now handles writing the footer and flushing the writer, returning the total bytes written as `u64`. This change ensures that the method behaves consistently in both synchronous and asynchronous contexts.
src/puffin/src/error.rs (5)
67-73: New error variant for file opening issues
The addition of the `Open` variant to handle file opening errors is a necessary improvement for robust error handling, especially given the file-intensive operations of the system.
75-81: New error variant for metadata reading issues
The `Metadata` error variant addresses potential issues during metadata retrieval, which is crucial for the system's operation where metadata plays a significant role.
83-89: New error variant for directory walking issues
The `WalkDirError` variant is crucial for robustly handling errors that may occur during directory traversal, which is a common operation in file management systems.
187-199: New error variants for unsupported compression and duplicate blob writes
The addition of `UnsupportedCompression` and `DuplicateBlob` error variants is appropriate, given the system's expanded functionality with compression and the need to handle specific blob-related errors.
Line range hint 213-230: Updated ErrorExt implementation to include new error variants
The `ErrorExt` implementation has been updated to include the new error variants, ensuring that all errors have appropriate status codes. This is crucial for consistent error handling and reporting across the system.
src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs (4)
36-49: Struct definition for CachedPuffinWriter
The `CachedPuffinWriter` struct is well-defined with clear documentation on its purpose. It includes essential fields for managing file writing, such as `puffin_file_name`, `cache_manager`, and `puffin_file_writer`, which are crucial for its functionality.
57-94: Implementation of put_blob method with compression handling
The `put_blob` method is implemented with comprehensive error checks and support for different compression codecs. This method's design ensures that blobs are written correctly and efficiently, handling potential duplicate blobs and unsupported compressions smoothly.
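The two checks mentioned here, duplicate blobs and unsupported compression, might look roughly like the following. This is a sketch under stated assumptions, not the PR's implementation: `SketchWriter`, `PutError`, and the `Codec` variants are hypothetical names, the writer is synchronous, and because the standard library has no compression support the sketch rejects every codec and only accepts uncompressed blobs (the real writer presumably supports at least one codec through `async-compression`).

```rust
use std::collections::HashSet;

/// Assumed codec names for illustration; the crate's enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Lz4,
    Zstd,
}

/// Hypothetical error type mirroring the two variants named in the review.
#[derive(Debug, PartialEq, Eq)]
pub enum PutError {
    DuplicateBlob { blob: String },
    UnsupportedCompression { codec: Codec },
}

/// Hypothetical in-memory writer that remembers which blob keys it has seen.
#[derive(Default)]
pub struct SketchWriter {
    blob_keys: HashSet<String>,
    out: Vec<u8>,
}

impl SketchWriter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Writes `raw` under `key` and returns the number of bytes written.
    pub fn put_blob(
        &mut self,
        key: &str,
        raw: &[u8],
        codec: Option<Codec>,
    ) -> Result<u64, PutError> {
        // Reject a key that was already written.
        if self.blob_keys.contains(key) {
            return Err(PutError::DuplicateBlob { blob: key.to_string() });
        }
        // Assumption of this sketch: no codec is supported, only raw writes.
        if let Some(codec) = codec {
            return Err(PutError::UnsupportedCompression { codec });
        }
        // Record the key only after every check has passed.
        self.blob_keys.insert(key.to_string());
        self.out.extend_from_slice(raw);
        Ok(raw.len() as u64)
    }

    /// Total number of payload bytes accepted so far.
    pub fn bytes_written(&self) -> u64 {
        self.out.len() as u64
    }
}
```

Recording the key only after the checks succeed means a failed `put_blob` does not block a later retry under the same key.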
96-180: Implementation of put_dir method with directory handling
The `put_dir` method is robust, handling directory traversal and file writing with appropriate error checks and compression handling. The method effectively manages directory metadata and ensures all files within the directory are processed correctly.
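The traversal step described above can be approximated with the standard library alone. The sketch below is a synchronous stand-in, assuming nothing about the real `put_dir` beyond "walk a directory and record per-file metadata"; the PR itself depends on `async-walkdir`, and `FileEntry` and `collect_files` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical per-file record: path relative to the root, and size in bytes.
#[derive(Debug, PartialEq, Eq)]
pub struct FileEntry {
    pub relative_path: String,
    pub size: u64,
}

/// Walks `root` iteratively and returns every regular file, sorted by path.
pub fn collect_files(root: &Path) -> io::Result<Vec<FileEntry>> {
    let mut entries = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                // Visit subdirectories later.
                pending.push(path);
            } else if file_type.is_file() {
                let relative = path
                    .strip_prefix(root)
                    .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
                entries.push(FileEntry {
                    // Normalize separators so the result is platform independent.
                    relative_path: relative.to_string_lossy().replace('\\', "/"),
                    size: entry.metadata()?.len(),
                });
            }
        }
    }

    // Directory iteration order is unspecified, so sort for determinism.
    entries.sort_by(|a, b| a.relative_path.cmp(&b.relative_path));
    Ok(entries)
}
```

A writer built on this could then emit one blob per entry and serialize the collected list as the directory's metadata; how the PR actually lays that out is not shown in this review.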
187-190: Finish method for finalizing file writes
The `finish` method in `CachedPuffinWriter` correctly finalizes the file writing process, ensuring that all data is written and the file is properly closed. This is essential for maintaining data integrity.
src/mito2/src/sst/index/applier.rs (2)
211-213: Updated Blob struct to handle compressed data
The update to the `Blob` struct to include `compressed_data` and `compression_codec` fields in the test setup is essential for testing the handling of different data types and compression settings. This ensures that the system can effectively manage various data scenarios.
264-266: Handling of invalid blob types in tests
The test for handling invalid blob types is crucial for ensuring the system gracefully handles errors related to unsupported or incorrect blob types. This helps maintain robustness and reliability.
src/puffin/src/tests.rs (2)
192-195: Approved: Updated test cases to reflect changes in `Blob` struct.
The test cases have been updated to use the new `compressed_data` and `compression_codec` fields in the `Blob` struct, which aligns with the structural changes made across the project.
Also applies to: 202-205
262-265: Approved: Asynchronous test cases updated to reflect structural changes in `Blob` struct.
The asynchronous test cases have been correctly updated to include the `compressed_data` and `compression_codec` fields in the `Blob` struct, maintaining consistency with the synchronous tests and the broader project updates.
Also applies to: 273-276
src/puffin/src/blob_metadata.rs (1)
72-72: Approved: Optimization by changing derive attributes in `CompressionCodec`.
Changing the derive attribute from `Clone` to `Copy` for the `CompressionCodec` enum is a good practice for small immutable data types like enums, as it allows more efficient value copying.
src/mito2/src/sst/index.rs (1)
80-80: Approved: Updated return type in `finish` method to support larger data sizes.
The change from `Option<usize>` to `Option<u64>` in the `finish` method of the `Indexer` struct supports handling larger data sizes, which is beneficial for scalability and data management in large-scale applications.
src/mito2/src/sst/index/creator.rs (2)
57-57: Type definition updated to accommodate larger values.
Changing `ByteCount` from `usize` to `u64` is a good practice for systems that handle large datasets or files, ensuring that the data type can accommodate larger values without overflow.
274-276: Enhanced `Blob` struct with optional compression support.
The addition of the `compression_codec` field to the `Blob` struct allows for optional compression, which can significantly reduce storage or transmission size when handling large data sets. Ensure that all parts of the system that instantiate or manipulate `Blob` objects are updated to handle this new field correctly.
Verification successful
Enhanced `Blob` struct with optional compression support.
The addition of the `compression_codec` field to the `Blob` struct has been correctly handled across the codebase. The field is properly included in all instances where the `Blob` struct is instantiated, ensuring that the optional compression feature is supported.
- Files reviewed:
- src/puffin/src/tests.rs
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
- src/mito2/src/sst/index/creator.rs
- src/mito2/src/sst/index/applier.rs
All relevant usages of the `Blob` struct have been updated to handle the new `compression_codec` field appropriately.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all usages of `Blob` struct have been updated to handle the new `compression_codec` field.
# Test: Search for the Blob struct usage across the codebase. Expect: Proper handling of the new field.
rg --type rust $'Blob {'
Length of output: 101
Script:
#!/bin/bash
# Description: Verify that all usages of `Blob` struct have been updated to handle the new `compression_codec` field.
# Test: Search for the Blob struct usage across the codebase. Expect: Proper handling of the new field.
rg --type rust $'Blob \\{'
Length of output: 1295
Script:
#!/bin/bash
# Description: Extract relevant lines from files where `Blob` struct is instantiated to verify handling of `compression_codec` field.
# Extract lines from src/puffin/src/tests.rs
rg --type rust --context 5 'Blob \{' src/puffin/src/tests.rs
# Extract lines from src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
rg --type rust --context 5 'Blob \{' src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
# Extract lines from src/mito2/src/sst/index/creator.rs
rg --type rust --context 5 'Blob \{' src/mito2/src/sst/index/creator.rs
# Extract lines from src/mito2/src/sst/index/applier.rs
rg --type rust --context 5 'Blob \{' src/mito2/src/sst/index/applier.rs
Length of output: 5108
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/puffin/src/puffin_manager.rs (2 hunks)
- src/puffin/src/puffin_manager/cache_manager.rs (1 hunks)
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- src/puffin/src/puffin_manager.rs
- src/puffin/src/puffin_manager/cache_manager.rs
- src/puffin/src/puffin_manager/cached_puffin_manager/writer.rs
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4203      +/-   ##
==========================================
- Coverage   85.05%   84.70%   -0.35%
==========================================
  Files        1031     1033       +2
  Lines      181276   181442     +166
==========================================
- Hits       154176   153692     -484
- Misses      27100    27750     +650
Mostly LGTM
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#4193
What's changed and what's your intention?
Mainly implement `put_blob` and `put_dir`.
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Improvements
- `usize` to `u64`, enhancing consistency.
- `Blob` struct by introducing a `compression_codec` field and renaming `data` to `compressed_data`.
Dependency Updates
- `async-compression` and `async-walkdir` as dependencies.
- `uuid` from development to regular dependencies.