-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fulltext_index): integrate puffin manager with inverted index applier #4266
Conversation
…plier Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Note Reviews pausedUse the following commands to manage reviews:
WalkthroughThe changes introduced revolve around enhancing the index management capabilities within the Mito engine, specifically by adding new configurations for staging, intermediate paths, and sizes. These updates include deprecating the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
Outside diff range and nitpick comments (14)
src/mito2/src/sst/index/creator/statistics.rs (2)
19-21
: Add documentation comments for type aliases.Adding documentation comments for
ByteCount
andRowCount
would improve code readability and maintainability./// Type alias for byte count. pub(crate) type ByteCount = u64; /// Type alias for row count. pub(crate) type RowCount = usize;
39-41
: Add documentation comments for new fields.Adding documentation comments for the
row_count
andbyte_count
fields would improve code readability and maintainability./// Number of rows in the index. row_count: RowCount, /// Number of bytes in the index. byte_count: ByteCount,src/mito2/src/sst/index/applier/builder/in_list.rs (1)
62-62
: Remove unused import.The import
PuffinManagerFactory
is used in the tests but not in the main implementation. If this is intentional, consider adding a comment to clarify its purpose.use crate::sst::index::puffin_manager::PuffinManagerFactory;src/mito2/src/sst/index/applier/builder/between.rs (1)
69-69
: Remove unused import.The import
PuffinManagerFactory
is used in the tests but not in the main implementation. If this is intentional, consider adding a comment to clarify its purpose.use crate::sst::index::puffin_manager::PuffinManagerFactory;src/mito2/src/sst/file_purger.rs (3)
Line range hint
108-141
: Fix typo in variable name.The variable name
puffin_mgr
should be consistent with the naming convention used in other files.- let puffin_mgr = PuffinManagerFactory::new(staging_path, 4096, None).await.unwrap(); + let puffin_manager = PuffinManagerFactory::new(staging_path, 4096, None).await.unwrap();
Line range hint
179-205
: Fix typo in variable name.The variable name
puffin_mgr
should be consistent with the naming convention used in other files.- let puffin_mgr = PuffinManagerFactory::new(staging_path, 4096, None).await.unwrap(); + let puffin_manager = PuffinManagerFactory::new(staging_path, 4096, None).await.unwrap();
123-141
: Fix typo in variable name across all test cases.The variable name
puffin_mgr
should be consistent with the naming convention used in other files.- let puffin_mgr = PuffinManagerFactory::new(staging_path, 4096, None).await.unwrap(); + let puffin_manager = PuffinManagerFactory::new(staging_path, 4096, None).await.unwrap();Also applies to: 180-205
src/mito2/src/sst/index/puffin_manager.rs (2)
1-13
: Add a module-level doc comment.Consider adding a module-level doc comment to explain the purpose and usage of this module.
//! This module defines the PuffinManagerFactory and related components for managing puffin files.
102-110
: Add a doc comment forObjectStorePuffinFileAccessor
.Consider adding a doc comment to explain the purpose of the
ObjectStorePuffinFileAccessor
struct./// A `PuffinFileAccessor` implementation that uses an object store as the underlying storage. pub(crate) struct ObjectStorePuffinFileAccessor { object_store: InstrumentedStore, }src/mito2/src/access_layer.rs (2)
44-45
: Add a doc comment for thepuffin_manager_factory
field.Consider adding a doc comment to explain the purpose of the
puffin_manager_factory
field./// Puffin manager factory for creating puffin managers. puffin_manager_factory: PuffinManagerFactory,
84-87
: Add a doc comment for thepuffin_manager_factory
method.Consider adding a doc comment to explain the purpose of the
puffin_manager_factory
method./// Returns the puffin manager factory. pub fn puffin_manager_factory(&self) -> &PuffinManagerFactory { &self.puffin_manager_factory }src/mito2/src/sst/index/applier.rs (3)
Line range hint
57-70
: Improve error handling fornew
method.The
new
method can be improved by providing more context in the error message.pub fn new( region_dir: String, region_id: RegionId, store: ObjectStore, file_cache: Option<FileCacheRef>, index_applier: Box<dyn IndexApplier>, puffin_manager_factory: PuffinManagerFactory, ) -> Self { INDEX_APPLY_MEMORY_USAGE.add(index_applier.memory_usage() as i64); Self { region_dir, region_id, store, file_cache, index_applier, puffin_manager_factory, } }
112-117
: Add a doc comment for thecached_blob_reader
method.Consider adding a doc comment to explain the purpose of the
cached_blob_reader
method./// Creates a blob reader from the cached index file. async fn cached_blob_reader(&self, file_id: FileId) -> Result<Option<BlobReader>> { let Some(file_cache) = &self.file_cache else { return Ok(None); }; ... }
139-151
: Add a doc comment for theremote_blob_reader
method.Consider adding a doc comment to explain the purpose of the
remote_blob_reader
method./// Creates a blob reader from the remote index file. async fn remote_blob_reader(&self, file_id: FileId) -> Result<BlobReader> { let puffin_manager = self.puffin_manager_factory.build(self.store.clone()); let file_path = location::index_file_path(&self.region_dir, file_id); puffin_manager .reader(&file_path) .await .context(PuffinBuildReaderSnafu)? .blob(INDEX_BLOB_TYPE) .await .context(PuffinReadBlobSnafu)? .reader() .await .context(PuffinBuildReaderSnafu) }
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (31)
- config/datanode.example.toml (2 hunks)
- config/standalone.example.toml (2 hunks)
- src/mito2/src/access_layer.rs (5 hunks)
- src/mito2/src/cache/file_cache.rs (1 hunks)
- src/mito2/src/compaction/compactor.rs (3 hunks)
- src/mito2/src/config.rs (7 hunks)
- src/mito2/src/error.rs (4 hunks)
- src/mito2/src/flush.rs (1 hunks)
- src/mito2/src/read/scan_region.rs (1 hunks)
- src/mito2/src/region/opener.rs (6 hunks)
- src/mito2/src/sst/file_purger.rs (5 hunks)
- src/mito2/src/sst/index.rs (1 hunks)
- src/mito2/src/sst/index/applier.rs (8 hunks)
- src/mito2/src/sst/index/applier/builder.rs (6 hunks)
- src/mito2/src/sst/index/applier/builder/between.rs (5 hunks)
- src/mito2/src/sst/index/applier/builder/comparison.rs (5 hunks)
- src/mito2/src/sst/index/applier/builder/eq_list.rs (7 hunks)
- src/mito2/src/sst/index/applier/builder/in_list.rs (5 hunks)
- src/mito2/src/sst/index/applier/builder/regex_match.rs (4 hunks)
- src/mito2/src/sst/index/creator.rs (7 hunks)
- src/mito2/src/sst/index/creator/statistics.rs (3 hunks)
- src/mito2/src/sst/index/puffin_manager.rs (1 hunks)
- src/mito2/src/test_util/scheduler_util.rs (2 hunks)
- src/mito2/src/worker.rs (8 hunks)
- src/mito2/src/worker/handle_catchup.rs (1 hunks)
- src/mito2/src/worker/handle_create.rs (1 hunks)
- src/mito2/src/worker/handle_open.rs (1 hunks)
- src/puffin/src/error.rs (3 hunks)
- src/puffin/src/puffin_manager/stager.rs (1 hunks)
- src/puffin/src/puffin_manager/stager/bounded_stager.rs (1 hunks)
- tests-integration/tests/http.rs (1 hunks)
Files skipped from review due to trivial changes (3)
- src/mito2/src/cache/file_cache.rs
- src/mito2/src/sst/index.rs
- src/puffin/src/puffin_manager/stager.rs
Additional comments not posted (32)
src/mito2/src/worker/handle_create.rs (1)
64-64
: Ensure proper initialization and error handling forpuffin_manager_factory
.The addition of
self.puffin_manager_factory.clone()
is appropriate. Ensure thatpuffin_manager_factory
is properly initialized and any potential errors during initialization are handled.src/mito2/src/worker/handle_catchup.rs (1)
57-57
: Ensure proper initialization and error handling forpuffin_manager_factory
.The addition of
self.puffin_manager_factory.clone()
is appropriate. Ensure thatpuffin_manager_factory
is properly initialized and any potential errors during initialization are handled.src/mito2/src/sst/index/applier/builder/regex_match.rs (1)
56-56
: EnsurePuffinManagerFactory
is correctly integrated in tests.The addition of
PuffinManagerFactory
in the test cases is appropriate. Ensure that the integration is correct and that the tests cover all necessary scenarios.src/mito2/src/test_util/scheduler_util.rs (2)
39-39
: EnsurePuffinManagerFactory
is correctly integrated.The addition of
PuffinManagerFactory
in theSchedulerEnv
struct is appropriate. Ensure that the integration is correct and that all necessary scenarios are covered.
59-63
: Ensure proper initialization and error handling forPuffinManagerFactory
.The addition of
PuffinManagerFactory::new
in thenew
function is appropriate. Ensure thatPuffinManagerFactory
is properly initialized and any potential errors during initialization are handled.src/mito2/src/worker/handle_open.rs (1)
96-96
: Ensure proper initialization and error handling forpuffin_manager_factory
.The addition of
self.puffin_manager_factory.clone()
is appropriate. Ensure thatpuffin_manager_factory
is properly initialized and any potential errors during initialization are handled.src/puffin/src/error.rs (1)
252-258
: Ensure proper usage of the newExternal
error variant.The new
External
error variant is added. Ensure that it is used correctly throughout the codebase and that appropriate tests are added to cover this new error case.src/mito2/src/access_layer.rs (1)
30-30
: EnsurePuffinManagerFactory
is used correctly.Ensure that the
PuffinManagerFactory
is correctly initialized and used in all relevant methods.Verification successful
Ensure
PuffinManagerFactory
is used correctly.The
PuffinManagerFactory
appears to be correctly initialized and used in theaccess_layer.rs
file.
- Initialization and usage of
PuffinManagerFactory
were found at:
src/mito2/src/access_layer.rs:30
src/mito2/src/access_layer.rs:31
src/mito2/src/access_layer.rs:32
src/mito2/src/access_layer.rs:33
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that `PuffinManagerFactory` is correctly initialized and used. # Test: Search for the initialization and usage of `PuffinManagerFactory`. Expect: Correct initialization and usage. rg --type rust 'PuffinManagerFactory'Length of output: 7672
src/mito2/src/sst/index/applier.rs (1)
25-25
: EnsurePuffinManager
is used correctly.Ensure that the
PuffinManager
is correctly initialized and used in all relevant methods.src/mito2/src/sst/index/applier/builder/comparison.rs (1)
141-141
: EnsurePuffinManagerFactory
is used correctly.Ensure that the
PuffinManagerFactory
is correctly initialized and used in all relevant methods.src/mito2/src/sst/index/applier/builder/eq_list.rs (1)
131-131
: EnsurePuffinManagerFactory
is used correctly.Ensure that the
PuffinManagerFactory
is correctly initialized and used in all relevant methods.Verification successful
Ensure
PuffinManagerFactory
is used correctly.The
PuffinManagerFactory
is correctly initialized and used withineq_list.rs
using thenew_for_test_block
method.
src/mito2/src/sst/index/applier/builder/eq_list.rs
: Correct initialization and usage ofPuffinManagerFactory
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that `PuffinManagerFactory` is correctly initialized and used. # Test: Search for the initialization and usage of `PuffinManagerFactory`. Expect: Correct initialization and usage. rg --type rust 'PuffinManagerFactory'Length of output: 7672
Script:
#!/bin/bash # Description: Verify that `PuffinManagerFactory` is correctly initialized and used in `eq_list.rs`. # Test: Search for the initialization and usage of `PuffinManagerFactory` in `eq_list.rs`. Expect: Correct initialization and usage. rg 'PuffinManagerFactory' src/mito2/src/sst/index/applier/builder/eq_list.rsLength of output: 785
src/mito2/src/sst/index/applier/builder.rs (2)
63-64
: Verify the integration ofpuffin_manager_factory
.Ensure that the
puffin_manager_factory
parameter is correctly integrated and used in the builder initialization.Also applies to: 75-75, 84-84
111-111
: Verify the usage ofpuffin_manager_factory
inSstIndexApplier
initialization.Ensure that the
puffin_manager_factory
parameter is correctly used in theSstIndexApplier
initialization.config/datanode.example.toml (1)
397-407
: Verify the new configuration options for the Mito engine's index.Ensure that the new configuration options
staging_path
andstaging_size
are correctly integrated and documented.src/mito2/src/config.rs (2)
108-109
: Verify the integration and sanitization of the newindex
field.Ensure that the new
index
field inMitoConfig
is correctly integrated and sanitized.Also applies to: 140-140, 209-209
253-268
: Verify the sanitization method inIndexConfig
.Ensure that the sanitization method in
IndexConfig
correctly handles default values and deprecated fields.Also applies to: 270-278, 281-315
config/standalone.example.toml (1)
420-430
: Verify the new configuration options for the Mito engine's index.Ensure that the new configuration options
staging_path
andstaging_size
are correctly integrated and documented.src/mito2/src/compaction/compactor.rs (2)
97-109
: Verify the initialization and usage ofpuffin_manager_factory
.Ensure that the
puffin_manager_factory
parameter is correctly initialized and used in theopen_compaction_region
function.
276-276
: Verify the usage ofindex_write_buffer_size
inWriteOptions
.Ensure that the
index_write_buffer_size
parameter is correctly used in theWriteOptions
.src/mito2/src/region/opener.rs (3)
51-51
: Import ofPuffinManagerFactory
approved.The addition of this import statement is consistent with the changes described in the PR summary.
67-67
: Addition ofpuffin_manager_factory
field approved.The addition of this field to the
RegionOpener
struct is consistent with integrating thePuffinManagerFactory
.
82-82
: Updates to thenew
function approved.The changes to include
puffin_manager_factory
as a parameter and initialize it ensure proper integration.Also applies to: 95-95
src/mito2/src/sst/index/creator.rs (2)
335-335
: Import ofPuffinManagerFactory
approved.The addition of this import statement is consistent with the changes described in the PR summary.
Line range hint
407-446
: Updates to thebuild_applier_factory
function approved.The changes to include
PuffinManagerFactory
ensure proper integration.src/mito2/src/error.rs (2)
779-785
: Addition ofPuffinInitStager
error variant approved.The addition of this error variant to the
Error
enum is consistent with the changes described in the PR summary.
787-792
: Addition ofPuffinBuildReader
error variant approved.The addition of this error variant to the
Error
enum is consistent with the changes described in the PR summary.src/mito2/src/read/scan_region.rs (1)
346-346
: Ensure proper error handling forpuffin_manager_factory
cloning.The call to
self.access_layer.puffin_manager_factory().clone()
assumes that thepuffin_manager_factory
is always available. If there's a possibility that it might beNone
or encounter an error, consider adding appropriate error handling or checks.Please verify if
puffin_manager_factory
can beNone
or if it might encounter errors during cloning.tests-integration/tests/http.rs (1)
829-834
: Ensure new configuration options are documented and validated.The new configuration options for
region_engine.mito.index
(staging_path, staging_size, write_buffer_size, intermediate_path) should be documented and validated within the configuration management logic to prevent misconfigurations.src/mito2/src/flush.rs (1)
330-331
: Ensure the newindex_write_buffer_size
configuration is correctly integrated.The new
index_write_buffer_size
configuration should be validated and correctly utilized within theflush_memtables
function to ensure it does not cause any issues during the flush process.Verification successful
The new
index_write_buffer_size
configuration is correctly integrated and validated.The
index_write_buffer_size
is consistently set from the configuration, passed through various requests, and utilized by different components involved in the flush and compaction processes without any issues.
src/mito2/src/flush.rs
src/mito2/src/access_layer.rs
src/mito2/src/compaction/compactor.rs
src/mito2/src/cache/write_cache.rs
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the new `index_write_buffer_size` configuration is correctly integrated and validated. # Test: Search for the usage and validation of the `index_write_buffer_size` configuration in the codebase. rg --type rust 'index_write_buffer_size'Length of output: 790
Script:
#!/bin/bash # Extract relevant code snippets around the occurrences of `index_write_buffer_size` in the identified files. # Extract 10 lines before and after the occurrence in flush.rs echo "flush.rs:" rg --type rust -A 10 -B 10 'index_write_buffer_size' src/mito2/src/flush.rs # Extract 10 lines before and after the occurrence in access_layer.rs echo "access_layer.rs:" rg --type rust -A 10 -B 10 'index_write_buffer_size' src/mito2/src/access_layer.rs # Extract 10 lines before and after the occurrence in compactor.rs echo "compactor.rs:" rg --type rust -A 10 -B 10 'index_write_buffer_size' src/mito2/src/compaction/compactor.rs # Extract 10 lines before and after the occurrence in write_cache.rs echo "write_cache.rs:" rg --type rust -A 10 -B 10 'index_write_buffer_size' src/mito2/src/cache/write_cache.rsLength of output: 7685
src/mito2/src/worker.rs (3)
61-61
: Import addition looks good.The import of
PuffinManagerFactory
is necessary for the subsequent changes.
136-144
: Initialization ofPuffinManagerFactory
looks good.The initialization of
PuffinManagerFactory
and its integration into theWorkerGroup::start
function is appropriate. Ensure proper error handling is in place.
178-178
: Integration ofPuffinManagerFactory
inWorkerStarter
andRegionWorkerLoop
looks good.The integration of
PuffinManagerFactory
in theWorkerStarter
struct and its usage in theRegionWorkerLoop
is appropriate. Ensure proper initialization and error handling.Also applies to: 271-276, 307-307, 377-377, 604-605
@coderabbitai pause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- config/config.md (2 hunks)
Additional comments not posted (2)
config/config.md (2)
121-124
: Ensure consistency and clarity in new configuration options.The new configuration options for
region_engine.mito.index
are consistent with existing configurations. However, ensure that descriptions are clear and concise.
130-130
: Deprecation Notice: Useregion_engine.mito.index.intermediate_path
instead.The
region_engine.mito.inverted_index.intermediate_path
option is deprecated. Ensure users are clearly directed to use the newregion_engine.mito.index.intermediate_path
option.
Actions performedReviews paused. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4266 +/- ##
==========================================
- Coverage 84.97% 84.68% -0.29%
==========================================
Files 1054 1058 +4
Lines 187160 187758 +598
==========================================
- Hits 159031 158996 -35
- Misses 28129 28762 +633 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review the remaining 20 files later......
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
What about |
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#4246
What's changed and what's your intention?
Checklist
Summary by CodeRabbit
New Features
staging_path
,staging_size
, andintermediate_path
.region_engine.mito.inverted_index.intermediate_path
in favor ofregion_engine.mito.index.intermediate_path
.Bug Fixes
Refactor
PuffinManagerFactory
for better index management and organization.Documentation