-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fulltext_index): integrate full-text indexer with parquet reader #4348
Conversation
WalkthroughThe changes introduce support for full-text indexing using Tantivy by adding modules and functionality for creating, searching, and testing full-text indexes. This includes new structs, traits, and methods for handling full-text indexes, integrating them into the existing SST writer, and updating error handling, metrics, and related indexing components accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Engine
participant FulltextIndexer
participant Tantivy
Client->>Engine: Request full-text search
Engine->>FulltextIndexer: Create searcher
FulltextIndexer->>Tantivy: Perform search
Tantivy-->>FulltextIndexer: Return search results
FulltextIndexer-->>Engine: Pass results
Engine-->>Client: Return search results
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
f631824
to
4d0f2b0
Compare
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Sqlness failed
|
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Please fix the clippy warning |
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
src/index/src/fulltext_index/search/tantivy.rs (1)
41-69
: Consider Logging or Collecting Cost to Open IndexOpening an index can be a costly operation. Consider logging the time taken or collecting metrics for this operation.
debug!( "Opened tantivy index on {:?} in {:?}", path.as_ref(), now.elapsed() );src/mito2/src/read/scan_region.rs (1)
Line range hint
336-376
:
Fix the typo in the method name.The method name
build_invereted_index_applier
has a typo. It should bebuild_inverted_index_applier
.- fn build_invereted_index_applier(&self) -> Option<InvertedIndexApplierRef> { + fn build_inverted_index_applier(&self) -> Option<InvertedIndexApplierRef> {
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
Files selected for processing (29)
- src/index/src/fulltext_index.rs (1 hunks)
- src/index/src/fulltext_index/create.rs (1 hunks)
- src/index/src/fulltext_index/error.rs (2 hunks)
- src/index/src/fulltext_index/search.rs (1 hunks)
- src/index/src/fulltext_index/search/tantivy.rs (1 hunks)
- src/index/src/fulltext_index/tests.rs (1 hunks)
- src/mito2/Cargo.toml (2 hunks)
- src/mito2/src/engine.rs (1 hunks)
- src/mito2/src/error.rs (4 hunks)
- src/mito2/src/metrics.rs (1 hunks)
- src/mito2/src/read/scan_region.rs (13 hunks)
- src/mito2/src/sst/index.rs (1 hunks)
- src/mito2/src/sst/index/fulltext_index.rs (1 hunks)
- src/mito2/src/sst/index/fulltext_index/applier.rs (1 hunks)
- src/mito2/src/sst/index/fulltext_index/applier/builder.rs (1 hunks)
- src/mito2/src/sst/index/fulltext_index/creator.rs (4 hunks)
- src/mito2/src/sst/index/indexer/finish.rs (1 hunks)
- src/mito2/src/sst/index/inverted_index/applier.rs (7 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder.rs (5 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/between.rs (6 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/comparison.rs (5 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/eq_list.rs (8 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/in_list.rs (6 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/regex_match.rs (5 hunks)
- src/mito2/src/sst/index/inverted_index/creator.rs (5 hunks)
- src/mito2/src/sst/index/puffin_manager.rs (1 hunks)
- src/mito2/src/sst/parquet/reader.rs (15 hunks)
- src/mito2/src/sst/parquet/row_selection.rs (3 hunks)
- src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (3 hunks)
Files skipped from review due to trivial changes (3)
- src/index/src/fulltext_index/create.rs
- src/mito2/src/sst/index.rs
- src/mito2/src/sst/index/indexer/finish.rs
Additional comments not posted (88)
src/mito2/src/sst/index/fulltext_index.rs (1)
15-15
: Addition ofapplier
module.The
applier
module has been added to thefulltext_index
module. Ensure that this new module is properly tested and documented.src/index/src/fulltext_index/search.rs (2)
26-30
: Definition ofFulltextIndexSearcher
trait.The
FulltextIndexSearcher
trait is defined with an async methodsearch
. Ensure that the method is properly documented and tested.
15-15
: Implementation ofFulltextIndexSearcher
trait intantivy
module.The implementation of the
FulltextIndexSearcher
trait in thetantivy
module looks correct. Ensure that the implementation is properly tested.src/index/src/fulltext_index.rs (1)
19-22
: Addition ofsearch
andtests
modules.The
search
andtests
modules have been added to thefulltext_index
module. Ensure that these new modules are properly tested and documented.src/index/src/fulltext_index/error.rs (2)
43-49
: Addition ofTantivyParser
error variant.The
TantivyParser
error variant has been added to theError
enum. Ensure that this new error variant is properly handled and documented.
71-72
: MappingTantivyParser
toStatusCode::InvalidSyntax
.The
TantivyParser
error variant is mapped toStatusCode::InvalidSyntax
. Ensure that this mapping is consistent with the rest of the error handling logic.src/mito2/Cargo.toml (2)
76-76
: Dependency Addition:common-function
The
common-function
dependency has been added. Ensure that this dependency is necessary for the dev environment and does not introduce any conflicts.
86-86
: Dependency Addition:session
The
session
dependency has been added. Ensure that this dependency is necessary for the dev environment and does not introduce any conflicts.src/index/src/fulltext_index/search/tantivy.rs (1)
72-84
: LGTM!The
search
method is correctly implemented and follows best practices.src/mito2/src/sst/index/fulltext_index/applier.rs (4)
34-47
: LGTM!The
FulltextIndexApplier
struct and its fields are correctly defined.
51-65
: LGTM!The
new
method is correctly implemented and follows best practices.
67-106
: LGTM!The
apply
method is correctly implemented and follows best practices.
108-128
: LGTM!The
index_dir_path
method is correctly implemented and follows best practices.src/mito2/src/sst/index/inverted_index/applier/builder/regex_match.rs (5)
22-22
: LGTM!The
collect_regex_match
method is correctly implemented and follows best practices.
62-62
: LGTM!The
test_regex_match_basic
test is correctly implemented and follows best practices.
91-91
: LGTM!The
test_regex_match_field_column
test is correctly implemented and follows best practices.
113-113
: LGTM!The
test_regex_match_type_mismatch
test is correctly implemented and follows best practices.
135-135
: LGTM!The
test_regex_match_type_nonexist_column
test is correctly implemented and follows best practices.src/index/src/fulltext_index/tests.rs (10)
23-36
: LGTM!The
create_index
function is well-implemented, initializing the full-text index creator and handling errors appropriately.
38-52
: LGTM!The
test_search
function is well-implemented, validating the search functionality with appropriate assertions.
54-67
: LGTM!The
test_simple_term
function is correctly implemented, testing a simple term search scenario.
69-79
: LGTM!The
test_negative_term
function is correctly implemented, testing a negative term search scenario.
81-95
: LGTM!The
test_must_term
function is correctly implemented, testing a must-term search scenario.
97-107
: LGTM!The
test_boolean_operators
function is correctly implemented, testing boolean operators in search queries.
109-122
: LGTM!The
test_phrase_term
function is correctly implemented, testing phrase term search scenarios.
124-137
: LGTM!The
test_config_english_analyzer_case_insensitive
function is correctly implemented, testing case-insensitive search with the English analyzer.
139-152
: LGTM!The
test_config_english_analyzer_case_sensitive
function is correctly implemented, testing case-sensitive search with the English analyzer.
154-167
: LGTM!The
test_config_chinese_analyzer
function is correctly implemented, testing search with the Chinese analyzer.src/mito2/src/sst/index/inverted_index/applier/builder/in_list.rs (6)
Line range hint
23-41
:
LGTM!The
collect_inlist
function is well-implemented, handlingIN
list expressions and adding them as predicates with appropriate error handling.
Line range hint
68-87
:
LGTM!The
test_collect_in_list_basic
function is correctly implemented, testing a basicIN
list scenario.
Line range hint
101-112
:
LGTM!The
test_collect_in_list_negated
function is correctly implemented, testing a negatedIN
list scenario.
Line range hint
126-137
:
LGTM!The
test_collect_in_list_field_column
function is correctly implemented, testing anIN
list scenario with a field column.
Line range hint
151-162
:
LGTM!The
test_collect_in_list_type_mismatch
function is correctly implemented, testing anIN
list scenario with a type mismatch.
Line range hint
178-189
:
LGTM!The
test_collect_in_list_nonexistent_column
function is correctly implemented, testing anIN
list scenario with a nonexistent column.src/mito2/src/sst/index/inverted_index/applier/builder/between.rs (6)
Line range hint
21-39
:
LGTM!The
collect_between
function is well-implemented, handlingBETWEEN
expressions and adding them as predicates with appropriate error handling.
Line range hint
75-92
:
LGTM!The
test_collect_between_basic
function is correctly implemented, testing a basicBETWEEN
scenario.
Line range hint
118-129
:
LGTM!The
test_collect_between_negated
function is correctly implemented, testing a negatedBETWEEN
scenario.
Line range hint
144-155
:
LGTM!The
test_collect_between_field_column
function is correctly implemented, testing aBETWEEN
scenario with a field column.
Line range hint
170-181
:
LGTM!The
test_collect_between_type_mismatch
function is correctly implemented, testing aBETWEEN
scenario with a type mismatch.
Line range hint
197-208
:
LGTM!The
test_collect_between_nonexistent_column
function is correctly implemented, testing aBETWEEN
scenario with a nonexistent column.src/mito2/src/sst/index/puffin_manager.rs (3)
42-44
: LGTM!The type definitions for
SstPuffinBlob
,SstPuffinDir
, andBlobReader
are correctly implemented, providing type aliases for the Puffin manager components.
Line range hint
49-79
:
LGTM!The
PuffinManagerFactory
implementation is well-implemented, providing methods for creating and building Puffin managers with appropriate error handling.
Line range hint
119-159
:
LGTM!The
test_puffin_manager_factory
function is correctly implemented, testing thePuffinManagerFactory
with various scenarios.src/mito2/src/sst/parquet/row_selection.rs (3)
30-49
: LGTM!The function
row_selection_from_row_ranges
is well-implemented and efficiently handles the conversion of row ranges into aRowSelection
.
55-79
: LGTM!The function
row_selection_from_sorted_row_ids
is well-implemented and efficiently handles the conversion of sorted row IDs into aRowSelection
.
82-91
: LGTM!The function
intersect_row_selections
is well-implemented and correctly handles the intersection of twoRowSelection
s.src/mito2/src/sst/index/fulltext_index/applier/builder.rs (1)
25-93
: LGTM!The
FulltextIndexApplierBuilder
struct and its implementation are well-constructed. Thebuild
method usesthen()
which is lazily evaluated, addressing the previous suggestion.src/mito2/src/sst/index/inverted_index/applier.rs (1)
Line range hint
39-175
:
LGTM!The
InvertedIndexApplier
struct and its implementation are well-constructed. Theapply
method correctly handles the application of predicates to SST files.src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (1)
Line range hint
1-202
:
LGTM!The
FsPuffinReader
struct and its implementation are well-constructed. Theinit_dir_to_stager
method efficiently handles the initialization of directories to the stager.src/mito2/src/sst/index/inverted_index/applier/builder/comparison.rs (1)
22-22
: RenamingSstIndexApplierBuilder
toInvertedIndexApplierBuilder
The renaming is consistent with the changes across the codebase to distinguish between different types of indexers.
src/mito2/src/sst/index/inverted_index/applier/builder/eq_list.rs (1)
25-25
: RenamingSstIndexApplierBuilder
toInvertedIndexApplierBuilder
The renaming is consistent with the changes across the codebase to distinguish between different types of indexers.
src/mito2/src/sst/index/inverted_index/applier/builder.rs (2)
40-40
: RenamingSstIndexApplierBuilder
toInvertedIndexApplierBuilder
The renaming is consistent with the changes across the codebase to distinguish between different types of indexers.
44-46
: Updating Documentation forInvertedIndexApplierBuilder
The documentation has been updated to reflect the renaming of
SstIndexApplierBuilder
toInvertedIndexApplierBuilder
.src/mito2/src/metrics.rs (2)
197-197
: ReplacingHistogram
withHistogramVec
The change improves the granularity of metrics tracking for index application times.
200-200
: Updating Labels forHistogramVec
The labels for
HistogramVec
have been updated to includeTYPE_LABEL
.src/mito2/src/sst/index/inverted_index/creator.rs (3)
54-55
: Struct Renaming ApprovedThe renaming of
SstIndexCreator
toInvertedIndexer
aligns with the new naming conventions and improves clarity.
78-78
: Implementation Approved with a SuggestionThe implementation of
InvertedIndexer
is comprehensive and aligns with the expected functionality.However, consider adding more detailed error handling in the
do_update
method to capture specific errors that might occur during the update process.
301-301
: Tests ApprovedThe tests for
InvertedIndexer
are comprehensive and cover a wide range of scenarios, ensuring the correctness of the indexer under different conditions.src/mito2/src/sst/index/fulltext_index/creator.rs (3)
41-42
: Struct Definition ApprovedThe
FulltextIndexer
struct definition is correct and includes relevant fields for managing the index creation process.
51-52
: Implementation ApprovedThe implementation of
FulltextIndexer
is comprehensive and aligns with the expected functionality.
296-296
: Tests ApprovedThe tests for
FulltextIndexer
are comprehensive and cover a wide range of scenarios, ensuring the correctness of the indexer under different conditions.src/mito2/src/engine.rs (1)
434-434
: Method Update ApprovedThe update to the
handle_query
method to includewith_ignore_fulltext_index
aligns with the new functionality to support fulltext indexing.src/mito2/src/error.rs (2)
825-830
: LGTM! But verify the usage of the new error variant.The addition of the
ApplyFulltextIndex
error variant is correct and consistent with other error variants.Ensure that the new error variant is correctly integrated and used in the codebase.
Verification successful
The new error variant
ApplyFulltextIndex
is correctly integrated and used in the codebase.
src/mito2/src/error.rs
src/mito2/src/sst/index/fulltext_index/applier.rs
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the new error variant `ApplyFulltextIndex`. # Test: Search for the usage of the new error variant. Expect: Usage in relevant files. rg --type rust 'ApplyFulltextIndex'Length of output: 570
559-560
: LGTM! But verify the usage of the renamed error variant.The renaming of the
ApplyIndex
error variant toApplyInvertedIndex
is correct and consistent with the new naming conventions.Ensure that all references to the old error variant name are updated in the codebase.
Verification successful
Renaming successfully verified.
The renaming of the
ApplyIndex
error variant toApplyInvertedIndex
has been correctly applied throughout the codebase.
ApplyInvertedIndex
is used in multiple places.- No occurrences of
ApplyIndex
were found.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the renamed error variant `ApplyInvertedIndex`. # Test: Search for the usage of the renamed error variant. Expect: No usage of the old name and correct usage of the new name. rg --type rust 'ApplyInvertedIndex' && ! rg --type rust 'ApplyIndex'Length of output: 530
src/mito2/src/read/scan_region.rs (11)
48-51
: Imports look good.The new imports for
FulltextIndexApplierBuilder
andFulltextIndexApplierRef
are necessary for the new functionality.
170-171
: Field addition looks good.The new field
ignore_fulltext_index
is necessary to control whether to ignore full-text indexing.
191-191
: Initialization looks good.The initialization of the new field
ignore_fulltext_index
in thenew
method is correct and necessary.
210-215
: Method addition looks good.The new method
with_ignore_fulltext_index
is necessary to set theignore_fulltext_index
field.
Line range hint
294-310
:
Inclusion of full-text index applier looks good.The inclusion of the full-text index applier in the scan input is correct and necessary for the new functionality.
378-391
: Method addition looks good.The new method
build_fulltext_index_applier
is necessary to build the full-text index applier.
438-440
: Field additions look good.The new fields
inverted_index_applier
andfulltext_index_applier
are necessary to store the index appliers.
467-468
: Initialization looks good.The initialization of the new fields
inverted_index_applier
andfulltext_index_applier
in thenew
method is correct and necessary.
526-535
: Method addition looks good.The new method
with_inverted_index_applier
is necessary to set theinverted_index_applier
field.
536-542
: Method addition looks good.The new method
with_fulltext_index_applier
is necessary to set thefulltext_index_applier
field.
618-619
: Inclusion of index appliers looks good.The inclusion of the new index appliers in the
prune_file_ranges
method is correct and necessary for the new functionality.src/mito2/src/sst/parquet/reader.rs (13)
18-18
: LGTM!The import of
Range
fromstd::ops
is correct and necessary for the code changes.
56-57
: LGTM!The import statements for
FulltextIndexApplierRef
andInvertedIndexApplierRef
are correct and necessary for the code changes.
62-64
: LGTM!The import statements for
row_selection_from_row_ranges
androw_selection_from_sorted_row_ids
are correct and necessary for the code changes.
85-87
: LGTM!The addition of
inverted_index_applier
andfulltext_index_applier
fields to theParquetReaderBuilder
struct is necessary for the integration of full-text indexing and inverted indexing with the Parquet reader.
109-110
: LGTM!The initialization of
inverted_index_applier
andfulltext_index_applier
fields in thenew
method ofParquetReaderBuilder
is correct and necessary for the code changes.
145-153
: LGTM!The addition of the
inverted_index_applier
method toParquetReaderBuilder
is necessary for attaching the inverted index applier to the builder.
155-161
: LGTM!The addition of the
fulltext_index_applier
method toParquetReaderBuilder
is necessary for attaching the fulltext index applier to the builder.
374-422
: LGTM!The addition of the
prune_row_groups_by_fulltext_index
method toParquetReaderBuilder
is necessary for pruning row groups by fulltext index. The method includes proper error handling and metrics collection.
Line range hint
432-493
: LGTM!The new logic in the
prune_row_groups_by_inverted_index
method to handle inverted index pruning is necessary. The method includes proper error handling and metrics collection.
Line range hint
701-712
: LGTM!The addition of
num_row_groups_fulltext_index_filtered
andnum_rows_in_row_group_fulltext_index_filtered
fields toReaderMetrics
is necessary for tracking metrics related to fulltext index filtering.
Line range hint
986-1003
: LGTM!The new metrics reporting for fulltext index filtering in the
Drop
implementation ofParquetReader
is necessary for tracking the performance of fulltext index filtering.
1147-1435
: LGTM!The addition of new test cases to the
tests
module for pruning row groups by rows and ranges is necessary for ensuring the correctness of the pruning logic.
337-372
: Verify handling of empty output after pruning.The new logic in the
row_groups_to_read
method to prune row groups by fulltext index and inverted index is necessary. However, ensure that the method handles empty output correctly after pruning.Verification successful
Handling of empty output after pruning is correctly implemented.
The
row_groups_to_read
method ensures that empty outputs are handled appropriately. Specifically, theParquetReader
'snext_batch
method checks ifrow_groups
is empty and returnsNone
if so, indicating no more data to read.
src/mito2/src/sst/parquet/reader.rs
:
next_batch
method inParquetReader
class handles emptyrow_groups
by returningNone
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify handling of empty output after pruning in `row_groups_to_read` method. # Test: Search for the method usage. Expect: Proper handling of empty output. rg --type rust -A 10 'fn row_groups_to_read'Length of output: 874
Script:
#!/bin/bash # Description: Verify handling of empty output after pruning in `row_groups_to_read` method. # Search for the usage of `row_groups_to_read` method to inspect how its return value is handled. rg --type rust 'row_groups_to_read' --context 10Length of output: 3050
Script:
#!/bin/bash # Description: Verify handling of empty output after pruning in `row_groups_to_read` method. # Search for the usage of `row_groups` variable to inspect how its return value is handled. rg --type rust 'let row_groups' --context 20Length of output: 18180
Script:
#!/bin/bash # Description: Verify handling of empty output after pruning in `row_groups_to_read` method. # Search for the usage of `row_groups` variable to inspect how its return value is handled. rg --type rust 'row_groups' --context 20Length of output: 161796
Unit tests has been completed. Thank you for your hard work to review @evenyag @v0y4g3r @waynexia @fengjiachun |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4348 +/- ##
==========================================
- Coverage 85.14% 84.97% -0.18%
==========================================
Files 1063 1067 +4
Lines 189994 191146 +1152
==========================================
+ Hits 161774 162428 +654
- Misses 28220 28718 +498 |
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/mito2/src/sst/index/fulltext_index/creator.rs (4 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/mito2/src/sst/index/fulltext_index/creator.rs
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
close #4246
What's changed and what's your intention?
FulltextIndexSearcher
and corresponding implementationTantivyFulltextIndexSearcher
FulltextIndexSearcher
into mito2'sFulltextIndexApplier
prune_row_groups_by_fulltext_index
based onprune_row_groups_by_inverted_index
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Performance
Tests