Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fulltext_index): integrate full-text indexer with parquet reader #4348

Merged
merged 12 commits into from
Jul 15, 2024

Conversation

zhongzc
Copy link
Contributor

@zhongzc zhongzc commented Jul 11, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

close #4246

What's changed and what's your intention?

  • Add FulltextIndexSearcher and corresponding implementation TantivyFulltextIndexSearcher
  • Integrate FulltextIndexSearcher into mito2's FulltextIndexApplier
  • Enhance prune row groups logic, add prune_row_groups_by_fulltext_index based on prune_row_groups_by_inverted_index

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced full-text search functionality using Tantivy.
    • Added support for full-text index application and search in the scanning process.
  • Bug Fixes

    • Improved error handling for full-text and inverted indexes.
  • Refactor

    • Renamed various internal structs and methods for clarity, particularly around full-text and inverted index components.
  • Performance

    • Enhanced metrics tracking for operations related to index application.
  • Tests

    • Added comprehensive tests for full-text search scenarios.

Copy link
Contributor

coderabbitai bot commented Jul 11, 2024

Walkthrough

The changes introduce support for full-text indexing using Tantivy by adding modules and functionality for creating, searching, and testing full-text indexes. This includes new structs, traits, and methods for handling full-text indexes, integrating them into the existing SST writer, and updating error handling, metrics, and related indexing components accordingly.

Changes

Files/Paths Change Summaries
src/.../fulltext_index.rs, src/.../create.rs, src/.../error.rs Added modules for search and tests, exported fields, introduced new error variants.
src/.../search.rs, src/.../search/tantivy.rs Introduced FulltextIndexSearcher trait and its implementation using Tantivy.
src/.../tests.rs Added tests for various search scenarios using full-text indexes.
src/mito2/Cargo.toml Updated dependencies to include common-function and session.
src/.../engine.rs Added method to handle full-text index configuration in EngineInner.
src/.../error.rs Renamed error variants and added new error variant for full-text index.
src/.../metrics.rs Replaced Histogram with HistogramVec and updated metrics.
src/.../scan_region.rs Modified index applier functionality to support full-text indexing.
src/.../sst/index.rs, src/.../index/fulltext_index.rs Renamed SstIndexCreator to FulltextIndexer and updated method names.
src/.../index/fulltext_index/applier.rs, src/.../builder.rs, src/.../creator.rs Added modules and methods for applying full-text indexes and refactored existing ones.
src/.../indexer/finish.rs Updated import paths and renamed indexer entities.
src/.../inverted_index/applier.rs, .../builder.rs, .../between.rs, .../comparison.rs, .../eq_list.rs, .../in_list.rs, .../regex_match.rs Renamed structs and builders related to inverted index appliers.
src/.../puffin_manager.rs Redefined types related to puffin management.
src/.../parquet/reader.rs Modified Parquet reader to support full-text index applier and updated metrics for row group filtering.
src/.../creator.rs Renamed SstIndexCreator to FulltextIndexer and updated methods accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Engine
    participant FulltextIndexer
    participant Tantivy
    Client->>Engine: Request full-text search
    Engine->>FulltextIndexer: Create searcher
    FulltextIndexer->>Tantivy: Perform search
    Tantivy-->>FulltextIndexer: Return search results
    FulltextIndexer-->>Engine: Pass results
    Engine-->>Client: Return search results
Loading

Assessment against linked issues

Objective (Issue #4246) Addressed Explanation
Creator for full-text index
Integrate with SST writer
CREATE statement support
Push down matches and apply the index Not explicitly mentioned in the changes.

Poem

In the world of code, there came a quest,
To search through text, and find the best.
With Tantivy's might, and indexes full,
Now words align, in search, they pull.
Through tests and traits, the code did rise,
A rabbit’s dream, with keen, bright eyes.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jul 11, 2024
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
src/index/src/fulltext_index/search/tantivy.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet/reader.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/index/fulltext_index/applier/builder.rs Outdated Show resolved Hide resolved
src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs Outdated Show resolved Hide resolved
@evenyag
Copy link
Contributor

evenyag commented Jul 12, 2024

Sqlness failed

thread 'grpc-handlers' panicked at src/mito2/src/sst/parquet/reader.rs:539:13:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f9b16149208c8a8a349c32813312716f6603eb6f/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/f9b16149208c8a8a349c32813312716f6603eb6f/library/core/src/panicking.rs:72:14
   2: core::panicking::panic_const::panic_const_sub_overflow
             at /rustc/f9b16149208c8a8a349c32813312716f6603eb6f/library/core/src/panicking.rs:175:21
   3: mito2::sst::parquet::reader::ParquetReaderBuilder::prune_row_groups_by_inverted_index::{{closure}}
             at /home/runner/work/greptimedb/greptimedb/src/mito2/src/sst/parquet/reader.rs:539:13
   4: mito2::sst::parquet::reader::ParquetReaderBuilder::row_groups_to_read::{{closure}}
             at /home/runner/work/greptimedb/greptimedb/src/mito2/src/sst/parquet/reader.rs:364:14
   5: mito2::sst::parquet::reader::ParquetReaderBuilder::build_reader_input::{{closure}}
             at /home/runner/work/greptimedb/greptimedb/src/mito2/src/sst/parquet/reader.rs:224:14
   6: mito2::read::scan_region::ScanInput::prune_file_ranges::{{closure}}
             at /home/runner/work/greptimedb/greptimedb/src/mito2/src/read/scan_region.rs:622:18

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@evenyag
Copy link
Contributor

evenyag commented Jul 14, 2024

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@zhongzc zhongzc marked this pull request as ready for review July 14, 2024 16:05
@zhongzc zhongzc requested review from v0y4g3r, waynexia and a team as code owners July 14, 2024 16:05
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (2)
src/index/src/fulltext_index/search/tantivy.rs (1)

41-69: Consider Logging or Collecting Cost to Open Index

Opening an index can be a costly operation. Consider logging the time taken or collecting metrics for this operation.

debug!(
    "Opened tantivy index on {:?} in {:?}",
    path.as_ref(),
    now.elapsed()
);
src/mito2/src/read/scan_region.rs (1)

Line range hint 336-376:
Fix the typo in the method name.

The method name build_invereted_index_applier has a typo. It should be build_inverted_index_applier.

-    fn build_invereted_index_applier(&self) -> Option<InvertedIndexApplierRef> {
+    fn build_inverted_index_applier(&self) -> Option<InvertedIndexApplierRef> {
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 15ac811 and 70e330c.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (29)
  • src/index/src/fulltext_index.rs (1 hunks)
  • src/index/src/fulltext_index/create.rs (1 hunks)
  • src/index/src/fulltext_index/error.rs (2 hunks)
  • src/index/src/fulltext_index/search.rs (1 hunks)
  • src/index/src/fulltext_index/search/tantivy.rs (1 hunks)
  • src/index/src/fulltext_index/tests.rs (1 hunks)
  • src/mito2/Cargo.toml (2 hunks)
  • src/mito2/src/engine.rs (1 hunks)
  • src/mito2/src/error.rs (4 hunks)
  • src/mito2/src/metrics.rs (1 hunks)
  • src/mito2/src/read/scan_region.rs (13 hunks)
  • src/mito2/src/sst/index.rs (1 hunks)
  • src/mito2/src/sst/index/fulltext_index.rs (1 hunks)
  • src/mito2/src/sst/index/fulltext_index/applier.rs (1 hunks)
  • src/mito2/src/sst/index/fulltext_index/applier/builder.rs (1 hunks)
  • src/mito2/src/sst/index/fulltext_index/creator.rs (4 hunks)
  • src/mito2/src/sst/index/indexer/finish.rs (1 hunks)
  • src/mito2/src/sst/index/inverted_index/applier.rs (7 hunks)
  • src/mito2/src/sst/index/inverted_index/applier/builder.rs (5 hunks)
  • src/mito2/src/sst/index/inverted_index/applier/builder/between.rs (6 hunks)
  • src/mito2/src/sst/index/inverted_index/applier/builder/comparison.rs (5 hunks)
  • src/mito2/src/sst/index/inverted_index/applier/builder/eq_list.rs (8 hunks)
  • src/mito2/src/sst/index/inverted_index/applier/builder/in_list.rs (6 hunks)
  • src/mito2/src/sst/index/inverted_index/applier/builder/regex_match.rs (5 hunks)
  • src/mito2/src/sst/index/inverted_index/creator.rs (5 hunks)
  • src/mito2/src/sst/index/puffin_manager.rs (1 hunks)
  • src/mito2/src/sst/parquet/reader.rs (15 hunks)
  • src/mito2/src/sst/parquet/row_selection.rs (3 hunks)
  • src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (3 hunks)
Files skipped from review due to trivial changes (3)
  • src/index/src/fulltext_index/create.rs
  • src/mito2/src/sst/index.rs
  • src/mito2/src/sst/index/indexer/finish.rs
Additional comments not posted (88)
src/mito2/src/sst/index/fulltext_index.rs (1)

15-15: Addition of applier module.

The applier module has been added to the fulltext_index module. Ensure that this new module is properly tested and documented.

src/index/src/fulltext_index/search.rs (2)

26-30: Definition of FulltextIndexSearcher trait.

The FulltextIndexSearcher trait is defined with an async method search. Ensure that the method is properly documented and tested.


15-15: Implementation of FulltextIndexSearcher trait in tantivy module.

The implementation of the FulltextIndexSearcher trait in the tantivy module looks correct. Ensure that the implementation is properly tested.

src/index/src/fulltext_index.rs (1)

19-22: Addition of search and tests modules.

The search and tests modules have been added to the fulltext_index module. Ensure that these new modules are properly tested and documented.

src/index/src/fulltext_index/error.rs (2)

43-49: Addition of TantivyParser error variant.

The TantivyParser error variant has been added to the Error enum. Ensure that this new error variant is properly handled and documented.


71-72: Mapping TantivyParser to StatusCode::InvalidSyntax.

The TantivyParser error variant is mapped to StatusCode::InvalidSyntax. Ensure that this mapping is consistent with the rest of the error handling logic.

src/mito2/Cargo.toml (2)

76-76: Dependency Addition: common-function

The common-function dependency has been added. Ensure that this dependency is necessary for the dev environment and does not introduce any conflicts.


86-86: Dependency Addition: session

The session dependency has been added. Ensure that this dependency is necessary for the dev environment and does not introduce any conflicts.

src/index/src/fulltext_index/search/tantivy.rs (1)

72-84: LGTM!

The search method is correctly implemented and follows best practices.

src/mito2/src/sst/index/fulltext_index/applier.rs (4)

34-47: LGTM!

The FulltextIndexApplier struct and its fields are correctly defined.


51-65: LGTM!

The new method is correctly implemented and follows best practices.


67-106: LGTM!

The apply method is correctly implemented and follows best practices.


108-128: LGTM!

The index_dir_path method is correctly implemented and follows best practices.

src/mito2/src/sst/index/inverted_index/applier/builder/regex_match.rs (5)

22-22: LGTM!

The collect_regex_match method is correctly implemented and follows best practices.


62-62: LGTM!

The test_regex_match_basic test is correctly implemented and follows best practices.


91-91: LGTM!

The test_regex_match_field_column test is correctly implemented and follows best practices.


113-113: LGTM!

The test_regex_match_type_mismatch test is correctly implemented and follows best practices.


135-135: LGTM!

The test_regex_match_type_nonexist_column test is correctly implemented and follows best practices.

src/index/src/fulltext_index/tests.rs (10)

23-36: LGTM!

The create_index function is well-implemented, initializing the full-text index creator and handling errors appropriately.


38-52: LGTM!

The test_search function is well-implemented, validating the search functionality with appropriate assertions.


54-67: LGTM!

The test_simple_term function is correctly implemented, testing a simple term search scenario.


69-79: LGTM!

The test_negative_term function is correctly implemented, testing a negative term search scenario.


81-95: LGTM!

The test_must_term function is correctly implemented, testing a must-term search scenario.


97-107: LGTM!

The test_boolean_operators function is correctly implemented, testing boolean operators in search queries.


109-122: LGTM!

The test_phrase_term function is correctly implemented, testing phrase term search scenarios.


124-137: LGTM!

The test_config_english_analyzer_case_insensitive function is correctly implemented, testing case-insensitive search with the English analyzer.


139-152: LGTM!

The test_config_english_analyzer_case_sensitive function is correctly implemented, testing case-sensitive search with the English analyzer.


154-167: LGTM!

The test_config_chinese_analyzer function is correctly implemented, testing search with the Chinese analyzer.

src/mito2/src/sst/index/inverted_index/applier/builder/in_list.rs (6)

Line range hint 23-41:
LGTM!

The collect_inlist function is well-implemented, handling IN list expressions and adding them as predicates with appropriate error handling.


Line range hint 68-87:
LGTM!

The test_collect_in_list_basic function is correctly implemented, testing a basic IN list scenario.


Line range hint 101-112:
LGTM!

The test_collect_in_list_negated function is correctly implemented, testing a negated IN list scenario.


Line range hint 126-137:
LGTM!

The test_collect_in_list_field_column function is correctly implemented, testing an IN list scenario with a field column.


Line range hint 151-162:
LGTM!

The test_collect_in_list_type_mismatch function is correctly implemented, testing an IN list scenario with a type mismatch.


Line range hint 178-189:
LGTM!

The test_collect_in_list_nonexistent_column function is correctly implemented, testing an IN list scenario with a nonexistent column.

src/mito2/src/sst/index/inverted_index/applier/builder/between.rs (6)

Line range hint 21-39:
LGTM!

The collect_between function is well-implemented, handling BETWEEN expressions and adding them as predicates with appropriate error handling.


Line range hint 75-92:
LGTM!

The test_collect_between_basic function is correctly implemented, testing a basic BETWEEN scenario.


Line range hint 118-129:
LGTM!

The test_collect_between_negated function is correctly implemented, testing a negated BETWEEN scenario.


Line range hint 144-155:
LGTM!

The test_collect_between_field_column function is correctly implemented, testing a BETWEEN scenario with a field column.


Line range hint 170-181:
LGTM!

The test_collect_between_type_mismatch function is correctly implemented, testing a BETWEEN scenario with a type mismatch.


Line range hint 197-208:
LGTM!

The test_collect_between_nonexistent_column function is correctly implemented, testing a BETWEEN scenario with a nonexistent column.

src/mito2/src/sst/index/puffin_manager.rs (3)

42-44: LGTM!

The type definitions for SstPuffinBlob, SstPuffinDir, and BlobReader are correctly implemented, providing type aliases for the Puffin manager components.


Line range hint 49-79:
LGTM!

The PuffinManagerFactory implementation is well-implemented, providing methods for creating and building Puffin managers with appropriate error handling.


Line range hint 119-159:
LGTM!

The test_puffin_manager_factory function is correctly implemented, testing the PuffinManagerFactory with various scenarios.

src/mito2/src/sst/parquet/row_selection.rs (3)

30-49: LGTM!

The function row_selection_from_row_ranges is well-implemented and efficiently handles the conversion of row ranges into a RowSelection.


55-79: LGTM!

The function row_selection_from_sorted_row_ids is well-implemented and efficiently handles the conversion of sorted row IDs into a RowSelection.


82-91: LGTM!

The function intersect_row_selections is well-implemented and correctly handles the intersection of two RowSelections.

src/mito2/src/sst/index/fulltext_index/applier/builder.rs (1)

25-93: LGTM!

The FulltextIndexApplierBuilder struct and its implementation are well-constructed. The build method uses then() which is lazily evaluated, addressing the previous suggestion.

src/mito2/src/sst/index/inverted_index/applier.rs (1)

Line range hint 39-175:
LGTM!

The InvertedIndexApplier struct and its implementation are well-constructed. The apply method correctly handles the application of predicates to SST files.

src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs (1)

Line range hint 1-202:
LGTM!

The FsPuffinReader struct and its implementation are well-constructed. The init_dir_to_stager method efficiently handles the initialization of directories to the stager.

src/mito2/src/sst/index/inverted_index/applier/builder/comparison.rs (1)

22-22: Renaming SstIndexApplierBuilder to InvertedIndexApplierBuilder

The renaming is consistent with the changes across the codebase to distinguish between different types of indexers.

src/mito2/src/sst/index/inverted_index/applier/builder/eq_list.rs (1)

25-25: Renaming SstIndexApplierBuilder to InvertedIndexApplierBuilder

The renaming is consistent with the changes across the codebase to distinguish between different types of indexers.

src/mito2/src/sst/index/inverted_index/applier/builder.rs (2)

40-40: Renaming SstIndexApplierBuilder to InvertedIndexApplierBuilder

The renaming is consistent with the changes across the codebase to distinguish between different types of indexers.


44-46: Updating Documentation for InvertedIndexApplierBuilder

The documentation has been updated to reflect the renaming of SstIndexApplierBuilder to InvertedIndexApplierBuilder.

src/mito2/src/metrics.rs (2)

197-197: Replacing Histogram with HistogramVec

The change improves the granularity of metrics tracking for index application times.


200-200: Updating Labels for HistogramVec

The labels for HistogramVec have been updated to include TYPE_LABEL.

src/mito2/src/sst/index/inverted_index/creator.rs (3)

54-55: Struct Renaming Approved

The renaming of SstIndexCreator to InvertedIndexer aligns with the new naming conventions and improves clarity.


78-78: Implementation Approved with a Suggestion

The implementation of InvertedIndexer is comprehensive and aligns with the expected functionality.

However, consider adding more detailed error handling in the do_update method to capture specific errors that might occur during the update process.


301-301: Tests Approved

The tests for InvertedIndexer are comprehensive and cover a wide range of scenarios, ensuring the correctness of the indexer under different conditions.

src/mito2/src/sst/index/fulltext_index/creator.rs (3)

41-42: Struct Definition Approved

The FulltextIndexer struct definition is correct and includes relevant fields for managing the index creation process.


51-52: Implementation Approved

The implementation of FulltextIndexer is comprehensive and aligns with the expected functionality.


296-296: Tests Approved

The tests for FulltextIndexer are comprehensive and cover a wide range of scenarios, ensuring the correctness of the indexer under different conditions.

src/mito2/src/engine.rs (1)

434-434: Method Update Approved

The update to the handle_query method to include with_ignore_fulltext_index aligns with the new functionality to support fulltext indexing.

src/mito2/src/error.rs (2)

825-830: LGTM! But verify the usage of the new error variant.

The addition of the ApplyFulltextIndex error variant is correct and consistent with other error variants.

Ensure that the new error variant is correctly integrated and used in the codebase.

Verification successful

The new error variant ApplyFulltextIndex is correctly integrated and used in the codebase.

  • src/mito2/src/error.rs
  • src/mito2/src/sst/index/fulltext_index/applier.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the new error variant `ApplyFulltextIndex`.

# Test: Search for the usage of the new error variant. Expect: Usage in relevant files.
rg --type rust 'ApplyFulltextIndex'

Length of output: 570


559-560: LGTM! But verify the usage of the renamed error variant.

The renaming of the ApplyIndex error variant to ApplyInvertedIndex is correct and consistent with the new naming conventions.

Ensure that all references to the old error variant name are updated in the codebase.

Verification successful

Renaming successfully verified.

The renaming of the ApplyIndex error variant to ApplyInvertedIndex has been correctly applied throughout the codebase.

  • ApplyInvertedIndex is used in multiple places.
  • No occurrences of ApplyIndex were found.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the renamed error variant `ApplyInvertedIndex`.

# Test: Search for the usage of the renamed error variant. Expect: No usage of the old name and correct usage of the new name.
rg --type rust 'ApplyInvertedIndex' && ! rg --type rust 'ApplyIndex'

Length of output: 530

src/mito2/src/read/scan_region.rs (11)

48-51: Imports look good.

The new imports for FulltextIndexApplierBuilder and FulltextIndexApplierRef are necessary for the new functionality.


170-171: Field addition looks good.

The new field ignore_fulltext_index is necessary to control whether to ignore full-text indexing.


191-191: Initialization looks good.

The initialization of the new field ignore_fulltext_index in the new method is correct and necessary.


210-215: Method addition looks good.

The new method with_ignore_fulltext_index is necessary to set the ignore_fulltext_index field.


Line range hint 294-310:
Inclusion of full-text index applier looks good.

The inclusion of the full-text index applier in the scan input is correct and necessary for the new functionality.


378-391: Method addition looks good.

The new method build_fulltext_index_applier is necessary to build the full-text index applier.


438-440: Field additions look good.

The new fields inverted_index_applier and fulltext_index_applier are necessary to store the index appliers.


467-468: Initialization looks good.

The initialization of the new fields inverted_index_applier and fulltext_index_applier in the new method is correct and necessary.


526-535: Method addition looks good.

The new method with_inverted_index_applier is necessary to set the inverted_index_applier field.


536-542: Method addition looks good.

The new method with_fulltext_index_applier is necessary to set the fulltext_index_applier field.


618-619: Inclusion of index appliers looks good.

The inclusion of the new index appliers in the prune_file_ranges method is correct and necessary for the new functionality.

src/mito2/src/sst/parquet/reader.rs (13)

18-18: LGTM!

The import of Range from std::ops is correct and necessary for the code changes.


56-57: LGTM!

The import statements for FulltextIndexApplierRef and InvertedIndexApplierRef are correct and necessary for the code changes.


62-64: LGTM!

The import statements for row_selection_from_row_ranges and row_selection_from_sorted_row_ids are correct and necessary for the code changes.


85-87: LGTM!

The addition of inverted_index_applier and fulltext_index_applier fields to the ParquetReaderBuilder struct is necessary for the integration of full-text indexing and inverted indexing with the Parquet reader.


109-110: LGTM!

The initialization of inverted_index_applier and fulltext_index_applier fields in the new method of ParquetReaderBuilder is correct and necessary for the code changes.


145-153: LGTM!

The addition of the inverted_index_applier method to ParquetReaderBuilder is necessary for attaching the inverted index applier to the builder.


155-161: LGTM!

The addition of the fulltext_index_applier method to ParquetReaderBuilder is necessary for attaching the fulltext index applier to the builder.


374-422: LGTM!

The addition of the prune_row_groups_by_fulltext_index method to ParquetReaderBuilder is necessary for pruning row groups by fulltext index. The method includes proper error handling and metrics collection.


Line range hint 432-493: LGTM!

The new logic in the prune_row_groups_by_inverted_index method to handle inverted index pruning is necessary. The method includes proper error handling and metrics collection.


Line range hint 701-712: LGTM!

The addition of num_row_groups_fulltext_index_filtered and num_rows_in_row_group_fulltext_index_filtered fields to ReaderMetrics is necessary for tracking metrics related to fulltext index filtering.


Line range hint 986-1003: LGTM!

The new metrics reporting for fulltext index filtering in the Drop implementation of ParquetReader is necessary for tracking the performance of fulltext index filtering.


1147-1435: LGTM!

The addition of new test cases to the tests module for pruning row groups by rows and ranges is necessary for ensuring the correctness of the pruning logic.


337-372: Verify handling of empty output after pruning.

The new logic in the row_groups_to_read method to prune row groups by fulltext index and inverted index is necessary. However, ensure that the method handles empty output correctly after pruning.

Verification successful

Handling of empty output after pruning is correctly implemented.

The row_groups_to_read method ensures that empty outputs are handled appropriately. Specifically, the ParquetReader's next_batch method checks if row_groups is empty and returns None if so, indicating no more data to read.

  • src/mito2/src/sst/parquet/reader.rs:
    • next_batch method in ParquetReader class handles empty row_groups by returning None.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify handling of empty output after pruning in `row_groups_to_read` method.

# Test: Search for the method usage. Expect: Proper handling of empty output.
rg --type rust -A 10 'fn row_groups_to_read'

Length of output: 874


Script:

#!/bin/bash
# Description: Verify handling of empty output after pruning in `row_groups_to_read` method.

# Search for the usage of `row_groups_to_read` method to inspect how its return value is handled.
rg --type rust 'row_groups_to_read' --context 10

Length of output: 3050


Script:

#!/bin/bash
# Description: Verify handling of empty output after pruning in `row_groups_to_read` method.

# Search for the usage of `row_groups` variable to inspect how its return value is handled.
rg --type rust 'let row_groups' --context 20

Length of output: 18180


Script:

#!/bin/bash
# Description: Verify handling of empty output after pruning in `row_groups_to_read` method.

# Search for the usage of `row_groups` variable to inspect how its return value is handled.
rg --type rust 'row_groups' --context 20

Length of output: 161796

@zhongzc
Copy link
Contributor Author

zhongzc commented Jul 14, 2024

Unit tests has been completed. Thank you for your hard work to review @evenyag @v0y4g3r @waynexia @fengjiachun

Copy link

codecov bot commented Jul 14, 2024

Codecov Report

Attention: Patch coverage is 95.35799% with 59 lines in your changes missing coverage. Please review.

Project coverage is 84.97%. Comparing base (b8bd845) to head (813d38d).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4348      +/-   ##
==========================================
- Coverage   85.14%   84.97%   -0.18%     
==========================================
  Files        1063     1067       +4     
  Lines      189994   191146    +1152     
==========================================
+ Hits       161774   162428     +654     
- Misses      28220    28718     +498     

@evenyag evenyag mentioned this pull request Jul 15, 2024
13 tasks
src/mito2/src/sst/index/fulltext_index/creator.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/index/fulltext_index/creator.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 70e330c and 813d38d.

Files selected for processing (1)
  • src/mito2/src/sst/index/fulltext_index/creator.rs (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/mito2/src/sst/index/fulltext_index/creator.rs

@evenyag evenyag enabled auto-merge July 15, 2024 08:13
@evenyag evenyag added this pull request to the merge queue Jul 15, 2024
Merged via the queue into GreptimeTeam:main with commit 04ac0c8 Jul 15, 2024
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking Issue: support full-text index
3 participants