feat: unordered scanner scans data by time ranges #4757
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4757      +/-   ##
==========================================
- Coverage   84.56%   84.28%   -0.28%
==========================================
  Files        1118     1118
  Lines      202889   202925      +36
==========================================
- Hits       171566   171033     -533
- Misses      31323    31892     +569
LGTM
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR adds the initial implementation for the unordered scanner to support reading rows from time ranges individually.
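As a rough illustration of what scanning by time range involves, the sketch below groups sources in two ways: merging overlapping sources into one range (what a sorted scan needs) and giving every source its own range (what an unordered scan can do). This is only a sketch under stated assumptions: `Source`, `RangeGroup`, `group_overlapping`, and `group_individually` are hypothetical names invented for this example and are not taken from the GreptimeDB code base. It also works per source rather than per row group.

```rust
/// Hypothetical stand-in for a file or memtable covering an inclusive time range.
#[derive(Debug, Clone, PartialEq)]
struct Source {
    name: &'static str,
    start: i64,
    end: i64,
}

/// Hypothetical per-range metadata: the covered time span and its sources.
#[derive(Debug, PartialEq)]
struct RangeGroup {
    start: i64,
    end: i64,
    sources: Vec<Source>,
}

/// Sorted-scan style grouping: sources whose time ranges overlap (or touch,
/// since the bounds are inclusive) end up in the same range so they can be
/// merged and sorted together later.
fn group_overlapping(mut sources: Vec<Source>) -> Vec<RangeGroup> {
    sources.sort_by_key(|s| s.start);
    let mut groups: Vec<RangeGroup> = Vec::new();
    for src in sources {
        if let Some(last) = groups.last_mut() {
            if src.start <= last.end {
                // Overlaps the current range: extend it and add the source.
                last.end = last.end.max(src.end);
                last.sources.push(src);
                continue;
            }
        }
        // No overlap with the previous range: start a new one.
        groups.push(RangeGroup {
            start: src.start,
            end: src.end,
            sources: vec![src],
        });
    }
    groups
}

/// Unordered-scan style grouping: no sorting or merging is required, so each
/// source can get a dedicated range.
fn group_individually(sources: Vec<Source>) -> Vec<RangeGroup> {
    sources
        .into_iter()
        .map(|s| RangeGroup {
            start: s.start,
            end: s.end,
            sources: vec![s],
        })
        .collect()
}
```

For example, given sources covering [0, 10], [5, 15], and [20, 30], `group_overlapping` yields two ranges (the first spanning 0 to 15 with two sources), while `group_individually` yields three.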
The scanner groups sources (files and memtables) into ranges and maintains a `RangeMeta` to store the metadata of each range. Each `PartitionRange` refers to a `RangeMeta` in the `StreamContext`, so we can use `PartitionRange::identifier` to locate the `RangeMeta`.

`SeqScan` and `UnorderedScan` have different grouping strategies:
- `SeqScan` merges all overlapping files and memtables to ensure output rows are sorted, so it puts overlapping files and memtables in the same range.
- `UnorderedScan` can assign each row group to a dedicated range, as it doesn't sort rows or merge duplicate rows.

The implementations of the grouping strategies are similar to `SeqDistributor` and `UnorderedDistributor`.

It also refactors the `scan_partition()` method by wrapping some methods to scan memtables and files.

Checklist
Summary by CodeRabbit
Release Notes
New Features
Improvements
`BTreeMap`, enhancing data organization and access.
`num_ranges` to track range counts.
Bug Fixes
Documentation