refactor: Decouple dedup and merge #4139
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #4139      +/-   ##
==========================================
- Coverage   85.49%   84.60%   -0.89%
==========================================
  Files         994     1021      +27
  Lines      174907   178953    +4046
==========================================
+ Hits       149528   151399    +1871
- Misses      25379    27554    +2175
```
Avoid iterating all timestamps.
LGTM
LGTM
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR refactors the `MergeReader` and introduces a new reader, `DedupReader`.
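The merge/dedup split can be illustrated with a minimal Rust sketch. It rests on several simplifying assumptions: it works row by row instead of on `Batch`es, a single `u64` `key` stands in for the primary key plus timestamp, and inputs are assumed sorted by key ascending, then sequence descending. `Row`, `HeapItem`, and `row_order` are hypothetical names, and the trait and method signatures below are illustrative, not the PR's actual API. The merge step is a standard k-way merge over a binary heap that never drops rows; deduplication is delegated entirely to a pluggable `DedupStrategy`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical simplified row; `key` stands in for (primary key, timestamp).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub key: u64,
    pub sequence: u64,
    pub value: i64,
}

/// Assumed sort order: key ascending, then sequence descending (newest first).
fn row_order(a: &Row, b: &Row) -> Ordering {
    a.key.cmp(&b.key).then(b.sequence.cmp(&a.sequence))
}

/// Heap entry remembering which source the row came from (hypothetical).
struct HeapItem {
    row: Row,
    source: usize,
}

impl Ord for HeapItem {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed so the max-heap pops the smallest row first; ties go to the lower source index.
        row_order(&other.row, &self.row).then(other.source.cmp(&self.source))
    }
}
impl PartialOrd for HeapItem {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for HeapItem {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for HeapItem {}

/// Merges already-sorted sources into one sorted stream. It never drops rows.
pub struct MergeReader {
    sources: Vec<std::vec::IntoIter<Row>>,
    heap: BinaryHeap<HeapItem>,
}

impl MergeReader {
    pub fn new(sources: Vec<Vec<Row>>) -> Self {
        let mut sources: Vec<std::vec::IntoIter<Row>> =
            sources.into_iter().map(Vec::into_iter).collect();
        let mut heap = BinaryHeap::new();
        // Seed the heap with the head of every non-empty source.
        for (source, iter) in sources.iter_mut().enumerate() {
            if let Some(row) = iter.next() {
                heap.push(HeapItem { row, source });
            }
        }
        MergeReader { sources, heap }
    }
}

impl Iterator for MergeReader {
    type Item = Row;
    fn next(&mut self) -> Option<Row> {
        let HeapItem { row, source } = self.heap.pop()?;
        // Refill from the source we just consumed so the heap stays one row per source.
        if let Some(next) = self.sources[source].next() {
            self.heap.push(HeapItem { row: next, source });
        }
        Some(row)
    }
}

/// Decides, row by row, what survives deduplication (illustrative signature).
pub trait DedupStrategy {
    /// Returns the row to emit, or `None` to drop it.
    fn push(&mut self, row: Row) -> Option<Row>;
}

/// Keeps the row with the latest sequence for each key.
#[derive(Default)]
pub struct LastRow {
    last_key: Option<u64>,
}

impl DedupStrategy for LastRow {
    fn push(&mut self, row: Row) -> Option<Row> {
        // Input is sorted by sequence descending within a key,
        // so the first row seen for a key is the latest one.
        if self.last_key == Some(row.key) {
            return None;
        }
        self.last_key = Some(row.key);
        Some(row)
    }
}

/// Pulls rows from an inner reader and filters them through a strategy.
pub struct DedupReader<I, S> {
    inner: I,
    strategy: S,
}

impl<I, S> DedupReader<I, S> {
    pub fn new(inner: I, strategy: S) -> Self {
        DedupReader { inner, strategy }
    }
}

impl<I: Iterator<Item = Row>, S: DedupStrategy> Iterator for DedupReader<I, S> {
    type Item = Row;
    fn next(&mut self) -> Option<Row> {
        for row in self.inner.by_ref() {
            if let Some(out) = self.strategy.push(row) {
                return Some(out);
            }
        }
        None
    }
}
```

Usage: `DedupReader::new(MergeReader::new(sources), LastRow::default())` yields one row per key. Because `MergeReader` never drops rows, the same merged stream can instead be consumed directly when every version is needed, or wrapped with a different strategy, which is the kind of flexibility that separating the two readers is meant to provide.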
- `MergeReader` only merges sorted batches and does not deduplicate rows.
- `DedupReader` fetches batches from the `MergeReader` and deduplicates them.
- `DedupReader` uses a `DedupStrategy` to perform the actual deduplication:
  - `LastRow`, which keeps the row with the latest sequence for each key.
- Remove the `put_only` hint from `Batch`, as it is unused and makes it easy to introduce bugs.

Checklist