refactor: Decouple dedup and merge #4139

Merged
merged 11 commits into from
Jun 17, 2024

@evenyag evenyag commented Jun 13, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR refactors the MergeReader and introduces a new reader, DedupReader.

  • The MergeReader only merges sorted batches but doesn't dedup rows
  • The DedupReader fetches batches from the MergeReader and deduplicates them
  • The DedupReader uses the DedupStrategy to perform the actual deduplication
    • The default strategy is LastRow, which keeps the row with the latest sequence for each key
    • We could implement other dedup strategies later
  • Removes the put_only hint from the Batch, as it is unused and makes it easy to introduce bugs.
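
To illustrate the split described above, here is a minimal sketch and not the code in this PR. The `Row` type, the `merge` function, and the shape of the `DedupStrategy` trait are all hypothetical simplifications: the real readers work on columnar `Batch`es and stream them, while this version works on in-memory vectors. It assumes every input run is already sorted by key ascending and, within a key, by sequence descending.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical, simplified row; stands in for the rows of a `Batch`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Row {
    key: u32,
    sequence: u64,
    value: i64,
}

/// Merge step: k-way merge of runs that are each sorted by
/// (key ascending, sequence descending). Duplicate keys are kept.
fn merge(runs: Vec<Vec<Row>>) -> Vec<Row> {
    // Min-heap of (key, Reverse(sequence), run index, offset in run).
    let mut heap = BinaryHeap::new();
    for (i, run) in runs.iter().enumerate() {
        if let Some(row) = run.first() {
            heap.push(Reverse((row.key, Reverse(row.sequence), i, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((_, _, i, pos))) = heap.pop() {
        out.push(runs[i][pos].clone());
        // Refill the heap with the next row from the same run, if any.
        if let Some(next) = runs[i].get(pos + 1) {
            heap.push(Reverse((next.key, Reverse(next.sequence), i, pos + 1)));
        }
    }
    out
}

/// Hypothetical strategy interface: decides which duplicates survive.
trait DedupStrategy {
    fn dedup(&self, rows: Vec<Row>) -> Vec<Row>;
}

/// Keeps the row with the latest sequence for each key.
struct LastRow;

impl DedupStrategy for LastRow {
    fn dedup(&self, mut rows: Vec<Row>) -> Vec<Row> {
        // Input is sorted with the highest sequence first within a key,
        // so keeping the first row of each run of equal keys is enough.
        rows.dedup_by_key(|r| r.key);
        rows
    }
}

/// Dedup step: reads the merged output and applies the chosen strategy.
fn read_deduped<S: DedupStrategy>(runs: Vec<Vec<Row>>, strategy: &S) -> Vec<Row> {
    strategy.dedup(merge(runs))
}
```

Calling `read_deduped(runs, &LastRow)` yields one row per key, while `merge(runs)` alone still returns every row. Another strategy would only need a new `DedupStrategy` implementation and would leave the merge step untouched.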

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 13, 2024
@evenyag evenyag marked this pull request as ready for review June 13, 2024 08:55
codecov bot commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.60%. Comparing base (acdfaab) to head (7274d77).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4139      +/-   ##
==========================================
- Coverage   85.49%   84.60%   -0.89%     
==========================================
  Files         994     1021      +27     
  Lines      174907   178953    +4046     
==========================================
+ Hits       149528   151399    +1871     
- Misses      25379    27554    +2175     

@evenyag evenyag marked this pull request as draft June 13, 2024 12:13
@evenyag evenyag marked this pull request as ready for review June 14, 2024 07:35
@killme2008 killme2008 left a comment

LGTM

src/mito2/src/read/dedup.rs (resolved)
src/mito2/src/read/dedup.rs (outdated, resolved)
@fengjiachun fengjiachun left a comment

LGTM

@killme2008 killme2008 added this pull request to the merge queue Jun 17, 2024
Merged via the queue into GreptimeTeam:main with commit 558272d Jun 17, 2024
49 checks passed
@evenyag evenyag deleted the refactor/merge-reader branch June 17, 2024 06:19
4 participants