-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-7532] Include only compaction instants for lastCompaction in getDeltaCommitsSinceLatestCompaction #10915
Conversation
hudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.java
Outdated
Show resolved
Hide resolved
@@ -147,6 +147,12 @@ public HoodieTimeline filterCompletedAndCompactionInstants() { | |||
|| s.getAction().equals(HoodieTimeline.COMPACTION_ACTION)), details); | |||
} | |||
|
|||
@Override | |||
public HoodieTimeline filterCompletedCommitInstants() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's unit test this.
hudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.java
Outdated
Show resolved
Hide resolved
hudi-common/src/test/java/org/apache/hudi/common/util/TestCompactionUtils.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java
Outdated
Show resolved
Hide resolved
@nsivabalan could you rebase the PR on the latest master and address the review comments? |
ae80da4
to
704439a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise looks good.
.option("hoodie.keep.min.commits", "4") | ||
.option("hoodie.keep.max.commits", "5") | ||
.option("hoodie.clean.commits.retained", "3") | ||
.option("hoodie.keep.min.commits", "16") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsivabalan : Why did we change this test along with this PR ? I am not able to follow the comments added inline.
cc @yihua
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @bvaradar : I had a long discussion w/ @danny0405 on this.
here is the issue:
our data table archival polls for compaction commit and does something based on that. Before this patch, we had a bug in canSchedule call wrt compaction scheduling. This patched fixed the issue as you might know.
But this specific test started to fail w/ FileNotFound. Apparently in HoodieCDCExtractor, we parse all entries in commit metadata and poll file status/listStatus for them.
Prior to this patch, due to archival behavior, every commit in active timeline is uncleaned and we were good. after this patch, apparently archival is trailing, and so we do have some commits in active timeline which are cleaned up and hence the fileStatus/listStatus polling results in FileNotFound Issue.
Danny agreed that this is a long pending/known limitation and we could definitely improve user exp by throwing some legible exception/error msg. But it increases the scope of this patch. so, we agreed to tweak the test so that it will not hit FileNotFound issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good @nsivabalan . Can you file a jira to track this (if not already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…tDeltaCommitsSinceLatestCompaction (#10915) * Fixing schedule compaction bug * Addressing comments * Fixing CDC tests
…tDeltaCommitsSinceLatestCompaction (#10915) * Fixing schedule compaction bug * Addressing comments * Fixing CDC tests
…tDeltaCommitsSinceLatestCompaction (#10915) * Fixing schedule compaction bug * Addressing comments * Fixing CDC tests
Change Logs
Fixing schedule compaction bug.
Looks like when polling timeline for last successful compaction commits, we also include replace commits. So, if a user sets clustering freq to 4 and compaction freq to 5, compaction scheduling may never kick in at all.
Impact
Scheduling of compaction will be triggered at the cadence configured.
Risk level (write none, low medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist