etcdserver: separate maybeCompactRaftLog function to compact raft log independently #18635
Conversation
…ly from snapshots Signed-off-by: Clement <gh.2lgqz@aleeas.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: clement2026 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @clement2026. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@serathius Please take a look. This PR focuses on one task only.
Codecov Report — Attention: Patch coverage is

... and 13 files with indirect coverage changes

```
@@            Coverage Diff             @@
##             main   #18635      +/-   ##
==========================================
+ Coverage   68.79%   68.88%   +0.08%
==========================================
  Files         420      420
  Lines       35522    35545      +23
==========================================
+ Hits        24438    24485      +47
+ Misses       9657     9639      -18
+ Partials     1427     1421       -6
==========================================
```

Continue to review the full report in Codecov by Sentry.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
I can't test

**Add log statements**

```go
ep := etcdProgress{
	confState: sn.Metadata.ConfState,
	snapi:     sn.Metadata.Index,
	appliedt:  sn.Metadata.Term,
	appliedi:  sn.Metadata.Index,
	compacti:  fi - 1,
}
s.Logger().Info("initial ep.compacti", zap.Uint64("ep.compacti", ep.compacti), zap.Uint64("ep.snapi", ep.snapi))
```

```go
compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
if compacti <= ep.compacti {
	return
}
s.Logger().Info("compacti > ep.compacti", zap.Uint64("compacti", compacti), zap.Uint64("ep.compacti", ep.compacti))
```

**Start a new etcd instance and make 110 put requests**

```sh
rm -rf default.etcd; bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10 2>&1 | grep 'ep.compacti'
benchmark put --key-space-size=9999999 --val-size=100 --total=110
```

Logs:

**Restart the etcd instance and make 50 put requests**

```sh
bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10 2>&1 | grep 'ep.compacti'
benchmark put --key-space-size=9999999 --val-size=100 --total=50
```

Logs:
server/etcdserver/server.go (Outdated)

```diff
@@ -813,6 +818,7 @@ func (s *EtcdServer) run() {
 		snapi:    sn.Metadata.Index,
 		appliedt: sn.Metadata.Term,
 		appliedi: sn.Metadata.Index,
+		compacti: fi - 1,
```
Would be good to add a comment explaining why we subtract one. The compact index should be the last index that raftStorage.Compact was called with. Whether we should subtract depends on whether the Compact operation is inclusive or exclusive. Need to confirm this.
server/etcdserver/server.go (Outdated)

```diff
-	err = s.r.raftStorage.Compact(compacti)
+	err := s.r.raftStorage.Compact(compacti)
+	ep.compacti = compacti
```
I think we should update the compact index only if there was no error.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
```go
// TestMemoryStorageCompaction tests that after calling raftStorage.Compact(compacti)
// without errors, the dummy entry becomes {Index: compacti} and
// raftStorage.FirstIndex() returns (compacti+1, nil).
func TestMemoryStorageCompaction(t *testing.T) {
```
Thanks for the test, it's really great to protect our assumption via test.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
force-pushed from 99db0bb to bc9a3fa

force-pushed from 7a5bfd6 to f34bee5
```go
	}
}

// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
```
Those timeouts are very high; can we use smaller ones? Snapshot and compaction should happen synchronously with operations, meaning the logs should be available immediately.
Alright, I'll reduce the timeout to 1 or 2 seconds and see how it works out.
```go
}

// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 0)
```
Is there a reason sometimes we wait for 1 second, sometimes for 5 seconds?
The first logOccurredAtMostNTimes waits for 5 seconds. Once it's done, we can assume the log is synced up, so the second logOccurredAtMostNTimes doesn't need to wait that long.
Logs should appear in a matter of tens of milliseconds, not multiple seconds. The whole test should take seconds.
```go
// logOccurredAtMostNTimes ensures that the log has exactly `count` occurrences of `s` before timing out, no more, no less.
func logOccurredAtMostNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
```

Suggested change:

```go
func logOccurredExactlyNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
```
```go
// Retain all log entries up to the latest snapshot index to ensure any member can recover from that snapshot.
// Beyond the snapshot index, preserve the most recent s.Cfg.SnapshotCatchUpEntries entries in memory.
// This allows slow followers to catch up by synchronizing entries instead of requiring a full snapshot transfer.
if ep.snapi <= s.Cfg.SnapshotCatchUpEntries {
```

Suggested change:

```go
if ep.appliedi <= s.Cfg.SnapshotCatchUpEntries {
```
snapi is more correct here, the reason is described in the comment.
```go
	return
}

compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
```

Suggested change:

```go
compacti := ep.appliedi - s.Cfg.SnapshotCatchUpEntries
```
What's the purpose of the change? If you use ep.snapi, then the behaviour is exactly the same as the existing behaviour, because ep.snapi only gets updated each time after creating the (v2) snapshot.
Please see #17098 (comment). Can we get that confirmed firstly?
This change doesn't change when compaction is run or how many times it's executed. I was aware of the https://github.com/etcd-io/raft/blob/5d6eb55c4e6929e461997c9113aba99a5148e921/storage.go#L266-L269 code; that's why I was proposing compacting only every X applies.
```diff
@@ -980,6 +989,7 @@ func (s *EtcdServer) applyAll(ep *etcdProgress, apply *toApply) {
 	<-apply.notifyc
 
 	s.triggerSnapshot(ep)
+	s.maybeCompactRaftLog(ep)
```
If you really want to get this small PR merged first, then please ensure it's as independent as possible. Currently s.snapshot performs both the snapshot and compaction operations. It makes sense to extract the compaction operation as an independent function/method, but let's call that method inside s.triggerSnapshot:

```go
func (s *EtcdServer) triggerSnapshot(ep *etcdProgress) {
	if !s.shouldSnapshot(ep) {
		return
	}
	lg := s.Logger()
	lg.Info(
		"triggering snapshot",
		zap.String("local-member-id", s.MemberID().String()),
		zap.Uint64("local-member-applied-index", ep.appliedi),
		zap.Uint64("local-member-snapshot-index", ep.snapi),
		zap.Uint64("local-member-snapshot-count", s.Cfg.SnapshotCount),
		zap.Bool("snapshot-forced", s.forceSnapshot),
	)
	s.forceSnapshot = false

	s.snapshot(ep.appliedi, ep.confState)
	ep.snapi = ep.appliedi
	s.compact(xxxxx) // call the new method here, so we still do it each time after creating a snapshot.
}
```
No rush on merging this PR. If we do merge it, we need to ensure etcd actually benefits from it. Let's resolve #17098 (comment) first.
The goal of the PR is to make compaction independent from snapshot. Not just refactoring it to function.
> Not just refactor the function.

Just refactoring the function (extracting the compaction into a separate method) is an independent and safe change, and accordingly can be merged soon.

> The goal of the PR is to make compaction independent from snapshot

It modifies the logic/semantics, so it's no longer an independent change.
"Refactoring it to function" is a subset of "refactoring".
To be clearer about #18635 (comment), I am proposing an independent & safe minor refactoring below as the very first step
```go
// The first snapshot and compaction should happen because applied index is 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 1)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 1)
expectMemberLog(t, mem, time.Second, "\"compact-index\": 6", 1)
```
Do we really need to check that "compacted Raft logs" occurred at most N times? This is hard to check; why isn't checking "compact-index": X enough?
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Part of #17098
Goal: separate maybeCompactRaftLog function to compact raft log independently from snapshots.
TODO