etcdserver: separate raft log compact from snapshot #18459
Conversation
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
…o 10 Signed-off-by: Clement <gh.2lgqz@aleeas.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clement2026. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @clement2026. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Benchmark Summary

Here are the benchmark results for compacting the raft log every time
Throughput performs best when the step is 1 or 10. There isn’t much difference between the two. I've set the default step value to 10 in the code.
Benchmark Details

Here's how I ran the benchmark:

- step_1 vs main: compact raft log every time
- step_10 vs main: compact raft log every time
- step_100 vs main: compact raft log every time
- step_1000 vs main: compact raft log every time
- main vs main: re-run benchmarks on the main branch to ensure results are consistent
- step_1 vs step_10: since the benchmark results for step_1 and step_10 are pretty close, here I compare them to see the difference

GitHub workflow: https://github.com/clement2026/etcd-benchmark-action/actions/runs/10424407957
The archive containing .csv files and the script: etcd-benchmark-20240817-06-00-29.zip
The benchmark was run on a cloud VM with:
Codecov Report

Attention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
... and 22 files with indirect coverage changes

```
@@            Coverage Diff             @@
##             main   #18459      +/-   ##
==========================================
+ Coverage   68.79%   68.89%   +0.10%
==========================================
  Files         420      420
  Lines       35490    35482       -8
==========================================
+ Hits        24416    24446      +30
+ Misses       9647     9613      -34
+ Partials     1427     1423       -4
```

Continue to review full report in Codecov by Sentry.
/ok-to-test
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
/retest pull-etcd-robustness-arm64
@clement2026: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-etcd-robustness-arm64
1 similar comment
/test pull-etcd-robustness-arm64
Reason for the failure

The robustness test failed at
This happens before the leader creates any snapshot, and
https://github.com/etcd-io/raft/blob/ba8d7e340d530ae106118f059d3da4d89f933d23/raft.go#L662C1-L689

```go
// maybeSendSnapshot fetches a snapshot from Storage, and sends it to the given
// node. Returns true iff the snapshot message has been emitted successfully.
func (r *raft) maybeSendSnapshot(to uint64, pr *tracker.Progress) bool {
	if !pr.RecentActive {
		r.logger.Debugf("ignore sending snapshot to %x since it is not recently active", to)
		return false
	}
	snapshot, err := r.raftLog.snapshot()
	if err != nil {
		if err == ErrSnapshotTemporarilyUnavailable {
			r.logger.Debugf("%x failed to send snapshot to %x because snapshot is temporarily unavailable", r.id, to)
			return false
		}
		panic(err) // TODO(bdarnell)
	}
	if IsEmptySnap(snapshot) {
		panic("need non-empty snapshot")
	}
	sindex, sterm := snapshot.Metadata.Index, snapshot.Metadata.Term
	r.logger.Debugf("%x [firstindex: %d, commit: %d] sent snapshot[index: %d, term: %d] to %x [%s]",
		r.id, r.raftLog.firstIndex(), r.raftLog.committed, sindex, sterm, to, pr)
	pr.BecomeSnapshot(sindex)
	r.logger.Debugf("%x paused sending replication messages to %x [%s]", r.id, to, pr)
	r.send(pb.Message{To: to, Type: pb.MsgSnap, Snapshot: &snapshot})
	return true
}
```

Visualization

Here are some images to help us understand the situation better.

Main branch: Compaction only happens right after a snapshot is created.

This PR: Compaction can happen before a snapshot is created.

Solutions

I'm stumped right now. Any ideas would be awesome!
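To make the failure mode concrete, here is a minimal sketch (my own illustration against the go.etcd.io/raft/v3 MemoryStorage API, not code from this PR or from etcd) of how compacting the log without ever creating a snapshot leaves `Snapshot()` empty, which is exactly the state `maybeSendSnapshot` panics on:

```go
package main

import (
	"fmt"

	"go.etcd.io/raft/v3"
	pb "go.etcd.io/raft/v3/raftpb"
)

func main() {
	ms := raft.NewMemoryStorage()

	// Append a few entries, then compact without ever calling CreateSnapshot.
	if err := ms.Append([]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 1}, {Index: 3, Term: 1}}); err != nil {
		panic(err)
	}
	if err := ms.Compact(2); err != nil {
		panic(err)
	}

	// The log is now compacted, but no snapshot was ever created, so the
	// storage still reports an empty snapshot. A slow follower that needs
	// entries <= 2 forces the leader into maybeSendSnapshot, which panics
	// with "need non-empty snapshot".
	snap, err := ms.Snapshot()
	if err != nil {
		panic(err)
	}
	fmt.Println("snapshot is empty:", raft.IsEmptySnap(snap)) // prints: snapshot is empty: true
}
```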
/test pull-etcd-robustness-arm64
2 similar comments
/test pull-etcd-robustness-arm64
/test pull-etcd-robustness-arm64
Sounds reasonable. We are separating the mechanisms for raft log compaction and snapshots, which means we now need to guarantee that a snapshot is always available. Creating an empty snapshot at etcd start sounds like a good first iteration.
Thank you for the info! I'll give this a shot and open a new PR soon.
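As a rough sketch of the invariant described above (never compact entries that are not yet covered by a snapshot, so `Storage.Snapshot()` always has something to serve a slow follower), something along these lines could work; the helper name `compactUpToSnapshot` and its shape are assumptions for illustration, not code from this PR:

```go
// Package sketch: a hypothetical guard that keeps raft log compaction behind
// the latest snapshot, so Storage.Snapshot() is never empty while a slow
// follower might still need compacted entries.
package sketch

import "go.etcd.io/raft/v3"

// compactUpToSnapshot compacts at most up to the index covered by the latest
// snapshot. If no snapshot exists yet, it compacts nothing.
func compactUpToSnapshot(ms *raft.MemoryStorage, desired uint64) error {
	snap, err := ms.Snapshot()
	if err != nil {
		return err
	}
	if raft.IsEmptySnap(snap) {
		// No snapshot yet: compacting now could strand a follower that
		// needs the entries we are about to drop.
		return nil
	}
	if desired > snap.Metadata.Index {
		// Clamp compaction to what the snapshot can replace.
		desired = snap.Metadata.Index
	}
	if err := ms.Compact(desired); err != nil && err != raft.ErrCompacted {
		return err
	}
	return nil
}
```

The same clamping idea works whether the first snapshot is created eagerly at startup or lazily later on.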
3c14a8f to 0e4d412 (Compare)
@clement2026: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
@ahrtr the image isn't loading; it's not set to public: https://private-user-images.githubusercontent.com/27894831/358932866-df2dbdb7-2165-492c-93b8-907e09651fd4.png
Though the leader doesn't care about the follower, the leader itself can panic before sending out the snapshot. Before sending a snapshot to a follower, the leader runs
https://github.com/etcd-io/raft/blob/ba8d7e340d530ae106118f059d3da4d89f933d23/raft.go#L678-L680

```go
if IsEmptySnap(snapshot) {
	panic("need non-empty snapshot")
}
```

All the leader wants is actually a bbolt db snapshot, not a raft log snapshot. But it seems I can't work around
It's concerning that we are mixing a potential bug with an enhancement. Can you raise a separate issue that clearly describes how to reproduce it?
Sure, I'll create the issue a bit later.
After chatting with serathius, I've realized this PR isn’t needed anymore. I'll keep working on #17098 in a new PR.
Part of #17098
Key Changes
- `compactRaftLog` for raft log compaction.
- `RaftLogCompactionStep` and `DefaultRaftLogCompactionStep` to manage how often the raft log is compacted.
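For illustration only, a step-based compaction trigger could fit together roughly as sketched below. The identifiers echo the PR description, and the default of 10 matches the value mentioned in the benchmark summary, but the implementation here is a hypothetical sketch, not the PR's actual code:

```go
// Hypothetical sketch of a step-based raft log compaction trigger.
package sketch

import "go.etcd.io/raft/v3"

// DefaultRaftLogCompactionStep mirrors the default of 10 mentioned in the
// benchmark summary.
const DefaultRaftLogCompactionStep uint64 = 10

type raftLogCompactor struct {
	step          uint64 // how many newly applied entries to wait for (RaftLogCompactionStep)
	lastCompacted uint64 // raft index up to which the log has been compacted
}

// compactRaftLog compacts the log once appliedIndex has advanced at least
// `step` entries past the previous compaction point, independently of when
// snapshots are taken.
func (c *raftLogCompactor) compactRaftLog(ms *raft.MemoryStorage, appliedIndex uint64) error {
	if appliedIndex < c.lastCompacted+c.step {
		return nil // not enough new applied entries yet
	}
	if err := ms.Compact(appliedIndex); err != nil {
		if err == raft.ErrCompacted {
			// Already compacted past this point; nothing to do.
			c.lastCompacted = appliedIndex
			return nil
		}
		return err
	}
	c.lastCompacted = appliedIndex
	return nil
}
```

As the robustness-test discussion above shows, any trigger like this still has to guarantee that a snapshot covering the compacted prefix is available for slow followers.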