
etcdserver: separate raft log compact from snapshot #18459

Conversation


@clement2026 clement2026 commented Aug 17, 2024

Part of #17098

Key Changes

  • Add a new function compactRaftLog for raft log compaction.
  • Introduce two variables, RaftLogCompactionStep and DefaultRaftLogCompactionStep, to control how often the raft log is compacted.
  • Refactor some existing tests. They previously expected a compaction immediately after each snapshot to ensure a snapshot was sent to followers; this PR breaks that assumption, so they were updated accordingly.
  • Add unit and integration tests for this PR.
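The step-based trigger described above can be sketched roughly as follows. This is an illustrative sketch only: shouldCompact and its signature are made-up names for this comment, not the actual etcd identifiers.

```go
package main

import "fmt"

// Illustrative default, mirroring the PR's chosen step of 10.
const DefaultRaftLogCompactionStep = 10

// shouldCompact reports whether the raft log should be compacted:
// compaction runs whenever the applied index has advanced by at
// least `step` entries past the last compaction point.
func shouldCompact(appliedIndex, lastCompacted, step uint64) bool {
	return appliedIndex > lastCompacted && appliedIndex-lastCompacted >= step
}

func main() {
	// Advanced by 15 entries since the last compaction: compact.
	fmt.Println(shouldCompact(25, 10, DefaultRaftLogCompactionStep))
	// Advanced by only 5 entries: skip.
	fmt.Println(shouldCompact(15, 10, DefaultRaftLogCompactionStep))
}
```

The smaller the step, the closer the retained log stays to the applied index, at the cost of more frequent compaction calls.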

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clement2026
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @clement2026. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@clement2026 clement2026 changed the title etcdserver: separate raft log compact from snapshot [WIP] etcdserver: separate raft log compact from snapshot Aug 17, 2024
@clement2026 clement2026 changed the title [WIP] etcdserver: separate raft log compact from snapshot WIP: etcdserver: separate raft log compact from snapshot Aug 17, 2024
@clement2026
Contributor Author

Benchmark Summary

Here are the benchmark results for compacting the raft log every time applied index increases by 1, 10, 100, and 1000 (let's call this "step").

Branch        Read throughput     Write throughput
main (base)   -                   -
step_1        [2.86%, 17.00%]     [3.02%, 16.75%]
step_10       [2.82%, 16.53%]     [2.98%, 16.95%]
step_100      [-2.09%, 9.77%]     [-1.00%, 10.93%]
step_1000     [-4.13%, 7.27%]     [-4.91%, 7.65%]
main (again)  [-4.19%, 7.39%]     [-4.41%, 9.03%]

Throughput performs best when the step is 1 or 10. There isn’t much difference between the two. I've set the default step value to 10 in the code.

@clement2026
Contributor Author

clement2026 commented Aug 17, 2024

Benchmark Details

Here's how I ran the benchmark:

  1. Run the benchmark on the main, step_1, step_10, step_100, step_1000, and main branches, in that order.
  2. Reboot the OS between each run.
  3. Collect .csv files and generate the heat maps.

step_1 vs main

Compact raft log every time applied index increases by 1.

[heat map: step_1_read]

step_10 vs main

Compact raft log every time applied index increases by 10.

[heat map: step_10_read]

step_100 vs main

Compact raft log every time applied index increases by 100.

[heat map: step_100_read]

step_1000 vs main

Compact raft log every time applied index increases by 1000.

[heat map: step_1000_read]

main vs main

Re-run the benchmark on the main branch to confirm the results are consistent.

[heat map: main_2_read]

step_1 vs step_10

Since the benchmark results for step_1 and step_10 are very close, here is a direct comparison between the two:

[heat map: step_1-vs-step_10_read]


GitHub workflow: https://github.com/clement2026/etcd-benchmark-action/actions/runs/10424407957

The archive containing .csv files and the script: etcd-benchmark-20240817-06-00-29.zip

The benchmark was run on a cloud VM with:

  • 16GB RAM
  • 8 vCPUs
  • 150GB NVMe SSD
  • Ubuntu 24.04 LTS x64

@clement2026 clement2026 changed the title WIP: etcdserver: separate raft log compact from snapshot etcdserver: separate raft log compact from snapshot Aug 17, 2024
@clement2026 clement2026 marked this pull request as ready for review August 17, 2024 09:59
@codecov-commenter

codecov-commenter commented Aug 17, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 39.13043% with 28 lines in your changes missing coverage. Please review.

Project coverage is 68.89%. Comparing base (62e4433) to head (0003fa2).
Report is 34 commits behind head on main.

Current head 0003fa2 differs from the pull request's most recent head 0e4d412.

Please upload reports for the commit 0e4d412 to get more accurate results.

Files with missing lines         Patch %   Lines
client/v3/kubernetes/client.go   0.00%     27 missing ⚠️
server/embed/etcd.go             0.00%     0 missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files with missing lines Coverage Δ
server/config/config.go 79.76% <ø> (ø)
server/etcdserver/server.go 81.28% <100.00%> (-0.29%) ⬇️
server/features/etcd_features.go 60.00% <ø> (ø)
server/storage/mvcc/kvstore.go 89.76% <100.00%> (ø)
server/embed/etcd.go 75.77% <0.00%> (-0.05%) ⬇️
client/v3/kubernetes/client.go 0.00% <0.00%> (ø)

... and 22 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18459      +/-   ##
==========================================
+ Coverage   68.79%   68.89%   +0.10%     
==========================================
  Files         420      420              
  Lines       35490    35482       -8     
==========================================
+ Hits        24416    24446      +30     
+ Misses       9647     9613      -34     
+ Partials     1427     1423       -4     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62e4433...0e4d412. Read the comment docs.

@jmhbnz
Member

jmhbnz commented Aug 18, 2024

/ok-to-test

@clement2026
Contributor Author

/retest pull-etcd-robustness-arm64

@k8s-ci-robot

@clement2026: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test pull-etcd-build
  • /test pull-etcd-e2e-386
  • /test pull-etcd-e2e-amd64
  • /test pull-etcd-e2e-arm64
  • /test pull-etcd-govulncheck
  • /test pull-etcd-integration-1-cpu-amd64
  • /test pull-etcd-integration-2-cpu-amd64
  • /test pull-etcd-integration-4-cpu-amd64
  • /test pull-etcd-unit-test-386
  • /test pull-etcd-unit-test-amd64
  • /test pull-etcd-unit-test-arm64
  • /test pull-etcd-verify

The following commands are available to trigger optional jobs:

  • /test pull-etcd-robustness-amd64
  • /test pull-etcd-robustness-arm64

Use /test all to run all jobs.

In response to this:

/retest pull-etcd-robustness-arm64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@clement2026
Contributor Author

/test pull-etcd-robustness-arm64

1 similar comment
@clement2026
Contributor Author

/test pull-etcd-robustness-arm64

@clement2026
Contributor Author

Reason for the failure

The robustness test failed at panic("need non-empty snapshot") in the raft library.

This happens before the leader has created any snapshot: raftStorage.ents has already been compacted and no longer contains the entries a slow follower needs, so the follower must be sent a snapshot instead. Because no snapshot exists yet, the leader panics.

https://github.com/etcd-io/raft/blob/ba8d7e340d530ae106118f059d3da4d89f933d23/raft.go#L662C1-L689

// maybeSendSnapshot fetches a snapshot from Storage, and sends it to the given
// node. Returns true iff the snapshot message has been emitted successfully.
func (r *raft) maybeSendSnapshot(to uint64, pr *tracker.Progress) bool {
	if !pr.RecentActive {
		r.logger.Debugf("ignore sending snapshot to %x since it is not recently active", to)
		return false
	}

	snapshot, err := r.raftLog.snapshot()
	if err != nil {
		if err == ErrSnapshotTemporarilyUnavailable {
			r.logger.Debugf("%x failed to send snapshot to %x because snapshot is temporarily unavailable", r.id, to)
			return false
		}
		panic(err) // TODO(bdarnell)
	}
	if IsEmptySnap(snapshot) {
		panic("need non-empty snapshot")
	}
	sindex, sterm := snapshot.Metadata.Index, snapshot.Metadata.Term
	r.logger.Debugf("%x [firstindex: %d, commit: %d] sent snapshot[index: %d, term: %d] to %x [%s]",
		r.id, r.raftLog.firstIndex(), r.raftLog.committed, sindex, sterm, to, pr)
	pr.BecomeSnapshot(sindex)
	r.logger.Debugf("%x paused sending replication messages to %x [%s]", r.id, to, pr)

	r.send(pb.Message{To: to, Type: pb.MsgSnap, Snapshot: &snapshot})
	return true
}

Visualization

Here are some images to help us understand the situation better.

Main branch

Compaction only happens right after a snapshot is created.

[diagram: Untitled-2024-08-16-18062]

This PR

Compaction can happen before a snapshot is created.

[diagram: Untitled-2024-08-16-1806 3]

Solutions

  • Modify the implementation of func (r *raft) maybeSendSnapshot(to uint64, pr *tracker.Progress) bool to send an empty snapshot instead of causing a panic. However, I’m not sure about the expected behavior or potential risks.
  • Ensure a snapshot is created if it doesn’t exist when etcd starts up. This might sound silly, I know 😂
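The second option could look roughly like this. A minimal sketch with made-up types: snapshotMeta and ensureSnapshot are illustrative stand-ins, not actual etcd or raft identifiers.

```go
package main

import "fmt"

// snapshotMeta is a simplified stand-in for raftpb.SnapshotMetadata.
type snapshotMeta struct{ Index, Term uint64 }

// ensureSnapshot returns the existing snapshot metadata, or creates a
// placeholder snapshot at the current applied index when none exists,
// so a later maybeSendSnapshot call never sees an empty snapshot.
func ensureSnapshot(existing *snapshotMeta, appliedIndex, term uint64) snapshotMeta {
	if existing != nil {
		return *existing
	}
	return snapshotMeta{Index: appliedIndex, Term: term}
}

func main() {
	// Fresh start: no snapshot on disk yet, so one is created.
	s := ensureSnapshot(nil, 0, 1)
	fmt.Printf("created snapshot at index=%d term=%d\n", s.Index, s.Term)

	// Restart with a snapshot already present: it is kept as-is.
	existing := snapshotMeta{Index: 42, Term: 3}
	s = ensureSnapshot(&existing, 100, 3)
	fmt.Printf("kept snapshot at index=%d term=%d\n", s.Index, s.Term)
}
```

In the real implementation this check would run once during etcd startup, before any raft log compaction is allowed to happen.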

I'm stumped right now. Any ideas would be awesome!

@clement2026
Contributor Author

/test pull-etcd-robustness-arm64

2 similar comments
@clement2026
Contributor Author

/test pull-etcd-robustness-arm64

@clement2026
Contributor Author

/test pull-etcd-robustness-arm64

@serathius
Member

Ensure a snapshot is created if it doesn’t exist when etcd starts up. This might sound silly, I know

Sounds reasonable. We are separating mechanisms for raft log compaction and snapshots, meaning that now we need to guarantee that snapshot is always available. Creating empty snapshot at the etcd start sounds like a good first iteration.

@clement2026
Contributor Author

Sounds reasonable. We are separating mechanisms for raft log compaction and snapshots, meaning that now we need to guarantee that snapshot is always available. Creating empty snapshot at the etcd start sounds like a good first iteration.

Thank you for the info! I'll give this a shot and open a new PR soon.

@clement2026 clement2026 force-pushed the issue-17098-separate-raft-log-compact branch 3 times, most recently from 3c14a8f to 0e4d412 Compare August 31, 2024 19:22
@k8s-ci-robot

@clement2026: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                    Commit    Details   Required   Rerun command
pull-etcd-robustness-arm64   0e4d412   link      false      /test pull-etcd-robustness-arm64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ahrtr
Member

ahrtr commented Sep 13, 2024

Compaction can happen before a snapshot is created.

[diagram: Untitled-2024-08-16-1806 3]

A leader sending a snapshot means that the leader sends the whole bbolt db snapshot; it doesn't matter what the follower's current applied index is.

@clement2026
Contributor Author

@ahrtr the image isn't loading; it's not set to public: https://private-user-images.githubusercontent.com/27894831/358932866-df2dbdb7-2165-492c-93b8-907e09651fd4.png

A leader sending a snapshot means that the leader sends the whole bbolt db snapshot; it doesn't matter what the follower's current applied index is.

Though the leader doesn't care about the follower, the leader itself can panic before sending out the snapshot.

Before sending a snapshot to a follower, the leader runs maybeSendSnapshot() and may panic here:

https://github.com/etcd-io/raft/blob/ba8d7e340d530ae106118f059d3da4d89f933d23/raft.go#L678-L680

	if IsEmptySnap(snapshot) {
		panic("need non-empty snapshot")
	}

All the leader really needs is a bbolt db snapshot, not a raft log snapshot. But it seems I can't work around maybeSendSnapshot(), as it's part of the raft mechanism.

@ahrtr
Member

ahrtr commented Sep 13, 2024

Though the leader doesn't care about the follower, the leader itself can panic before sending out the snapshot.

Before sending a snapshot to a follower, the leader runs maybeSendSnapshot() and may panic here:

https://github.com/etcd-io/raft/blob/ba8d7e340d530ae106118f059d3da4d89f933d23/raft.go#L678-L680

	if IsEmptySnap(snapshot) {
		panic("need non-empty snapshot")
	}

It's concerning that we're mixing a potential bug with an enhancement. Can you raise a separate issue clearly describing how to reproduce it?

@clement2026
Contributor Author

It's concerning that we're mixing a potential bug with an enhancement. Can you raise a separate issue clearly describing how to reproduce it?

Sure, I'll create the issue a bit later.

@clement2026
Contributor Author

After chatting with serathius, I've realized this PR isn’t needed anymore. I'll keep working on #17098 in a new PR.
Closing this.

@clement2026 clement2026 deleted the issue-17098-separate-raft-log-compact branch September 24, 2024 09:22