etcdserver: separate maybeCompactRaftLog function to compact raft log independently #18635
Conversation
…ly from snapshots Signed-off-by: Clement <gh.2lgqz@aleeas.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: clement2026 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @clement2026. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@serathius Please take a look. This PR focuses on one task only.
Codecov Report — Attention: Patch coverage is

... and 13 files with indirect coverage changes

```
@@            Coverage Diff             @@
##             main   #18635      +/-   ##
==========================================
+ Coverage   68.79%   68.88%   +0.08%
==========================================
  Files         420      420
  Lines       35522    35545      +23
==========================================
+ Hits        24438    24485      +47
+ Misses       9657     9639      -18
+ Partials     1427     1421       -6
==========================================
```

Continue to review the full report in Codecov by Sentry.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
I can't test

**Add log statements**

```go
ep := etcdProgress{
	confState: sn.Metadata.ConfState,
	snapi:     sn.Metadata.Index,
	appliedt:  sn.Metadata.Term,
	appliedi:  sn.Metadata.Index,
	compacti:  fi - 1,
}
s.Logger().Info("initial ep.compacti", zap.Uint64("ep.compacti", ep.compacti), zap.Uint64("ep.snapi", ep.snapi))
```

```go
compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
if compacti <= ep.compacti {
	return
}
s.Logger().Info("compacti > ep.compacti", zap.Uint64("compacti", compacti), zap.Uint64("ep.compacti", ep.compacti))
```

**Start a new etcd instance and make 110 put requests**

```sh
rm -rf default.etcd; bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10 2>&1 | grep 'ep.compacti'
benchmark put --key-space-size=9999999 --val-size=100 --total=110
```

Logs:

**Restart the etcd instance and make 50 put requests**

```sh
bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10 2>&1 | grep 'ep.compacti'
benchmark put --key-space-size=9999999 --val-size=100 --total=50
```

Logs:
server/etcdserver/server.go (Outdated)

```diff
@@ -813,6 +818,7 @@ func (s *EtcdServer) run() {
 		snapi:    sn.Metadata.Index,
 		appliedt: sn.Metadata.Term,
 		appliedi: sn.Metadata.Index,
+		compacti: fi - 1,
```
Would be good to add a comment explaining why we subtract one. The compact index should be the last index that raftStorage.Compact was called with. Whether we should subtract depends on whether the Compact operation is inclusive or exclusive. Need to confirm this.
server/etcdserver/server.go (Outdated)

```diff
-	err = s.r.raftStorage.Compact(compacti)
+	err := s.r.raftStorage.Compact(compacti)
+	ep.compacti = compacti
```
I think we should update the compact index only if there was no error.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
```go
// TestMemoryStorageCompaction tests that after calling raftStorage.Compact(compacti)
// without errors, the dummy entry becomes {Index: compacti} and
// raftStorage.FirstIndex() returns (compacti+1, nil).
func TestMemoryStorageCompaction(t *testing.T) {
```
Thanks for the test, it's really great to protect our assumption via test.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
force-pushed from 99db0bb to bc9a3fa

force-pushed from 7a5bfd6 to f34bee5
```go
	}
}

// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
```
Those timeouts are very high; can we use smaller ones? Snapshot and compaction should happen synchronously with operations, meaning the logs should be available immediately.
Alright, I'll reduce the timeout to 1 or 2 seconds and see how it works out.
```go
}

// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 0)
```
Is there a reason sometimes we wait for 1 second, sometimes for 5 seconds?
The first logOccurredAtMostNTimes waits for 5 seconds. Once it's done, we can assume the log is synced up, so the second logOccurredAtMostNTimes doesn't need to wait that long.
Logs should appear in a matter of tens of milliseconds, not multiple seconds. The whole test should take seconds.
```go
// logOccurredAtMostNTimes ensures that the log has exactly `count` occurrences of `s` before timing out, no more, no less.
func logOccurredAtMostNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
```

Suggested change:

```go
func logOccurredExactlyNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
```
```go
// Retain all log entries up to the latest snapshot index to ensure any member can recover from that snapshot.
// Beyond the snapshot index, preserve the most recent s.Cfg.SnapshotCatchUpEntries entries in memory.
// This allows slow followers to catch up by synchronizing entries instead of requiring a full snapshot transfer.
if ep.snapi <= s.Cfg.SnapshotCatchUpEntries {
```

Suggested change:

```go
if ep.appliedi <= s.Cfg.SnapshotCatchUpEntries {
```
snapi is more correct here, the reason is described in the comment.
```go
	return
}

compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
```

Suggested change:

```go
compacti := ep.appliedi - s.Cfg.SnapshotCatchUpEntries
```
What's the purpose of the change? If you use ep.snapi, then the behaviour is exactly the same as the existing behaviour, because ep.snapi only gets updated each time after creating the (v2) snapshot.
Please see #17098 (comment). Can we get that confirmed firstly?
This change doesn't change when compaction is run or how many times it's executed. I was aware of the https://github.com/etcd-io/raft/blob/5d6eb55c4e6929e461997c9113aba99a5148e921/storage.go#L266-L269 code; that's why I was proposing compacting only every X applies.
```diff
@@ -980,6 +989,7 @@ func (s *EtcdServer) applyAll(ep *etcdProgress, apply *toApply) {
 	<-apply.notifyc
 
 	s.triggerSnapshot(ep)
+	s.maybeCompactRaftLog(ep)
```
If you really want to get this small PR merged first, then please ensure it's as independent as possible. Currently s.snapshot performs both the snapshot and compaction operations. It makes sense to extract the compaction operation as an independent function/method, but let's call that method inside s.triggerSnapshot:

```go
func (s *EtcdServer) triggerSnapshot(ep *etcdProgress) {
	if !s.shouldSnapshot(ep) {
		return
	}
	lg := s.Logger()
	lg.Info(
		"triggering snapshot",
		zap.String("local-member-id", s.MemberID().String()),
		zap.Uint64("local-member-applied-index", ep.appliedi),
		zap.Uint64("local-member-snapshot-index", ep.snapi),
		zap.Uint64("local-member-snapshot-count", s.Cfg.SnapshotCount),
		zap.Bool("snapshot-forced", s.forceSnapshot),
	)
	s.forceSnapshot = false

	s.snapshot(ep.appliedi, ep.confState)
	ep.snapi = ep.appliedi
	s.compact(xxxxx) // call the new method here, so we still do it each time after creating a snapshot.
}
```
No rush on merging this PR. If we do merge it, we need to ensure etcd actually benefits from it. Let's resolve #17098 (comment) first.
The goal of the PR is to make compaction independent from snapshot. Not just refactoring it to function.
> Not just refactor the function.

Just refactoring the function (extracting the compaction into a separate method) is an independent and safe change, and accordingly can be merged soon.

> The goal of the PR is to make compaction independent from snapshot

It modifies the logic/semantics, so it's no longer an independent change.
"Refactoring it to function" is a subset of "refactoring".
To be clearer about #18635 (comment), I am proposing an independent & safe minor refactoring below as the very first step
```go
// The first snapshot and compaction should happen because applied index is 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 1)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 1)
expectMemberLog(t, mem, time.Second, "\"compact-index\": 6", 1)
```
Do we really need to check that "compacted Raft logs" occurred at most N times? This is hard to check; why isn't checking "compact-index": X enough?
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Part of #17098
Goal: separate maybeCompactRaftLog function to compact raft log independently from snapshots.
TODO