[RFC] fix revision loss issue caused by compaction - 17780 #17815
Skipping CI for Draft Pull Request. |
Updated that case to review. Assume that compacted main revision is The |
Thanks @fuweid. Confirmed that this is a real issue; my previous comment #17780 (comment) isn't correct. The root cause is that compaction removes the last revision if it is a tombstone revision. One solution I can think of is to add a protection: never compact the latest revision, regardless of whether it is a tombstone.
Awesome finding @fuweid, can you confirm if the issue affects v3.4 or v3.5? |
Reproduced it on release-3.4 so I expect it's also on 3.5. |
For release-3.4, I added this patch, and the issue reproduces there as well:
commit 366110808a969e2ace1ee7575868ed916b4f0af6 (HEAD -> with-compactBeforeSetFinishedCompact-34)
Author: Wei Fu <fuweid89@gmail.com>
Date: Wed Apr 17 23:47:30 2024 +0800
--wip-- [skip ci]
Signed-off-by: Wei Fu <fuweid89@gmail.com>
diff --git a/mvcc/kvstore_compaction.go b/mvcc/kvstore_compaction.go
index 963ebe950..10035e897 100644
--- a/mvcc/kvstore_compaction.go
+++ b/mvcc/kvstore_compaction.go
@@ -49,6 +49,7 @@ func (s *store) scheduleCompaction(compactMainRev int64, keep map[revision]struc
}
if len(keys) < s.cfg.CompactionBatchLimit {
+ // gofail: var compactBeforeSetFinishedCompact struct{}
rbytes := make([]byte, 8+1+8)
revToBytes(revision{main: compactMainRev}, rbytes)
tx.UnsafePut(metaBucketName, finishedCompactKeyName, rbytes)
/tmp/17780 -test.v -test.run TestReproduce17780
=== RUN TestReproduce17780
before.go:36: Changing working directory to: /tmp/TestReproduce177801122229632/001
logger.go:146: 2024-04-17T23:56:21.409+0800 INFO starting server... {"name": "TestReproduce17780-test-0"}
logger.go:146: 2024-04-17T23:56:21.409+0800 INFO spawning process {"args": ["/home/fuweid/workspace/etcd/bin/etcd", "--name=TestReproduce17780-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestReproduce177801122229632/002", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster-token=new", "--snapshot-count=1000", "--initial-cluster=TestReproduce17780-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestReproduce177801122229632/001", "name": "TestReproduce17780-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files (x86)/Microsoft SDKs/Azure/CLI2/wbin:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/Program Files/dotnet/:/mnt/c/Program Files/Go/bin:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/weifu/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/weifu/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/weifu/go/bin:/home/fuweid/.fzf/bin:/usr/local/go/bin:/home/fuweid/go/bin:/opt/bin:/opt/fuwei/bin:/usr/local/go/bin:/home/fuweid/go/bin:/opt/bin:/opt/fuwei/bin", "EXPECT_DEBUG=true", "ETCD_UNSUPPORTED_ARCH=amd64", "ETCD_VERIFY=all"]}
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): A future release of golang/protobuf will delete this package,
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): which has long been excluded from the compatibility promise.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607):
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425693 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425754 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425776 W | embed: Running http and grpc server on single port. This is not recommended for production.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425796 I | etcdmain: etcd Version: 3.4.31
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425799 I | etcdmain: Git SHA: 366110808-FAILPOINTS
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425802 I | etcdmain: Go Version: go1.22.1
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425804 I | etcdmain: Go OS/Arch: linux/amd64
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425822 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.425892 W | embed: Running http and grpc server on single port. This is not recommended for production.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426226 I | embed: name = TestReproduce17780-test-0
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426293 I | embed: data dir = /tmp/TestReproduce177801122229632/002
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426310 I | embed: member dir = /tmp/TestReproduce177801122229632/002/member
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426313 I | embed: heartbeat = 100ms
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426316 I | embed: election = 1000ms
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426318 I | embed: snapshot count = 1000
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.426324 I | embed: advertise client URLs = http://localhost:20000
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.437774 I | etcdserver: starting member ca50e9357181d758 in cluster 34f27e83b3bc2ff
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:21 INFO: ca50e9357181d758 switched to configuration voters=()
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:21 INFO: ca50e9357181d758 became follower at term 0
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:21 INFO: newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:21 INFO: ca50e9357181d758 became follower at term 1
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:21 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.443107 W | auth: simple token is not cryptographically signed
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.447076 I | etcdserver: starting server... [version: 3.4.31, cluster version: to_be_decided]
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.447299 I | etcdserver: ca50e9357181d758 as single-node; fast-forwarding 9 ticks (election ticks 10)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.447373 I | pkg/fileutil: started to purge file, dir: /tmp/TestReproduce177801122229632/002/member/snap, suffix: snap.db, max: 5, interval: 30s
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.447404 I | pkg/fileutil: started to purge file, dir: /tmp/TestReproduce177801122229632/002/member/snap, suffix: snap, max: 5, interval: 30s
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.447413 I | pkg/fileutil: started to purge file, dir: /tmp/TestReproduce177801122229632/002/member/wal, suffix: wal, max: 5, interval: 30s
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.448894 I | embed: listening for peers on 127.0.0.1:20001
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:21 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:21.449677 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 is starting a new election at term 1
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 became candidate at term 2
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 became leader at term 2
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): raft2024/04/17 23:56:22 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.141822 I | etcdserver: published {Name:TestReproduce17780-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.141959 I | embed: ready to serve client requests
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.142177 I | etcdserver: setting up the initial cluster version to 3.4
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.143454 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.143564 N | etcdserver/membership: set the initial cluster version to 3.4
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.143635 I | etcdserver/api: enabled capabilities for version 3.4
logger.go:146: 2024-04-17T23:56:22.143+0800 INFO started server. {"name": "TestReproduce17780-test-0", "pid": 312607}
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): 2024-04-17 23:56:22.322070 I | mvcc: store.index: compact 201
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): panic: failpoint panic: {}
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607):
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): goroutine 179 [running]:
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/gofail/runtime.actPanic(0x1010100052200?)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:318 +0x65
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/gofail/runtime.(*term).do(...)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:290
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000c9628?)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:105 +0xe3
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc0000c9628?)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/failpoint.go:38 +0x98
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/etcd/mvcc.(*store).scheduleCompaction(0xc000292690, 0xc9, 0xc00060c570)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/workspace/etcd/mvcc/kvstore_compaction.go:52 +0x4c5
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/etcd/mvcc.(*store).compact.func1({0x11a8f38, 0xc000130190})
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/workspace/etcd/mvcc/kvstore.go:289 +0xe7
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): go.etcd.io/etcd/pkg/schedule.(*fifo).run(0xc0003324e0)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/workspace/etcd/pkg/schedule/schedule.go:157 +0x105
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): created by go.etcd.io/etcd/pkg/schedule.NewFIFOScheduler in goroutine 1
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312607): /home/fuweid/workspace/etcd/pkg/schedule/schedule.go:70 +0x156
{"level":"warn","ts":"2024-04-17T23:56:22.350004+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fa000/localhost:20000","method":"/etcdserverpb.KV/Compact","attempt":0,"error":"rpc error: code = Unavailable desc = error reading from server: EOF"}
logger.go:146: 2024-04-17T23:56:22.350+0800 INFO restarting server... {"name": "TestReproduce17780-test-0"}
logger.go:146: 2024-04-17T23:56:22.350+0800 INFO stopping server... {"name": "TestReproduce17780-test-0"}
logger.go:146: 2024-04-17T23:56:22.350+0800 INFO stopped server. {"name": "TestReproduce17780-test-0"}
logger.go:146: 2024-04-17T23:56:22.350+0800 INFO starting server... {"name": "TestReproduce17780-test-0"}
logger.go:146: 2024-04-17T23:56:22.350+0800 INFO spawning process {"args": ["/home/fuweid/workspace/etcd/bin/etcd", "--name=TestReproduce17780-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestReproduce177801122229632/002", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster-token=new", "--snapshot-count=1000", "--initial-cluster=TestReproduce17780-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestReproduce177801122229632/001", "name": "TestReproduce17780-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files (x86)/Microsoft SDKs/Azure/CLI2/wbin:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/Program Files/dotnet/:/mnt/c/Program Files/Go/bin:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/weifu/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/weifu/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/weifu/go/bin:/home/fuweid/.fzf/bin:/usr/local/go/bin:/home/fuweid/go/bin:/opt/bin:/opt/fuwei/bin:/usr/local/go/bin:/home/fuweid/go/bin:/opt/bin:/opt/fuwei/bin", "EXPECT_DEBUG=true", "ETCD_UNSUPPORTED_ARCH=amd64", "ETCD_VERIFY=all"]}
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): A future release of golang/protobuf will delete this package,
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): which has long been excluded from the compatibility promise.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622):
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363408 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363457 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363472 W | embed: Running http and grpc server on single port. This is not recommended for production.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363475 I | etcdmain: etcd Version: 3.4.31
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363477 I | etcdmain: Git SHA: 366110808-FAILPOINTS
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363478 I | etcdmain: Go Version: go1.22.1
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363480 I | etcdmain: Go OS/Arch: linux/amd64
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363481 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363516 N | etcdmain: the server is already initialized as member before, starting as etcd member...
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363541 W | embed: Running http and grpc server on single port. This is not recommended for production.
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363770 I | embed: name = TestReproduce17780-test-0
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363776 I | embed: data dir = /tmp/TestReproduce177801122229632/002
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363778 I | embed: member dir = /tmp/TestReproduce177801122229632/002/member
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363780 I | embed: heartbeat = 100ms
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363782 I | embed: election = 1000ms
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363785 I | embed: snapshot count = 1000
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363797 I | embed: advertise client URLs = http://localhost:20000
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363816 I | embed: initial advertise peer URLs = http://localhost:20001
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.363821 I | embed: initial cluster =
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.369763 I | etcdserver: restarting member ca50e9357181d758 in cluster 34f27e83b3bc2ff at commit index 205
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 switched to configuration voters=()
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 became follower at term 2
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:22 INFO: newRaft ca50e9357181d758 [peers: [], term: 2, commit: 205, applied: 0, lastindex: 205, lastterm: 2]
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.370883 W | auth: simple token is not cryptographically signed
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.374195 I | mvcc: resume scheduled compaction at 201
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.374969 I | etcdserver: starting server... [version: 3.4.31, cluster version: to_be_decided]
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.375050 I | pkg/fileutil: started to purge file, dir: /tmp/TestReproduce177801122229632/002/member/snap, suffix: snap.db, max: 5, interval: 30s
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.375073 I | pkg/fileutil: started to purge file, dir: /tmp/TestReproduce177801122229632/002/member/snap, suffix: snap, max: 5, interval: 30s
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.375091 I | pkg/fileutil: started to purge file, dir: /tmp/TestReproduce177801122229632/002/member/wal, suffix: wal, max: 5, interval: 30s
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:22 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.375360 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.375429 N | etcdserver/membership: set the initial cluster version to 3.4
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.375457 I | etcdserver/api: enabled capabilities for version 3.4
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.376267 W | etcdserver: failed to apply request "compaction:<revision:201 physical:true > header:<ID:15517309663689981645 > " with response "" took (3.312µs) to execute, err is mvcc: required revision is a future revision
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:22.376345 I | embed: listening for peers on 127.0.0.1:20001
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:23 INFO: ca50e9357181d758 is starting a new election at term 2
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:23 INFO: ca50e9357181d758 became candidate at term 3
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:23 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:23 INFO: ca50e9357181d758 became leader at term 3
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): raft2024/04/17 23:56:23 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.376415 I | etcdserver: published {Name:TestReproduce17780-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.376505 I | embed: ready to serve client requests
logger.go:146: 2024-04-17T23:56:23.376+0800 INFO started server. {"name": "TestReproduce17780-test-0", "pid": 312622}
logger.go:146: 2024-04-17T23:56:23.376+0800 INFO restarted server {"name": "TestReproduce17780-test-0"}
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.377538 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
reproduce_17780_test.go:84:
Error Trace: /home/fuweid/workspace/etcd/tests/e2e/reproduce_17780_test.go:84
Error: "198" is not greater than or equal to "201"
Test: TestReproduce17780
logger.go:146: 2024-04-17T23:56:23.379+0800 INFO stopping server... {"name": "TestReproduce17780-test-0"}
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.380180 N | pkg/osutil: received terminated signal, shutting down...
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.380339 W | embed: stopping insecure grpc server due to error: accept tcp 127.0.0.1:20000: use of closed network connection
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.380398 W | embed: stopped insecure grpc server due to error: accept tcp 127.0.0.1:20000: use of closed network connection
/home/fuweid/workspace/etcd/bin/etcd (TestReproduce17780-test-0) (312622): 2024-04-17 23:56:23.380425 I | etcdserver: skipped leadership transfer for single voting member cluster
logger.go:146: 2024-04-17T23:56:23.387+0800 INFO stopped server. {"name": "TestReproduce17780-test-0"}
--- FAIL: TestReproduce17780 (1.98s)
FAIL
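The run above arms the `compactBeforeSetFinishedCompact` failpoint through the gofail HTTP endpoint advertised by `GOFAIL_HTTP=127.0.0.1:12381`. As a rough sketch of what that request looks like, assuming the gofail runtime accepts a PUT of the failpoint term at `/<failpoint name>` (my understanding of gofail v0.1.0, not something shown in this log), the helper below only builds the request and does not send it. `newFailpointRequest` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newFailpointRequest builds the PUT request that would arm a failpoint.
// addr is the GOFAIL_HTTP address, name the failpoint, term the action
// (for example "panic"). The URL layout is an assumption about gofail.
func newFailpointRequest(addr, name, term string) (*http.Request, error) {
	url := fmt.Sprintf("http://%s/%s", addr, name)
	return http.NewRequest(http.MethodPut, url, strings.NewReader(term))
}

func main() {
	req, err := newFailpointRequest("127.0.0.1:12381", "compactBeforeSetFinishedCompact", "panic")
	if err != nil {
		panic(err)
	}
	// prints "PUT http://127.0.0.1:12381/compactBeforeSetFinishedCompact"
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` against a failpoint-enabled etcd binary would then make the next compaction panic just before the finished-compaction marker is written, which is the crash visible in the log.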
For release-3.5, I can reproduce it as well.
commit 9d571ff1a76597f515da12dad17b9d69cf2ddb04 (HEAD -> with-compactBeforeSetFinishedCompact-35)
Author: Wei Fu
Date: Wed Apr 17 23:53:24 2024 +0800
diff --git a/server/mvcc/kvstore_compaction.go b/server/mvcc/kvstore_compaction.go
|
tests/e2e/reproduce_17780_test.go
Outdated
resp, err := cli.Get(ctx, fmt.Sprintf("%d", 3))
require.NoError(t, err)
require.True(t, resp.Count == 1)
// The 199/200/201 revision has been deleted. The max rev is 198.
Maybe add the following to show that the compaction has actually been persisted:
resp, err = cli.Get(ctx, fmt.Sprintf("%d", 99))
require.NoError(t, err)
require.True(t, resp.Count == 0)
Yeah. I added it!
Great find, fuweid! etcd/server/storage/mvcc/kvstore.go Lines 363 to 368 in e37a67e
https://github.com/etcd-io/etcd/compare/main...siyuanfoundation:etcd:issue/17780?expand=1 |
Just my 2 cents I feel we should point the
Edit: resuming the previous |
Seems like another valid solution.
7822a88 to a59d051
Sorry for late reply. Try to update the |
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
LGTM
Great work!
// Revision: 2 -> 8 for new keys
n := compactionBatchLimit - 2
valueSize := 16
for i := 2; i <= n; i++ {
	_, err := cli.Put(ctx, fmt.Sprintf("%d", i), stringutil.RandString(uint(valueSize)))
	require.NoError(t, err)
}

// Revision: 9 -> 11 for delete keys with compared revision
//
// We need last compaction batch is no-op and all the tombstones should
// be deleted in previous compaction batch. So that we just lost the
// finishedCompactRev after panic.
for i := 9; i <= compactionBatchLimit+1; i++ {
	rev := i - 5
	key := fmt.Sprintf("%d", rev)

	_, err := cli.Delete(ctx, key)
	require.NoError(t, err)
}

require.NoError(t, clus.Procs[targetIdx].Failpoints().SetupHTTP(ctx, "compactBeforeSetFinishedCompact", `panic`))

_, err := cli.Compact(ctx, 11, clientv3.WithCompactPhysical())
require.Error(t, err)

require.NoError(t, clus.Procs[targetIdx].Restart(ctx))

// NOTE: We should not decrease the revision if there is no record
// about finished compact operation.
resp, err := cli.Get(ctx, fmt.Sprintf("%d", n))
require.NoError(t, err)
assert.GreaterOrEqual(t, resp.Header.Revision, int64(11))

// Revision 4 should be deleted by compaction.
resp, err = cli.Get(ctx, fmt.Sprintf("%d", 4))
require.NoError(t, err)
require.True(t, resp.Count == 0)

next := 20
for i := 12; i <= next; i++ {
	_, err := cli.Put(ctx, fmt.Sprintf("%d", i), stringutil.RandString(uint(valueSize)))
	require.NoError(t, err)
}

expectedRevision := next
for procIdx, proc := range clus.Procs {
	cli = newClient(t, proc.EndpointsGRPC(), e2e.ClientConfig{})
	resp, err := cli.Get(ctx, fmt.Sprintf("%d", next))
	require.NoError(t, err)

	assert.GreaterOrEqual(t, resp.Header.Revision, int64(expectedRevision),
		fmt.Sprintf("LeaderIdx: %d, Current: %d", leaderIdx, procIdx))
}
The revision checks are a little convoluted; it would be good to clean them up so that they assert only the most important properties. Not blocking for merge.
@fuweid can you please backport the PR to 3.5 and 3.4? Thanks |
@ahrtr will do. |
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.