Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Robustness] Etcd v3.4 required more than expected leader elections #17455

Closed
4 tasks
serathius opened this issue Feb 19, 2024 · 10 comments · May be fixed by etcd-io/raft#169
Closed
4 tasks

[Robustness] Etcd v3.4 required more than expected leader elections #17455

serathius opened this issue Feb 19, 2024 · 10 comments · May be fixed by etcd-io/raft#169

Comments

@serathius
Copy link
Member

Bug report criteria

What happened?

We got 2 consecutive robustness test failures on v3.4

https://github.com/etcd-io/etcd/actions/runs/7928589548/job/21647130490
https://github.com/etcd-io/etcd/actions/runs/7940776245/job/21682419728

The cause was low average traffic:

2024-02-16T11:00:05.7694187Z     logger.go:130: 2024-02-16T11:00:05.557Z	INFO	Recorded operations	{"operationCount": 809}
2024-02-16T11:00:05.7695071Z     logger.go:130: 2024-02-16T11:00:05.557Z	INFO	Average traffic	{"qps": 59.03054745137809}
2024-02-16T11:00:05.7695949Z     traffic.go:105: Requiring minimal 100.000000 qps for test results to be reliable, got 59.030547 qps

The reason for traffic was that cluster was unavailable for longer then normally and required multiple leader elections.

What did you expect to happen?

Cluster should recover quickly after member crash and not require multiple leader elections.

How can we reproduce it (as minimally and precisely as possible)?

No repro yet

Anything else we need to know?

Expect that this might be due to v3.4 not using --pre-vote, which is default in v3.5.

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

(member-0) (19805): 
(member-0) (19805): goroutine 204 [running]:
(member-0) (19805): go.etcd.io/gofail/runtime.actPanic(0x0?)
(member-0) (19805): 	/home/runner/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:318 +0x65
(member-0) (19805): go.etcd.io/gofail/runtime.(*term).do(...)
(member-0) (19805): 	/home/runner/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:290
(member-0) (19805): go.etcd.io/gofail/runtime.(*terms).eval(0x0?)
(member-0) (19805): 	/home/runner/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/terms.go:105 +0xe3
(member-0) (19805): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc000290340?)
(member-0) (19805): 	/home/runner/go/pkg/mod/go.etcd.io/gofail@v0.1.0/runtime/failpoint.go:38 +0x98
(member-0) (19805): go.etcd.io/etcd/etcdserver.(*raftNode).start.func1()
(member-0) (19805): 	/tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:261 +0xe29
(member-0) (19805): created by go.etcd.io/etcd/etcdserver.(*raftNode).start in goroutine 212
(member-0) (19805): 	/tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:165 +0x6f
(member-1) (19804): 2024-02-17 09:53:13.891057 E | rafthttp: failed to dial eabdbb777cf498cb on stream MsgApp v2 (EOF)
(member-2) (19803): 2024-02-17 09:53:13.891103 E | rafthttp: failed to dial eabdbb777cf498cb on stream MsgApp v2 (EOF)
(member-2) (19803): 2024-02-17 09:53:13.891120 I | rafthttp: peer eabdbb777cf498cb became inactive (message send to peer failed)
(member-1) (19804): 2024-02-17 09:53:13.891076 I | rafthttp: peer eabdbb777cf498cb became inactive (message send to peer failed)
    logger.go:130: 2024-02-17T09:53:13.891Z	INFO	server exited	{"name": "member-0", "code": 2}
    logger.go:130: 2024-02-17T09:53:13.891Z	INFO	Member exited as expected	{"member": "member-0"}
    logger.go:130: 2024-02-17T09:53:13.891Z	INFO	starting server...	{"name": "member-0"}
    logger.go:130: 2024-02-17T09:53:13.891Z	INFO	spawning process	{"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=https://localhost:20001", "--initial-advertise-peer-urls=https://localhost:20003", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001", "--snapshot-count=50", "--peer-cert-file", "/home/runner/work/etcd/etcd/tests/fixtures/server.crt", "--peer-key-file", "/home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure", "--peer-trusted-ca-file", "/home/runner/work/etcd/etcd/tests/fixtures/ca.crt", "--initial-cluster-token=new", "--snapshot-count=50", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=member-0=https://localhost:20003,member-1=https://localhost:20008,member-2=https://localhost:20013", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2929896518/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "ETCD_BRANCH=release-3.4", "EXPECT_DEBUG=true", "PATH=/opt/hostedtoolcache/go/1.21.6/x64/bin:/home/runner/go/bin:/opt/hostedtoolcache/go/1.21.6/x64/bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
(member-0) (19844): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
(member-0) (19844): 	A future release of golang/protobuf will delete this package,
(member-0) (19844): 	which has long been excluded from the compatibility promise.
(member-0) (19844): 
(member-0) (19844): 2024-02-17 09:53:13.900690 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
(member-0) (19844): 2024-02-17 09:53:13.900727 W | pkg/flags: unrecognized environment variable ETCD_BRANCH=release-3.4
(member-0) (19844): 2024-02-17 09:53:13.900731 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
(member-0) (19844): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
(member-0) (19844): 2024-02-17 09:53:13.900757 W | embed: Running http and grpc server on single port. This is not recommended for production.
(member-0) (19844): 2024-02-17 09:53:13.900769 I | etcdmain: etcd Version: 3.4.30
(member-0) (19844): 2024-02-17 09:53:13.900778 I | etcdmain: Git SHA: 2662038-FAILPOINTS
(member-0) (19844): 2024-02-17 09:53:13.900786 I | etcdmain: Go Version: go1.21.6
(member-0) (19844): 2024-02-17 09:53:13.900794 I | etcdmain: Go OS/Arch: linux/amd64
(member-0) (19844): 2024-02-17 09:53:13.900799 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
(member-0) (19844): 2024-02-17 09:53:13.900860 N | etcdmain: the server is already initialized as member before, starting as etcd member...
(member-0) (19844): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
(member-0) (19844): 2024-02-17 09:53:13.900886 W | embed: Running http and grpc server on single port. This is not recommended for production.
(member-0) (19844): 2024-02-17 09:53:13.900909 I | embed: peerTLS: cert = /home/runner/work/etcd/etcd/tests/fixtures/server.crt, key = /home/runner/work/etcd/etcd/tests/fixtures/server.key.insecure, trusted-ca = /home/runner/work/etcd/etcd/tests/fixtures/ca.crt, client-cert-auth = false, crl-file = 
(member-0) (19844): 2024-02-17 09:53:13.901630 I | embed: name = member-0
(member-0) (19844): 2024-02-17 09:53:13.901641 I | embed: data dir = /tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001
(member-0) (19844): 2024-02-17 09:53:13.901645 I | embed: member dir = /tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001/member
(member-0) (19844): 2024-02-17 09:53:13.901652 I | embed: heartbeat = 100ms
(member-0) (19844): 2024-02-17 09:53:13.901658 I | embed: election = 1000ms
(member-0) (19844): 2024-02-17 09:53:13.901663 I | embed: snapshot count = 50
(member-0) (19844): 2024-02-17 09:53:13.901677 I | embed: advertise client URLs = http://localhost:20000
(member-0) (19844): 2024-02-17 09:53:13.901685 I | embed: initial advertise peer URLs = https://localhost:20003
(member-0) (19844): 2024-02-17 09:53:13.901691 I | embed: initial cluster = 
(member-0) (19844): 2024-02-17 09:53:13.908619 I | etcdserver: recovered store from snapshot at index 309
(member-0) (19844): 2024-02-17 09:53:13.910169 I | etcdserver: restarting member eabdbb777cf498cb in cluster b3bc0c1919fe5d7e at commit index 342
(member-0) (19844): raft 2024/02/17 09:53:13 INFO: eabdbb777cf498cb switched to configuration voters=(12416079282240904009 13770228943176794332 16914881897345358027)
(member-0) (19844): raft 2024/02/17 09:53:13 INFO: eabdbb777cf498cb became follower at term 2
(member-0) (19844): raft 2024/02/17 09:53:13 INFO: newRaft eabdbb777cf498cb [peers: [ac4ec652f10e5b49,bf19ae4419db00dc,eabdbb777cf498cb], term: 2, commit: 342, applied: 309, lastindex: 342, lastterm: 2]
(member-0) (19844): 2024-02-17 09:53:13.910393 I | etcdserver/api: enabled capabilities for version 3.4
(member-0) (19844): 2024-02-17 09:53:13.910407 I | etcdserver/membership: added member eabdbb777cf498cb [https://localhost:20003] to cluster b3bc0c1919fe5d7e from store
(member-0) (19844): 2024-02-17 09:53:13.910417 I | etcdserver/membership: added member ac4ec652f10e5b49 [https://localhost:20008] to cluster b3bc0c1919fe5d7e from store
(member-0) (19844): 2024-02-17 09:53:13.910424 I | etcdserver/membership: added member bf19ae4419db00dc [https://localhost:20013] to cluster b3bc0c1919fe5d7e from store
(member-0) (19844): 2024-02-17 09:53:13.910432 I | etcdserver/membership: set the cluster version to 3.4 from store
(member-0) (19844): 2024-02-17 09:53:13.910779 W | auth: simple token is not cryptographically signed
(member-0) (19844): 2024-02-17 09:53:13.912967 I | rafthttp: starting peer ac4ec652f10e5b49...
(member-0) (19844): 2024-02-17 09:53:13.913005 I | rafthttp: started HTTP pipelining with peer ac4ec652f10e5b49
(member-0) (19844): 2024-02-17 09:53:13.913418 I | rafthttp: started streaming with peer ac4ec652f10e5b49 (writer)
(member-0) (19844): 2024-02-17 09:53:13.913487 I | rafthttp: started streaming with peer ac4ec652f10e5b49 (writer)
(member-0) (19844): 2024-02-17 09:53:13.913896 I | rafthttp: started peer ac4ec652f10e5b49
(member-0) (19844): 2024-02-17 09:53:13.913917 I | rafthttp: started streaming with peer ac4ec652f10e5b49 (stream Message reader)
(member-0) (19844): 2024-02-17 09:53:13.913947 I | rafthttp: started streaming with peer ac4ec652f10e5b49 (stream MsgApp v2 reader)
(member-0) (19844): 2024-02-17 09:53:13.914004 I | rafthttp: added peer ac4ec652f10e5b49
(member-0) (19844): 2024-02-17 09:53:13.914029 I | rafthttp: starting peer bf19ae4419db00dc...
(member-0) (19844): 2024-02-17 09:53:13.914050 I | rafthttp: started HTTP pipelining with peer bf19ae4419db00dc
(member-0) (19844): 2024-02-17 09:53:13.914300 I | rafthttp: started streaming with peer bf19ae4419db00dc (writer)
(member-0) (19844): 2024-02-17 09:53:13.914587 I | rafthttp: started streaming with peer bf19ae4419db00dc (writer)
(member-0) (19844): 2024-02-17 09:53:13.914711 I | rafthttp: started peer bf19ae4419db00dc
(member-0) (19844): 2024-02-17 09:53:13.914738 I | rafthttp: started streaming with peer bf19ae4419db00dc (stream MsgApp v2 reader)
(member-0) (19844): 2024-02-17 09:53:13.914750 I | rafthttp: added peer bf19ae4419db00dc
(member-0) (19844): 2024-02-17 09:53:13.914773 I | etcdserver: starting server... [version: 3.4.30, cluster version: 3.4]
(member-0) (19844): 2024-02-17 09:53:13.915197 I | rafthttp: started streaming with peer bf19ae4419db00dc (stream Message reader)
(member-0) (19844): 2024-02-17 09:53:13.915320 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001/member/snap, suffix: snap.db, max: 5, interval: 30s
(member-0) (19844): 2024-02-17 09:53:13.915342 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001/member/snap, suffix: snap, max: 5, interval: 30s
(member-0) (19844): 2024-02-17 09:53:13.915363 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001/member/wal, suffix: wal, max: 5, interval: 30s
(member-0) (19844): 2024-02-17 09:53:13.915588 I | pkg/fileutil: purged file /tmp/TestRobustnessExploratoryKubernetesLowTrafficClusterOfSize33603748955/001/member/snap/0000000000000002-0000000000000033.snap successfully
(member-0) (19844): 2024-02-17 09:53:13.917953 I | embed: listening for peers on 127.0.0.1:20001
(member-0) (19844): 2024-02-17 09:53:13.920514 I | rafthttp: peer bf19ae4419db00dc became active
(member-0) (19844): 2024-02-17 09:53:13.920530 I | rafthttp: established a TCP streaming connection with peer bf19ae4419db00dc (stream MsgApp v2 reader)
(member-2) (19803): 2024-02-17 09:53:13.920611 I | rafthttp: peer eabdbb777cf498cb became active
(member-2) (19803): 2024-02-17 09:53:13.920632 W | rafthttp: closed an existing TCP streaming connection with peer eabdbb777cf498cb (stream MsgApp v2 writer)
(member-2) (19803): 2024-02-17 09:53:13.920655 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream MsgApp v2 writer)
(member-0) (19844): 2024-02-17 09:53:13.920977 I | rafthttp: peer ac4ec652f10e5b49 became active
(member-0) (19844): 2024-02-17 09:53:13.920994 I | rafthttp: established a TCP streaming connection with peer ac4ec652f10e5b49 (stream MsgApp v2 reader)
(member-1) (19804): 2024-02-17 09:53:13.921080 I | rafthttp: peer eabdbb777cf498cb became active
(member-1) (19804): 2024-02-17 09:53:13.921101 W | rafthttp: closed an existing TCP streaming connection with peer eabdbb777cf498cb (stream MsgApp v2 writer)
(member-1) (19804): 2024-02-17 09:53:13.921121 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream MsgApp v2 writer)
(member-0) (19844): 2024-02-17 09:53:13.922741 I | rafthttp: established a TCP streaming connection with peer ac4ec652f10e5b49 (stream Message reader)
(member-1) (19804): 2024-02-17 09:53:13.922796 W | rafthttp: closed an existing TCP streaming connection with peer eabdbb777cf498cb (stream Message writer)
(member-1) (19804): 2024-02-17 09:53:13.922851 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream Message writer)
(member-0) (19844): 2024-02-17 09:53:13.923547 I | rafthttp: established a TCP streaming connection with peer bf19ae4419db00dc (stream Message reader)
(member-2) (19803): 2024-02-17 09:53:13.923799 W | rafthttp: closed an existing TCP streaming connection with peer eabdbb777cf498cb (stream Message writer)
(member-2) (19803): 2024-02-17 09:53:13.923811 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream Message writer)
(member-0) (19844): raft 2024/02/17 09:53:13 INFO: eabdbb777cf498cb no leader at term 2; dropping index reading msg
(member-0) (19844): 2024-02-17 09:53:13.966491 I | etcdserver: eabdbb777cf498cb initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
(member-0) (19844): 2024-02-17 09:53:13.992095 I | rafthttp: established a TCP streaming connection with peer ac4ec652f10e5b49 (stream Message writer)
(member-1) (19804): 2024-02-17 09:53:13.992170 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream Message reader)
(member-0) (19844): 2024-02-17 09:53:13.993699 I | rafthttp: established a TCP streaming connection with peer bf19ae4419db00dc (stream MsgApp v2 writer)
(member-2) (19803): 2024-02-17 09:53:13.993761 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream MsgApp v2 reader)
(member-0) (19844): 2024-02-17 09:53:13.994311 I | rafthttp: established a TCP streaming connection with peer ac4ec652f10e5b49 (stream MsgApp v2 writer)
(member-1) (19804): 2024-02-17 09:53:13.994358 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream MsgApp v2 reader)
(member-0) (19844): 2024-02-17 09:53:13.995721 I | rafthttp: established a TCP streaming connection with peer bf19ae4419db00dc (stream Message writer)
(member-2) (19803): 2024-02-17 09:53:13.995821 I | rafthttp: established a TCP streaming connection with peer eabdbb777cf498cb (stream Message reader)
(member-2) (19803): 2024-02-17 09:53:14.410022 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb no leader at term 2; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:14.426041 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb no leader at term 2; dropping index reading msg
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb is starting a new election at term 2
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb became candidate at term 3
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb received MsgVoteResp from eabdbb777cf498cb at term 3
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to ac4ec652f10e5b49 at term 3
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to bf19ae4419db00dc at term 3
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc [logterm: 2, index: 343, vote: eabdbb777cf498cb] ignored MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 2: lease is not expired (remaining ticks: 1)
(member-1) (19804): raft 2024/02/17 09:53:14 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343, vote: ac4ec652f10e5b49] ignored MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 2: lease is not expired (remaining ticks: 1)
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc is starting a new election at term 2
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc became candidate at term 3
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc received MsgVoteResp from bf19ae4419db00dc at term 3
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc [logterm: 2, index: 343] sent MsgVote request to ac4ec652f10e5b49 at term 3
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc [logterm: 2, index: 343] sent MsgVote request to eabdbb777cf498cb at term 3
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: raft.node: bf19ae4419db00dc lost leader eabdbb777cf498cb at term 3
(member-0) (19844): raft 2024/02/17 09:53:14 INFO: eabdbb777cf498cb [logterm: 2, index: 342, vote: eabdbb777cf498cb] rejected MsgVote from bf19ae4419db00dc [logterm: 2, index: 343] at term 3
(member-1) (19804): raft 2024/02/17 09:53:14 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343, vote: ac4ec652f10e5b49] ignored MsgVote from bf19ae4419db00dc [logterm: 2, index: 343] at term 2: lease is not expired (remaining ticks: 1)
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc received MsgVoteResp rejection from eabdbb777cf498cb at term 3
(member-2) (19803): raft 2024/02/17 09:53:14 INFO: bf19ae4419db00dc has received 1 MsgVoteResp votes and 1 vote rejections
(member-2) (19803): 2024-02-17 09:53:14.910867 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-1) (19804): 2024-02-17 09:53:14.926237 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-2) (19803): 2024-02-17 09:53:15.411436 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-1) (19804): 2024-02-17 09:53:15.427252 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 is starting a new election at term 2
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 became candidate at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 received MsgVoteResp from ac4ec652f10e5b49 at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343] sent MsgVote request to bf19ae4419db00dc at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343] sent MsgVote request to eabdbb777cf498cb at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: raft.node: ac4ec652f10e5b49 lost leader eabdbb777cf498cb at term 3
(member-2) (19803): raft 2024/02/17 09:53:15 INFO: bf19ae4419db00dc [logterm: 2, index: 343, vote: bf19ae4419db00dc] rejected MsgVote from ac4ec652f10e5b49 [logterm: 2, index: 343] at term 3
(member-0) (19844): raft 2024/02/17 09:53:15 INFO: eabdbb777cf498cb [logterm: 2, index: 342, vote: eabdbb777cf498cb] rejected MsgVote from ac4ec652f10e5b49 [logterm: 2, index: 343] at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 received MsgVoteResp rejection from bf19ae4419db00dc at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 has received 1 MsgVoteResp votes and 1 vote rejections
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 received MsgVoteResp rejection from eabdbb777cf498cb at term 3
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 has received 1 MsgVoteResp votes and 2 vote rejections
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 became follower at term 3
(member-2) (19803): 2024-02-17 09:53:15.911826 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-1) (19804): 2024-02-17 09:53:15.928352 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:15 INFO: ac4ec652f10e5b49 no leader at term 3; dropping index reading msg
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb is starting a new election at term 3
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb became candidate at term 4
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb received MsgVoteResp from eabdbb777cf498cb at term 4
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to ac4ec652f10e5b49 at term 4
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to bf19ae4419db00dc at term 4
(member-1) (19804): raft 2024/02/17 09:53:16 INFO: ac4ec652f10e5b49 [term: 3] received a MsgVote message with higher term from eabdbb777cf498cb [term: 4]
(member-2) (19803): raft 2024/02/17 09:53:16 INFO: bf19ae4419db00dc [term: 3] received a MsgVote message with higher term from eabdbb777cf498cb [term: 4]
(member-2) (19803): raft 2024/02/17 09:53:16 INFO: bf19ae4419db00dc became follower at term 4
(member-2) (19803): raft 2024/02/17 09:53:16 INFO: bf19ae4419db00dc [logterm: 2, index: 343, vote: 0] rejected MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 4
(member-1) (19804): raft 2024/02/17 09:53:16 INFO: ac4ec652f10e5b49 became follower at term 4
(member-1) (19804): raft 2024/02/17 09:53:16 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343, vote: 0] rejected MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 4
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb received MsgVoteResp rejection from bf19ae4419db00dc at term 4
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb has received 1 MsgVoteResp votes and 1 vote rejections
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb received MsgVoteResp rejection from ac4ec652f10e5b49 at term 4
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb has received 1 MsgVoteResp votes and 2 vote rejections
(member-0) (19844): raft 2024/02/17 09:53:16 INFO: eabdbb777cf498cb became follower at term 4
(member-2) (19803): 2024-02-17 09:53:16.412078 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:16 INFO: bf19ae4419db00dc no leader at term 4; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:16.429381 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:16 INFO: ac4ec652f10e5b49 no leader at term 4; dropping index reading msg
(member-2) (19803): 2024-02-17 09:53:16.912379 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:16 INFO: bf19ae4419db00dc no leader at term 4; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:16.930420 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:16 INFO: ac4ec652f10e5b49 no leader at term 4; dropping index reading msg
(member-2) (19803): 2024-02-17 09:53:17.412934 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:17 INFO: bf19ae4419db00dc no leader at term 4; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:17.430470 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:17 INFO: ac4ec652f10e5b49 no leader at term 4; dropping index reading msg
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb is starting a new election at term 4
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb became candidate at term 5
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb received MsgVoteResp from eabdbb777cf498cb at term 5
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to ac4ec652f10e5b49 at term 5
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to bf19ae4419db00dc at term 5
(member-1) (19804): raft 2024/02/17 09:53:17 INFO: ac4ec652f10e5b49 [term: 4] received a MsgVote message with higher term from eabdbb777cf498cb [term: 5]
(member-1) (19804): raft 2024/02/17 09:53:17 INFO: ac4ec652f10e5b49 became follower at term 5
(member-1) (19804): raft 2024/02/17 09:53:17 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343, vote: 0] rejected MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 5
(member-2) (19803): raft 2024/02/17 09:53:17 INFO: bf19ae4419db00dc [term: 4] received a MsgVote message with higher term from eabdbb777cf498cb [term: 5]
(member-2) (19803): raft 2024/02/17 09:53:17 INFO: bf19ae4419db00dc became follower at term 5
(member-2) (19803): raft 2024/02/17 09:53:17 INFO: bf19ae4419db00dc [logterm: 2, index: 343, vote: 0] rejected MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 5
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb received MsgVoteResp rejection from bf19ae4419db00dc at term 5
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb has received 1 MsgVoteResp votes and 1 vote rejections
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb received MsgVoteResp rejection from ac4ec652f10e5b49 at term 5
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb has received 1 MsgVoteResp votes and 2 vote rejections
(member-0) (19844): raft 2024/02/17 09:53:17 INFO: eabdbb777cf498cb became follower at term 5
(member-2) (19803): 2024-02-17 09:53:17.913724 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:17 INFO: bf19ae4419db00dc no leader at term 5; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:17.931086 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:17 INFO: ac4ec652f10e5b49 no leader at term 5; dropping index reading msg
(member-2) (19803): 2024-02-17 09:53:18.414063 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:18 INFO: bf19ae4419db00dc no leader at term 5; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:18.432140 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:18 INFO: ac4ec652f10e5b49 no leader at term 5; dropping index reading msg
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb is starting a new election at term 5
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb became candidate at term 6
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb received MsgVoteResp from eabdbb777cf498cb at term 6
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to ac4ec652f10e5b49 at term 6
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb [logterm: 2, index: 342] sent MsgVote request to bf19ae4419db00dc at term 6
(member-1) (19804): raft 2024/02/17 09:53:18 INFO: ac4ec652f10e5b49 [term: 5] received a MsgVote message with higher term from eabdbb777cf498cb [term: 6]
(member-2) (19803): raft 2024/02/17 09:53:18 INFO: bf19ae4419db00dc [term: 5] received a MsgVote message with higher term from eabdbb777cf498cb [term: 6]
(member-1) (19804): raft 2024/02/17 09:53:18 INFO: ac4ec652f10e5b49 became follower at term 6
(member-2) (19803): raft 2024/02/17 09:53:18 INFO: bf19ae4419db00dc became follower at term 6
(member-2) (19803): raft 2024/02/17 09:53:18 INFO: bf19ae4419db00dc [logterm: 2, index: 343, vote: 0] rejected MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 6
(member-1) (19804): raft 2024/02/17 09:53:18 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343, vote: 0] rejected MsgVote from eabdbb777cf498cb [logterm: 2, index: 342] at term 6
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb received MsgVoteResp rejection from bf19ae4419db00dc at term 6
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb has received 1 MsgVoteResp votes and 1 vote rejections
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb received MsgVoteResp rejection from ac4ec652f10e5b49 at term 6
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb has received 1 MsgVoteResp votes and 2 vote rejections
(member-0) (19844): raft 2024/02/17 09:53:18 INFO: eabdbb777cf498cb became follower at term 6
(member-2) (19803): 2024-02-17 09:53:18.915067 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:18 INFO: bf19ae4419db00dc no leader at term 6; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:18.932961 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:18 INFO: ac4ec652f10e5b49 no leader at term 6; dropping index reading msg
(member-2) (19803): 2024-02-17 09:53:19.415502 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 62080309779226459, retry-timeout: 500ms)
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc no leader at term 6; dropping index reading msg
(member-1) (19804): 2024-02-17 09:53:19.433834 W | etcdserver: waiting for ReadIndex response took too long, retrying (sent-request-id: 6577944545654202502, retry-timeout: 500ms)
(member-1) (19804): raft 2024/02/17 09:53:19 INFO: ac4ec652f10e5b49 no leader at term 6; dropping index reading msg
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc is starting a new election at term 6
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc became candidate at term 7
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc received MsgVoteResp from bf19ae4419db00dc at term 7
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc [logterm: 2, index: 343] sent MsgVote request to ac4ec652f10e5b49 at term 7
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc [logterm: 2, index: 343] sent MsgVote request to eabdbb777cf498cb at term 7
(member-1) (19804): raft 2024/02/17 09:53:19 INFO: ac4ec652f10e5b49 [term: 6] received a MsgVote message with higher term from bf19ae4419db00dc [term: 7]
(member-1) (19804): raft 2024/02/17 09:53:19 INFO: ac4ec652f10e5b49 became follower at term 7
(member-1) (19804): raft 2024/02/17 09:53:19 INFO: ac4ec652f10e5b49 [logterm: 2, index: 343, vote: 0] cast MsgVote for bf19ae4419db00dc [logterm: 2, index: 343] at term 7
(member-0) (19844): raft 2024/02/17 09:53:19 INFO: eabdbb777cf498cb [term: 6] received a MsgVote message with higher term from bf19ae4419db00dc [term: 7]
(member-0) (19844): raft 2024/02/17 09:53:19 INFO: eabdbb777cf498cb became follower at term 7
(member-0) (19844): raft 2024/02/17 09:53:19 INFO: eabdbb777cf498cb [logterm: 2, index: 342, vote: 0] cast MsgVote for bf19ae4419db00dc [logterm: 2, index: 343] at term 7
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc received MsgVoteResp from ac4ec652f10e5b49 at term 7
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc has received 2 MsgVoteResp votes and 0 vote rejections
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: bf19ae4419db00dc became leader at term 7
(member-2) (19803): raft 2024/02/17 09:53:19 INFO: raft.node: bf19ae4419db00dc elected leader bf19ae4419db00dc at term 7
(member-1) (19804): raft 2024/02/17 09:53:19 INFO: raft.node: ac4ec652f10e5b49 elected leader bf19ae4419db00dc at term 7
(member-0) (19844): raft 2024/02/17 09:53:19 INFO: raft.node: eabdbb777cf498cb elected leader bf19ae4419db00dc at term 7
(member-2) (19803): 2024-02-17 09:53:19.862731 I | etcdserver: first commit in current term: resending ReadIndex request
(member-1) (19804): 2024-02-17 09:53:19.862912 I | etcdserver: first commit in current term: resending ReadIndex request
(member-0) (19844): 2024-02-17 09:53:19.864555 I | etcdserver: published {Name:member-0 ClientURLs:[http://localhost:20000]} to cluster b3bc0c1919fe5d7e
(member-0) (19844): 2024-02-17 09:53:19.864657 I | embed: ready to serve client requests
(member-0) (19844): 2024-02-17 09:53:19.865664 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
(member-1) (19804): 2024-02-17 09:53:19.866925 W | etcdserver: ignored out-of-date read index response; local node read indexes queueing up and waiting to be in sync with leader (request ID want 6577944545654202507, got 6577944545654202506)
    logger.go:130: 2024-02-17T09:53:19.868Z	INFO	started server.	{"name": "member-0", "pid": 19844}
    logger.go:130: 2024-02-17T09:53:19.868Z	INFO	Verifying cluster health after failpoint	{"failpoint": "raftBeforeSave=panic()"}
(member-2) (19803): 2024-02-17 09:53:19.887084 I | etcdserver: start to snapshot (applied: 360, lastsnap: 309)
(member-0) (19844): 2024-02-17 09:53:19.887385 I | etcdserver: start to snapshot (applied: 360, lastsnap: 309)
(member-1) (19804): 2024-02-17 09:53:19.887411 I | etcdserver: start to snapshot (applied: 360, lastsnap: 309)
(member-2) (19803): 2024-02-17 09:53:19.888698 I | etcdserver: saved snapshot at index 360
(member-1) (19804): 2024-02-17 09:53:19.889838 I | etcdserver: saved snapshot at index 360
(member-0) (19844): 2024-02-17 09:53:19.889889 I | etcdserver: saved snapshot at index 360
(member-1) (19804): 2024-02-17 09:53:19.911063 I | etcdserver: start to snapshot (applied: 411, lastsnap: 360)
(member-0) (19844): 2024-02-17 09:53:19.911267 I | etcdserver: start to snapshot (applied: 411, lastsnap: 360)
(member-2) (19803): 2024-02-17 09:53:19.912674 I | etcdserver: start to snapshot (applied: 411, lastsnap: 360)
(member-0) (19844): 2024-02-17 09:53:19.913919 I | etcdserver: saved snapshot at index 411
(member-1) (19804): 2024-02-17 09:53:19.914669 I | etcdserver: saved snapshot at index 411
(member-2) (19803): 2024-02-17 09:53:19.914929 I | etcdserver: saved snapshot at index 411
(member-2) (19803): 2024-02-17 09:53:20.129705 I | etcdserver: start to snapshot (applied: 462, lastsnap: 411)
(member-0) (19844): 2024-02-17 09:53:20.129893 I | etcdserver: start to snapshot (applied: 462, lastsnap: 411)
(member-1) (19804): 2024-02-17 09:53:20.130036 I | etcdserver: start to snapshot (applied: 462, lastsnap: 411)
(member-2) (19803): 2024-02-17 09:53:20.131646 I | etcdserver: saved snapshot at index 462
(member-0) (19844): 2024-02-17 09:53:20.131981 I | etcdserver: saved snapshot at index 462
(member-1) (19804): 2024-02-17 09:53:20.132042 I | etcdserver: saved snapshot at index 462
(member-2) (19803): 2024-02-17 09:53:20.454693 I | etcdserver: start to snapshot (applied: 513, lastsnap: 462)
(member-1) (19804): 2024-02-17 09:53:20.454801 I | etcdserver: start to snapshot (applied: 513, lastsnap: 462)
(member-0) (19844): 2024-02-17 09:53:20.454782 I | etcdserver: start to snapshot (applied: 513, lastsnap: 462)
(member-0) (19844): 2024-02-17 09:53:20.455898 I | etcdserver: saved snapshot at index 513
(member-2) (19803): 2024-02-17 09:53:20.455910 I | etcdserver: saved snapshot at index 513
(member-1) (19804): 2024-02-17 09:53:20.455902 I | etcdserver: saved snapshot at index 513
@joshuazh-x
Copy link
Contributor

member-0 with less up-to-date log became candidates earlier than the other two nodes in several round of elections. It could not win election and prevented the other two from stepping into candidate. Enabling prevote can reduce such problem.

@serathius please assign this to me.

@serathius
Copy link
Member Author

Just a note, I didn't treat this as a flake as there is no signs of this being issue with robustness. We have been running robustness tests on v3.4 for long time and this is the first time it happen twice in row. It could point to some regression in v3.4.

cc @ahrtr @jmhbnz @siyuanfoundation are you aware of any changes that could cause regressions in v3.4 for number required of leader elections?

@serathius
Copy link
Member Author

member-0 with less up-to-date log became candidates earlier than the other two nodes in several round of elections. It could not win election and prevented the other two from stepping into candidate. Enabling prevote can reduce such problem.

Right, I haven't yet dive deep into the logs but I would want to make sure that we know:

  • why it repeated multiple times?
  • why it prevented other candidates stepping up? members should only vote on members with higher raft index.

@joshuazh-x
Copy link
Contributor

joshuazh-x commented Feb 20, 2024

member-0 with less up-to-date log became candidates earlier than the other two nodes in several round of elections. It could not win election and prevented the other two from stepping into candidate. Enabling prevote can reduce such problem.

Right, I haven't yet dive deep into the logs but I would want to make sure that we know:

  • why it repeated multiple times?
  • why it prevented other candidates stepping up? members should only vote on members with higher raft index.

Say all 3 members were in term 3, and there was no leader. member-0 has less up-to-date log than the other two.

When member-0 became candidate first (got a small random timeout), it stepped to term 4 and sent vote requests to the other two (which were still in term 3). Upon receiving vote messages, member-1 and member-2 were changed to follower of term 4, and its election expiration counting was reset so they had no chance to become candidate of term 3. And member-1 and member-2 both rejected voting to member-0 because they have more up-to-date logs.

Now we have 3 members in term 4 with no leader, waiting another round of election.

This happened multiple times in a row until member-0 finally did not become candidate than the other two.

@serathius
Copy link
Member Author

Right, this seems like a clear case for pre-vote. Thanks for investigating!

@siyuanfoundation
Copy link
Contributor

Just a note, I didn't treat this as a flake as there is no signs of this being issue with robustness. We have been running robustness tests on v3.4 for long time and this is the first time it happen twice in row. It could point to some regression in v3.4.

cc @ahrtr @jmhbnz @siyuanfoundation are you aware of any changes that could cause regressions in v3.4 for number required of leader elections?

No, there are no real changes to the 3.4 binary recently. All the changes are related to tests and metrics.
Downgrade feature is merged only today.

@ahrtr
Copy link
Member

ahrtr commented Feb 21, 2024

It's expected behaviour when pre-vote isn't enabled. I think we should

@joshuazh-x
Copy link
Contributor

A lower expected QPS for 3.4 may hide other potential issues that impact availability. Improvement in raft sounds good, but is it possible to just cherry pick this change? I don't see a 3.4 release branch in raft repo.

@serathius
Copy link
Member Author

I think that instead of forcing prevote in robustness test, we should go with #17775

@serathius
Copy link
Member Author

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants