[etcd-3.1.4][performance] It takes 13s to select leader after an instance rejoin etcd cluster #9464
Comments
Hi @silenceshell, thanks for the details. For a full picture, would you mind attaching the output of a GET request against ref:#8526
Hi @hexfusion, thanks for your reply! Unfortunately, the problem environment is gone. I only have the related logs.
It seems we need to disable advanceTicksForElection and enable --experimental-pre-vote. Can you tell us which 3.1.x releases include these features, what their default values are, and how to enable or disable them? Thanks!
Adjusted
So is there any plan to add the --pre-vote flag to v3.1.x?
Can anyone help answer this? Thanks!
@langkeer The pre-vote feature won't be backported, since it requires a set of critical Raft patches. We are still stabilizing it.
OK, thank you!
3.4 will set
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
etcd Version: 3.1.4
Instance List:
mn-0: 4bc7141c11bf71da
mn-1: 8296553e8d2c027d
sn-2: 60d080439b99d9ca
Main Question:
mn-1 rejoined the etcd cluster at 05:51:08, but the cluster did not become stable until 05:51:22. Why does it take 13s to elect a leader? Can this be improved?
Operation:
Brought DOWN mn-1's internal interface at 05:50:49, which triggered a reboot of mn-1 at 05:50:52.
Brought UP mn-1's internal interface at 05:51:08. etcd on mn-1 started at 05:51:08.
Log and Config:
https://github.com/silenceshell/myscripts/tree/master/etcd_issue
Event Analysis:
After losing its connection, mn-1 starts a new election at term 152 before it receives SIGTERM.
Then mn-1 goes down.
Till now, mn-1 down; mn-0 thinks leader is sn-2; sn-2 thinks leader is sn-2.
Record:
mn-0 (term: 151, logterm: 151, -)
mn-1 (term: 152, logterm: 151, index: 2602517)
sn-2 (term: 151, logterm: 151, -)
mn-0 and sn-2 also detect that mn-1 is active.
sn-2 receives mn-1's MsgAppResp with a higher term and updates its term to 152.
But mn-0 doesn't receive a MsgAppResp from mn-1, so it is unaware of the leader change.
Till now, mn-1 just in; mn-0 thinks leader is sn-2; sn-2 thinks leader is mn-1.
Record:
mn-0 (term: 151, logterm: 151, -)
mn-1 (term: 152, logterm: 151, index: 2602517)
sn-2 (term: 152, logterm: 151, -)
But mn-0 and sn-2 ignore mn-1's MsgVote, because their leader lease has not expired.
Till now, nothing changes, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is sn-2; sn-2 thinks leader is mn-1.
Record:
mn-0 (term: 151, logterm: 151, index: 2603551)
mn-1 (term: 153, logterm: 151, index: 2602517)
sn-2 (term: 152, logterm: 151, index: 2603551)
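The lease behaviour described above can be sketched roughly as follows. This is a minimal illustration, assuming leader-lease checking (check quorum) is enabled; the `node` type, its field names, and the tick values are hypothetical and are not taken from the etcd 3.1.4 source or from the attached logs.

```go
package main

import "fmt"

// node is a hypothetical, stripped-down view of a Raft follower's state.
type node struct {
	checkQuorum     bool   // leader-lease checking enabled
	lead            uint64 // ID of the leader this node currently trusts; 0 means none
	electionElapsed int    // ticks since the node last heard from its leader
	electionTimeout int    // ticks after which the lease is considered expired
}

// inLease reports whether the node still trusts its current leader.
func (n node) inLease() bool {
	return n.checkQuorum && n.lead != 0 && n.electionElapsed < n.electionTimeout
}

// shouldIgnoreVote reports whether a vote request carrying a higher term
// is dropped without updating the local term.
func (n node) shouldIgnoreVote(msgTerm, localTerm uint64) bool {
	return msgTerm > localTerm && n.inLease()
}

func main() {
	// mn-0 trusts sn-2 as leader; the tick values are made up for illustration.
	mn0 := node{checkQuorum: true, lead: 0x60d080439b99d9ca, electionElapsed: 3, electionTimeout: 10}
	fmt.Println(mn0.shouldIgnoreVote(153, 151)) // lease still valid: request ignored

	mn0.electionElapsed = 10
	fmt.Println(mn0.shouldIgnoreVote(153, 151)) // lease expired: request is processed
}
```

Under this model, mn-1 can keep raising its own term while the other two members stay on theirs, which matches the records above.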
sn-2 votes for mn-0 at term 152.
mn-1 ignores mn-0's vote request with lower term.
Till now, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is mn-0; sn-2 thinks leader is mn-0.
Record:
mn-0 (term: 152, logterm: 152, index: 2603551)
mn-1 (term: 153, logterm: 151, index: 2602517)
sn-2 (term: 152, logterm: 152, index: 2603551)
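The reason mn-1 drops mn-0's vote request can be illustrated with the usual Raft term comparison. The sketch below is an assumption-laden simplification: the `action` type and `onMessage` function are hypothetical names, and real implementations handle more message types and special cases.

```go
package main

import "fmt"

// action is a hypothetical label for what a node does with an incoming message.
type action string

const (
	ignoreStale action = "ignore: message term is lower than the local term"
	adoptTerm   action = "adopt the higher term and become a follower"
	process     action = "same term: process normally"
)

// onMessage compares the message term with the local term.
func onMessage(msgTerm, localTerm uint64) action {
	switch {
	case msgTerm < localTerm:
		return ignoreStale
	case msgTerm > localTerm:
		return adoptTerm
	default:
		return process
	}
}

func main() {
	// mn-1 is at term 153 and receives mn-0's vote request for term 152.
	fmt.Println(onMessage(152, 153))
}
```

Because mn-1's term keeps running ahead of the others, every vote request it receives from mn-0 or sn-2 in this phase falls into the first case.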
Till now, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is mn-1; sn-2 thinks leader is mn-0.
Record:
mn-0 (term: 153, logterm: 152, index: 2603551)
mn-1 (term: 153, logterm: 151, index: 2602517)
sn-2 (term: 152, logterm: 152, index: 2603551)
mn-0 and sn-2 still ignore mn-1's MsgVote, because their leader lease has not expired.
Till now, nothing changes, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is mn-1; sn-2 thinks leader is mn-0.
Record:
mn-0 (term: 153, logterm: 152, index: 2603552)
mn-1 (term: 154, logterm: 151, index: 2602517)
sn-2 (term: 152, logterm: 152, index: 2603552)
mn-0 votes for sn-2 at term 153.
mn-1 ignores sn-2's vote request with lower term.
Till now, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is sn-2; sn-2 thinks leader is sn-2.
Record:
mn-0 (term: 153, logterm: 153, index: 2603552)
mn-1 (term: 154, logterm: 151, index: 2602517)
sn-2 (term: 153, logterm: 153, index: 2603552)
Till now, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is sn-2; sn-2 thinks leader is mn-1.
Record:
mn-0 (term: 153, logterm: 153, index: 2603552)
mn-1 (term: 154, logterm: 151, index: 2602517)
sn-2 (term: 154, logterm: 153, index: 2603552)
mn-0 and sn-2 still ignore mn-1's MsgVote, because their leader lease has not expired.
Till now, nothing changes, mn-1 thinks no leader in etcd cluster; mn-0 thinks leader is sn-2; sn-2 thinks leader is mn-1.
Record:
mn-0 (term: 153, logterm: 153, index: 2603553)
mn-1 (term: 155, logterm: 151, index: 2602517)
sn-2 (term: 154, logterm: 153, index: 2603553)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Summary for above events:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
mn-0 receives mn-1's MsgVote and updates its local term, but rejects mn-1's vote request.
sn-2 receives mn-1's MsgVote and updates its local term, but rejects mn-1's vote request.
Till now, every instance thinks it is a follower.
Record:
mn-0 (term: 156, logterm: 153, index: 2603553)
mn-1 (term: 156, logterm: 151, index: 2602517)
sn-2 (term: 156, logterm: 153, index: 2603553)
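A likely reason for these rejections is Raft's rule that a voter grants its vote only to a candidate whose log is at least as up to date as its own. The sketch below applies that rule to the logterm and index values from the record above; the `logPos` type and the `isUpToDate` helper are hypothetical names used only for illustration.

```go
package main

import "fmt"

// logPos is a hypothetical pair describing the last entry in a node's log.
type logPos struct {
	term  uint64 // term of the last log entry ("logterm" in the records)
	index uint64 // index of the last log entry
}

// isUpToDate reports whether the candidate's log is at least as up to date
// as the voter's: a higher last term wins, and equal terms compare by index.
func isUpToDate(candidate, voter logPos) bool {
	return candidate.term > voter.term ||
		(candidate.term == voter.term && candidate.index >= voter.index)
}

func main() {
	mn0 := logPos{term: 153, index: 2603553}
	mn1 := logPos{term: 151, index: 2602517}

	fmt.Println(isUpToDate(mn1, mn0)) // false: mn-0 rejects mn-1's request
	fmt.Println(isUpToDate(mn0, mn1)) // true: mn-1 can vote for mn-0
}
```

With these values mn-1 cannot win any election it starts, so the cluster stays leaderless until mn-0 or sn-2 times out and campaigns.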
mn-0 receives mn-1's MsgVote and updates its local term, but rejects mn-1's vote request.
sn-2 receives mn-1's MsgVote and updates its local term, but rejects mn-1's vote request.
Till now, nothing changes; every instance thinks it is a follower.
Record:
mn-0 (term: 157, logterm: 153, index: 2603553)
mn-1 (term: 157, logterm: 151, index: 2602517)
sn-2 (term: 157, logterm: 153, index: 2603553)
mn-0 receives mn-1's MsgVote and updates its local term, but rejects mn-1's vote request.
sn-2 receives mn-1's MsgVote and updates its local term, but rejects mn-1's vote request.
Till now, nothing changes; every instance thinks it is a follower.
Record:
mn-0 (term: 158, logterm: 153, index: 2603553)
mn-1 (term: 158, logterm: 151, index: 2602517)
sn-2 (term: 158, logterm: 153, index: 2603553)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Summary for above events:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
mn-1 receives mn-0's MsgVote, updates its local term, and sends a yes vote to mn-0.
sn-2 receives mn-0's MsgVote, updates its local term, and sends a yes vote to mn-0.
Till now, all instances select mn-0 as leader.
Record:
mn-0 (term: 159, logterm: 153, index: 2603553)
mn-1 (term: 159, logterm: 151, index: 2602517)
sn-2 (term: 159, logterm: 153, index: 2603553)