Leader doesn't step down when establishLeadership fails #5047
It seems to me that we do not retry (see lines 215 to 216 in 3c110d5).
establishLeadership is not called from the loop. It is called only once, after acquiring leadership (see line 169 in 3c110d5).
In reassert (lines 221 to 222 in 3c110d5), leadership is revoked before the server tries to establish it again.
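The revoke-then-establish ordering described for reassert can be sketched as follows. This is a minimal Go model, not Consul's code: the server type, the failNext flag, and the log slice are hypothetical, added only to show that revoking first means a failed re-establish leaves no leader state marked as established.

```go
package main

import (
	"errors"
	"fmt"
)

// server is a hypothetical stand-in for a Consul server; only the state
// needed to illustrate the reassert ordering is modeled.
type server struct {
	established bool     // true while leader-only subsystems are running
	failNext    bool     // hypothetical: forces the next establish to fail
	log         []string // records the order of calls
}

func (s *server) revokeLeadership() {
	s.established = false
	s.log = append(s.log, "revoke")
}

func (s *server) establishLeadership() error {
	s.log = append(s.log, "establish")
	if s.failNext {
		s.failNext = false
		return errors.New("establish failed")
	}
	s.established = true
	return nil
}

// reassert tears down leader state before rebuilding it, so a failed
// re-establish never leaves stale state marked as established.
func (s *server) reassert() error {
	s.revokeLeadership()
	return s.establishLeadership()
}

func main() {
	s := &server{}
	_ = s.establishLeadership()
	s.failNext = true
	err := s.reassert()
	fmt.Println(err, s.established, s.log)
}
```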
That being said, I think we are doing everything correctly and there is no need to change anything.
The bug, I think, @i0rek, is that […] So I don't think the leader loop should retry […] The fact it doesn't is what can leave some specific errors that impact […] In general an error from […]
There is some progress. I implemented leadership transfer in raft (hashicorp/raft#306). As soon as that is merged and revendored in Consul, we can finally step down if establishLeadership fails.
Currently the leaderLoop method periodically retries the establishLeadership operations until they succeed, instead of stepping down immediately after a failure. This could theoretically cause problems, because the server can still perform other normal leader operations, such as reconciling nodes from Serf or handling KV reads and writes, while it waits to retry. That violates the assumption that establishLeadership must succeed before the server handles requests as the leader.
We should look into this and see if there's any negative consequences to immediately stepping down as leader when the establishLeadership method returns an error.