Question: state machine transition caused by the passage of time (especially lease) #7320
Comments
@mitake The Grant won't be lost; both Grant and Revoke must go through raft. It's the Renews that are handled asynchronously through the leader, to avoid raft pressure on an otherwise quiescent cluster. There have been problems where a Grant succeeded and a later Renew thought the lease didn't exist; this has been fixed. The leader still tracks lease expiry, but it must issue a Revoke through raft.
@heyitsanthony yes, the log entries for Grant/Revoke will be appended to the nodes' logs via Raft. However, in the apply phase, I thought it might be possible for every node to apply the entry as a follower (I'm not fully sure about this, though). In such a case, wouldn't this line (https://github.com/coreos/etcd/blob/master/lease/lessor.go#L224) never be executed anywhere in the cluster? Forgive me if I'm wrong.
@mitake the lease will still be added. The difference is that followers will assign an infinite expiration deadline, whereas the leader assigns a finite deadline.
@heyitsanthony so in some very rare cases, the configured expiration deadline won't be applied until clients configure it again?
@mitake it's not a hard deadline. If the leader is lost, the lease's expiration deadline is reset via
@heyitsanthony ah I see, so a new leader will handle the progress of time. Thanks a lot for your detailed explanation!
Confusion seems cleared up; closing.
I have a question about state machine transitions caused by the passage of time, especially as they affect leases.
Currently, lease expiration is performed in lessor.runLoop(). The timer advances based on the local machine's clock and is not controlled by Raft, so a state machine can make a transition based on state that has not been agreed on through consensus.
The lease functionality carefully avoids state divergence through implementation effort, e.g. in the case of Renew, by forwarding lease requests from followers to the leader. However, this makes the behavior of etcd harder to reason about. For example, in the case of Grant,
LeaseGrantRequest in an asynchronous manner
In the above case, the lease grant would be lost even though the client receives a response without error. This is because applying LeaseGrantRequest depends on a non-replicated piece of node state: whether the node is the leader or not. So the effect of the command is not guaranteed to appear, because every node that executes the command may be a follower.
To avoid this situation, it would be possible to exploit Raft's keepalive and use it to drive the state transition. I'm not facing serious problems caused by the above situation, but I'm interested in other people's opinions and future plans (e.g. is this the TODO? https://github.com/coreos/etcd/blob/master/lease/lessor.go#L183)