Question: state machine transition caused by the passage of time (especially lease) #7320

Closed
mitake opened this issue Feb 14, 2017 · 7 comments

Comments


mitake commented Feb 14, 2017

I have a question about state machine transitions caused by the passage of time, especially as they affect leases.

Currently, lease expiration is performed in lessor.runLoop(). The timer progresses based on the local machine's clock and is not controlled by Raft, so the state machine can transition based on state that has no consensus.
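
To make the concern concrete, here is a minimal sketch of the kind of local-clock-driven expiry loop I mean (hypothetical types and names, not the actual lessor code):

```go
package lease

import (
	"sync"
	"time"
)

type lease struct {
	id     int64
	ttl    time.Duration // requested time-to-live
	expiry time.Time     // deadline taken from the local clock; never replicated
}

type lessor struct {
	mu     sync.Mutex
	leases map[int64]*lease
}

// runLoop expires leases purely from this node's clock: the delete below is
// a state transition that never went through Raft.
func (le *lessor) runLoop(revoke func(id int64)) {
	for {
		time.Sleep(500 * time.Millisecond)
		now := time.Now()
		le.mu.Lock()
		for id, l := range le.leases {
			if now.After(l.expiry) {
				delete(le.leases, id)
				revoke(id)
			}
		}
		le.mu.Unlock()
	}
}
```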

The lease functionality carefully avoids state divergence through careful implementation, e.g. in the case of Renew, followers forward lease requests to the leader. However, this makes it harder to reason about etcd's behavior. For example, in the case of Grant:

  1. a client sends a lease grant request to the cluster
  2. the cluster reaches consensus on the request, so it sends a response to the client and applies LeaseGrantRequest asynchronously
  3. the client receives the successful response
  4. before the apply completes, the leader dies and comes back

In the above case, the lease grant request would be lost even though the client received the response without error. This is because applying LeaseGrantRequest depends on non-replicated node state: whether the node is the leader or not. So the effect of the command isn't guaranteed to always appear, because every node that applies the command can be a follower.

To avoid the above situation, it might be possible to exploit Raft's keepalive and use it to drive the state transition. I'm not facing serious problems caused by the above situation, but I'm interested in other people's opinions and future plans (e.g. is this the relevant TODO? https://github.com/coreos/etcd/blob/master/lease/lessor.go#L183)

@heyitsanthony
Contributor

@mitake The Grant won't be lost; both Grant and Revoke must go through raft. It's the Renews that are handled asynchronously through the leader to avoid raft pressure on an otherwise quiescent cluster. There have been problems with getting a Grant and then the Renew thinking the lease doesn't exist; this has been fixed. The leader still tracks lease expiry, but it must issue a Revoke through raft.
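
Roughly, the leader-side flow looks something like this (a sketch reusing the lessor/lease types from the sketch above, with hypothetical hooks rather than the actual code): the leader notices expiry from its own clock, but the only thing it does directly is propose a Revoke; the lease is actually deleted when that entry is applied.

```go
// expireLoop runs the expiry check only on the leader. Expiry is detected
// from the local clock, but the state transition itself is a LeaseRevoke
// proposal replicated through raft. isLeader and proposeRevoke are
// hypothetical hooks into the raft layer.
func (le *lessor) expireLoop(isLeader func() bool, proposeRevoke func(id int64)) {
	for {
		time.Sleep(500 * time.Millisecond)
		if !isLeader() {
			continue
		}
		now := time.Now()
		var expired []int64
		le.mu.Lock()
		for id, l := range le.leases {
			if now.After(l.expiry) {
				expired = append(expired, id)
			}
		}
		le.mu.Unlock()
		for _, id := range expired {
			proposeRevoke(id) // the delete happens later, at apply time, on every node
		}
	}
}
```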

heyitsanthony self-assigned this Feb 14, 2017

mitake commented Feb 15, 2017

@heyitsanthony yes, the Grant/Revoke log entries will be appended to each node's log via Raft. However, at apply time, I thought there would be a possibility that every node applies the entry as a follower (I'm not fully sure about this, though). In such a case, would this line (https://github.com/coreos/etcd/blob/master/lease/lessor.go#L224) never be executed anywhere in the cluster? Forgive me if I'm wrong.

@heyitsanthony
Contributor

@mitake the lease will still be added. The difference is that followers assign an infinite expiration deadline, whereas the leader assigns a finite one.
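
In other words, apply time looks roughly like this (same hypothetical types as the sketches above, not the exact lessor code):

```go
// applyGrant runs on every node when the LeaseGrantRequest entry is applied.
// The lease is always added; only the deadline differs by role. The far-future
// value below stands in for an "infinite" deadline on followers.
func (le *lessor) applyGrant(id int64, ttl time.Duration, isLeader bool) {
	l := &lease{id: id, ttl: ttl}
	if isLeader {
		l.expiry = time.Now().Add(ttl) // finite deadline, leader only
	} else {
		l.expiry = time.Now().AddDate(100, 0, 0) // effectively never expires locally
	}
	le.mu.Lock()
	le.leases[id] = l
	le.mu.Unlock()
}
```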


mitake commented Feb 15, 2017

@heyitsanthony so in some very rare cases, the configured expiration deadline won't take effect until the client configures it again?

@heyitsanthony
Contributor

@mitake it's not a hard deadline. If the leader is lost, the lease's expiration deadline is reset via Promote. Requests for the current deadline are forwarded to the currently known leader.
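
A sketch of that Promote step (same hypothetical types as above; the real signature may differ): when a node becomes leader it re-arms every lease from its own clock, so each lease gets at least a full TTL again before it can be revoked.

```go
// promote is called when this node becomes the raft leader: every lease gets
// a fresh deadline based on the new leader's clock, plus some extension so
// clients have time to renew across the leader change.
func (le *lessor) promote(extend time.Duration) {
	now := time.Now()
	le.mu.Lock()
	defer le.mu.Unlock()
	for _, l := range le.leases {
		l.expiry = now.Add(l.ttl + extend)
	}
}
```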


mitake commented Feb 15, 2017

@heyitsanthony ah I see, so a new leader will handle the progress of time. Thanks a lot for your detailed explanation!

@heyitsanthony
Contributor

Confusion seems cleared up; closing.
