[Lease] Refactor lease renew request via raft #14094
base: main
Conversation
Codecov Report
@@ Coverage Diff @@
## main #14094 +/- ##
==========================================
- Coverage 75.08% 75.04% -0.05%
==========================================
Files 452 452
Lines 36781 36826 +45
==========================================
+ Hits 27618 27636 +18
- Misses 7424 7447 +23
- Partials 1739 1743 +4
Discussion on lease issues and the proposed fixes is so spread out that I find myself unable to comprehend it all. @ahrtr can you help me understand how this change relates to #13915 and what our overall plan is? It would be great to have one umbrella issue that lays out the high-level plan so we can make sure we are going in the right direction. I don't remember ever discussing moving Grant to raft, as there were some performance concerns. With a lot of leases, grant requests can be issued very frequently, and they are latency sensitive, making them much more vulnerable to network hiccups. Before we proceed with this PR it would be good to dig up the original discussion and do some load testing. I haven't looked too deeply into this PR, so I might be wrong about the performance concerns.
This is a standalone refactor. It took me some time to go through some historical PRs (#9924, #9526, #9699, #13508) before delivering this PR. I thought about three solutions (see below) to refactor lease, but eventually I realized that none of them is feasible.
This solution can greatly simplify the overall design and implementation, because we don't need the checkpoint functionality anymore. If a member's time jumps forwards or backwards drastically, then we will run into issues no matter which solution we follow. But this solution would be even worse in that case: the existing implementation is only impacted when the leader's time jumps, while this solution might run into issues if any member's time jumps. Of course, each revoke & renew can fix & reset the time jumps.

Another downside of this solution is that when the cluster is down for some time (such as a maintenance window), all leases might have expired by the time the cluster starts again. The existing implementation does not have this issue, because it persists the remainingTTL instead of the expiryTime. The customer/client shouldn't pay for a server-side issue, so persisting remainingTTL makes more sense than persisting expiryTime from this perspective.
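To make the remainingTTL-vs-expiryTime trade-off concrete, here is a minimal, self-contained Go sketch. The types and field names are illustrative only (not etcd's actual lessor code): it just shows why re-arming a lease from a persisted remaining TTL survives a maintenance window, while a persisted absolute expiry time does not.

```go
package main

import (
	"fmt"
	"time"
)

// leaseRecord is a hypothetical record; etcd's real lease persistence differs.
type leaseRecord struct {
	remainingTTL time.Duration // what the current implementation effectively persists
	expiryTime   time.Time     // what the rejected solution would persist
}

// recoverWithRemainingTTL re-arms the lease relative to "now", so the time the
// cluster spent down does not count against the client.
func recoverWithRemainingTTL(rec leaseRecord, now time.Time) time.Time {
	return now.Add(rec.remainingTTL)
}

// recoverWithExpiryTime keeps the absolute deadline, so a long maintenance
// window can expire every lease before any client gets a chance to renew.
func recoverWithExpiryTime(rec leaseRecord) time.Time {
	return rec.expiryTime
}

func main() {
	down := 2 * time.Hour // the cluster was down for a maintenance window
	rec := leaseRecord{
		remainingTTL: 30 * time.Second,
		expiryTime:   time.Now().Add(-down), // the deadline passed while the cluster was down
	}
	now := time.Now()
	fmt.Println("remainingTTL recovery, lease still alive:", recoverWithRemainingTTL(rec, now).After(now))
	fmt.Println("expiryTime recovery, lease still alive:  ", recoverWithExpiryTime(rec).After(now))
}
```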
Speaking to the overall plan, my thoughts are:
So in summary, we only need to refactor renew (this PR) and modify
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Executed: 1. ./scripts/genproto.sh 2. ./scripts/update_proto_annotations.sh Signed-off-by: Benjamin Wang <wachao@vmware.com>
Previously, the renew request can only be processed by the leader. If a follower receives the renew request, it just forwards the request to the leader via an internal http channel. This isn't accurate because the leader may change during the process. When a leader receives the renew request, the previous implementation follows a three-stage workflow: pre-raft, raft and post-raft. It's too complicated and error prone, and the raft is more like just a network transport channel instead of a consensus mechanism in this case. So we process the renew request via raft directly, which greatly simplifies the code. Signed-off-by: Benjamin Wang <wachao@vmware.com>
…>= 3.6 Signed-off-by: Benjamin Wang <wachao@vmware.com>
@@ -207,6 +208,11 @@ func (a *applierV3backend) LeaseRevoke(lc *pb.LeaseRevokeRequest) (*pb.LeaseRevo
	return &pb.LeaseRevokeResponse{Header: a.newHeader()}, err
}

func (a *applierV3backend) LeaseRenew(lc *pb.LeaseKeepAliveRequest) (*pb.LeaseKeepAliveResponse, error) {
I think it's still possible to apply LeaseRenew entries issued by a stale leader unless it provides a mechanism like comparing terms #15247 (comment)?
I think all raft messages issued by etcd itself might potentially have similar problems.
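For illustration only, here is a tiny Go sketch of the kind of term guard hinted at above. The entry wrapper and field names are hypothetical, not etcd's actual types: the idea is that the proposer records the raft term it observed, and the apply path skips lease renew entries whose recorded term no longer matches the term the entry was committed in.

```go
package main

import "fmt"

// leaseRenewEntry is a hypothetical wrapper around a lease renew proposal.
type leaseRenewEntry struct {
	leaseID      int64
	proposerTerm uint64 // raft term the proposing leader observed when it issued the request
}

// shouldApply drops entries whose recorded proposer term no longer matches the
// term the entry was committed in, i.e. the stale-leader case discussed above.
func shouldApply(e leaseRenewEntry, committedTerm uint64) bool {
	return e.proposerTerm == committedTerm
}

func main() {
	entry := leaseRenewEntry{leaseID: 7, proposerTerm: 4}
	fmt.Println(shouldApply(entry, 4)) // true: proposed and committed in the same term
	fmt.Println(shouldApply(entry, 5)) // false: committed under a newer leader's term
}
```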
I think a simpler approach is not using MsgProp and instead issuing lease related requests as MsgApp (related discussion: #15944 (comment)). With this approach we might be able to solve the issue without changing the WAL format. Its implementation will be tricky though. I'll try this idea this weekend if I have time.
PR needs rebase.
@ahrtr: The following tests failed.
Previously, the lease renew request could only be processed by the leader. When a follower received a renew request, it just forwarded the request to the leader via an internal http channel. This isn't accurate because the leader may change during the process.
When the leader receives a renew request, the previous implementation follows a three-stage workflow: pre-raft, raft and post-raft. It's too complicated and error prone, and raft acts more like a network transport channel than a consensus mechanism in this case. Please also see issuecomment-975817268.
So in this PR, we process the renew request via raft directly, which greatly simplifies the code.
The client-facing API remains unchanged, so it has no impact on client applications, including Kubernetes.
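As a rough illustration of the simplification, here is a hedged Go sketch using simplified stand-in types (the real etcd applier, lessor and protobuf types differ): once the renew request travels through raft, every member handles it in the ordinary apply path, so the leader-only forwarding and the pre-raft/post-raft stages disappear.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for pb.LeaseKeepAliveRequest/Response; the real types live in etcd's api package.
type LeaseKeepAliveRequest struct{ ID int64 }
type LeaseKeepAliveResponse struct{ ID, TTL int64 }

var errLeaseNotFound = errors.New("lease not found")

// lessor is a toy stand-in for etcd's lessor: it only tracks TTLs.
type lessor struct{ ttls map[int64]int64 }

// Renew resets the lease's TTL; in the real server this is the state-machine side effect.
func (l *lessor) Renew(id int64) (int64, error) {
	ttl, ok := l.ttls[id]
	if !ok {
		return 0, errLeaseNotFound
	}
	return ttl, nil
}

// applyLeaseRenew models the post-refactor flow: the request arrives as a
// committed raft entry and is applied identically on every member, leader or
// follower, with no leader-only forwarding and no pre/post-raft stages.
func applyLeaseRenew(l *lessor, req *LeaseKeepAliveRequest) (*LeaseKeepAliveResponse, error) {
	ttl, err := l.Renew(req.ID)
	if err != nil {
		return nil, err
	}
	return &LeaseKeepAliveResponse{ID: req.ID, TTL: ttl}, nil
}

func main() {
	l := &lessor{ttls: map[int64]int64{42: 10}}
	resp, err := applyLeaseRenew(l, &LeaseKeepAliveRequest{ID: 42})
	fmt.Println(resp, err)
}
```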