*: support raft learner in etcd - part 3 #10730
Conversation
(see [error cases when promoting a member] section for more details).
In this case, the user should wait and retry later.

In v3.4, etcd server limits the number of learners that a cluster can have to one. The main consideration is to limit the ...
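The "wait and retry" guidance above maps to a small client-side loop. Below is a minimal sketch using the clientv3 MemberPromote API, assuming the 3.4-era import path; the retry count, timeout, and backoff are arbitrary choices, not values from this PR.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

// promoteWithRetry retries MemberPromote until the learner has caught up
// with the leader, as the docs above advise. Limits here are arbitrary.
func promoteWithRetry(cli *clientv3.Client, id uint64) error {
	for attempt := 0; attempt < 10; attempt++ {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		_, err := cli.Cluster.MemberPromote(ctx, id)
		cancel()
		if err == nil {
			return nil // the learner is now a voting member
		}
		fmt.Printf("promote attempt %d failed (%v); retrying\n", attempt, err)
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("member %x was not promoted after retries", id)
}
```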
I feel we should make this configurable. A limit of one seems small.
Some people want to increase read performance, so they want more learners, for example.
cc @gyuho
I kind of agree that 1 is not enough for some use cases, such as upsizing a cluster from 1 to 3 or from 3 to 5, or live-migrating a 3-node cluster to a new 3-node cluster. On the other hand, we do not want users to add too many learners at the same time, which might result in too much overhead for the leader. One goal of using a learner is to make adding members safer - that goal is defeated if it creates so much overhead on the leader that the cluster fails. I think we can hard-code the limit to a small number, and make it configurable as an experimental feature in 3.4.
Configuring this limit is a cluster-wide reconfiguration (very similar to a member change), which means we would need to add an API, and maybe a new config change type.
@xiang90 Do we keep the limit of 1? I do not have strong opinions on this.
Let us keep this as it is for now. We can remove the limit later.
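For context on what a hard-coded limit means in the membership layer, here is a hedged sketch of the check; the Member type is reduced to a single field and every identifier is illustrative, not necessarily what etcd uses.

```go
package membership

import "errors"

// maxLearners mirrors the hard-coded limit agreed on above.
const maxLearners = 1

// ErrTooManyLearners would reject a config change that adds a learner
// beyond the limit.
var ErrTooManyLearners = errors.New("membership: too many learner members in cluster")

// Member is reduced to the one field this check needs.
type Member struct {
	IsLearner bool
}

// checkLearnerLimit counts existing learners plus the one about to be
// added, and rejects the change if the total exceeds maxLearners.
func checkLearnerLimit(members []*Member) error {
	learners := 1 // the learner about to be added
	for _, m := range members {
		if m.IsLearner {
			learners++
		}
	}
	if learners > maxLearners {
		return ErrTooManyLearners
	}
	return nil
}
```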
ping @xiang90
CI has been broken. It should be fixed now. Could you rebase?
@@ -1635,6 +1635,38 @@ func (s *EtcdServer) RemoveMember(ctx context.Context, id uint64) ([]*membership

// PromoteMember promotes a learner node to a voting node.
func (s *EtcdServer) PromoteMember(ctx context.Context, id uint64) ([]*membership.Member, error) {
	resp, err := s.promoteMember(ctx, id)
Won't this go through raft? Why do we need to forward the promote request to the leader in a side channel after it goes through raft?
We need to decide whether the learner is ready to be promoted before sending the request to raft. Only the etcd server whose local raft node is the leader has the information on the learner's progress, so the request is first routed to the leader.
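To make that routing concrete, here is a minimal sketch of the leader-or-forward pattern being described; the Server type and its helper methods are stubs invented for illustration, not the code in this diff.

```go
package etcdsketch

import (
	"context"
	"errors"
)

// ErrLearnerNotReady stands in for the sentinel error this PR introduces;
// the message here is illustrative.
var ErrLearnerNotReady = errors.New("etcdserver: learner is not in sync with leader")

// Server stubs out just enough of EtcdServer for the sketch.
type Server struct {
	leader   bool
	progress map[uint64]bool // learner ID -> caught up with leader's log?
}

func (s *Server) isLeader() bool                 { return s.leader }
func (s *Server) learnerCaughtUp(id uint64) bool { return s.progress[id] }

func (s *Server) proposePromote(ctx context.Context, id uint64) error {
	return nil // would propose the promotion as a conf change through raft
}

func (s *Server) forwardToLeader(ctx context.Context, id uint64) error {
	return nil // would relay the request to the leader's peer URLs
}

// promoteMember routes the request: only the leader can judge the
// learner's progress, so a follower forwards instead of deciding locally.
func (s *Server) promoteMember(ctx context.Context, id uint64) error {
	if s.isLeader() {
		if !s.learnerCaughtUp(id) {
			return ErrLearnerNotReady
		}
		return s.proposePromote(ctx, id)
	}
	return s.forwardToLeader(ctx, id)
}
```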
OK, now I understand - we actually check whether the local node is the leader inside the s.promoteMember func. I guess we should just move that checking code out of s.promoteMember so it would be clearer.
Instead, I added comments in 23511d2 to explain that checking whether the local node is the leader is part of the function.
cctx, cancel := context.WithTimeout(ctx, s.Cfg.ReqTimeout())
defer cancel()
// forward to leader
Probably add a comment on why we need to forward to the leader.
Added.
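For readers following the hunk above, a hedged sketch of what the follower-side forwarding could look like end to end; the URL path and function shape are assumptions for illustration, not etcd's actual wire format.

```go
package etcdsketch

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// forwardPromote bounds the call with the request timeout (the cctx in the
// hunk above), then tries each of the leader's peer URLs in turn.
func forwardPromote(ctx context.Context, reqTimeout time.Duration, leaderURLs []string, id uint64) error {
	cctx, cancel := context.WithTimeout(ctx, reqTimeout)
	defer cancel()
	for _, u := range leaderURLs {
		// The path below is hypothetical; the real endpoint may differ.
		req, err := http.NewRequest(http.MethodPost, fmt.Sprintf("%s/members/promote/%d", u, id), nil)
		if err != nil {
			continue
		}
		resp, err := http.DefaultClient.Do(req.WithContext(cctx))
		if err != nil {
			continue // this peer URL is unreachable; try the next one
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return nil
		}
	}
	return errors.New("etcdsketch: could not reach the leader to promote member")
}
```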
LGTM after resolving the merge conflicts and the nits.
Adjust StrictReconfigCheck logic to accommodate learner members in the cluster.
Make the learner return codes.Unavailable when a request is not supported by the learner; the client balancer will then retry a different endpoint.
If the member does not exist in the cluster, IsLearner will panic.
Hard-coded the maximum number of learners to 1.
Use ReadyNotify instead of time.Sleep to wait for the server to be ready (see the sketch after this list).
If the learner is not ready to be promoted, use etcdserver.ErrLearnerNotReady instead of membership.ErrLearnerNotReady.
Check the HTTP StatusCode. Only unmarshal the body if the StatusCode is http.StatusOK.
Updated TestMemberPromote to include both learner-not-ready and learner-ready test cases. Removed the unit test TestPromoteMember because it requires the underlying raft node to be started and running; member promote is covered by the integration test.
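The ReadyNotify item above replaces a fixed sleep with an explicit readiness signal. A minimal sketch using the embed package; the data directory and timeout are arbitrary.

```go
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd" // arbitrary data directory for the sketch
	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()
	// Block on ReadyNotify instead of sleeping a fixed duration.
	select {
	case <-e.Server.ReadyNotify():
		log.Println("server is ready")
	case <-time.After(60 * time.Second):
		log.Fatal("server took too long to start")
	}
}
```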
Rebased.
@jingyih can you check why the CI failed?
Travis failed with a known flaky test. Semaphore failed with a 20m timeout, which is also a known issue.
Last part of #10645. Continuation of #10727.
This PR includes the last 12 commits in #10645. Some of the commits were modified during rebasing onto #10727.
cc @xiang90