fallback to follower when leader is busy #916
Signed-off-by: you06 <you1474600@gmail.com>
@@ -1191,9 +1195,11 @@ func (c *RegionCache) reloadRegion(regionID uint64) {
// ignore error and use old region info.
logutil.Logger(bo.GetCtx()).Error("load region failure",
zap.Uint64("regionID", regionID), zap.Error(err))
c.mu.RLock()
Is this an extra fix?
Yes, a possible data race.
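A minimal sketch of the race being fixed, assuming a heavily simplified cache. The `regionCache`, `region`, `update`, and `errLoadFailed` names are hypothetical and are not client-go's actual `RegionCache` API. When reloading fails, the fallback path reads the old entry from a map that other goroutines may be writing concurrently, so that read has to happen under the read lock.

```go
package main

import (
	"errors"
	"sync"
)

// errLoadFailed is a hypothetical error standing in for a failed region load.
var errLoadFailed = errors.New("load region failure")

// region is a hypothetical, heavily simplified stand-in for a cached region.
type region struct {
	id      uint64
	version uint64
}

// regionCache is a hypothetical cache whose map is shared between goroutines.
type regionCache struct {
	mu struct {
		sync.RWMutex
		regions map[uint64]*region
	}
}

func newRegionCache() *regionCache {
	c := &regionCache{}
	c.mu.regions = make(map[uint64]*region)
	return c
}

// update replaces a cached region under the write lock.
func (c *regionCache) update(r *region) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.mu.regions[r.id] = r
}

// reloadRegion tries to load fresh region info. On failure it ignores the
// error and returns the old cached entry; reading the map without RLock here
// would race with concurrent calls to update.
func (c *regionCache) reloadRegion(id uint64, load func(uint64) (*region, error)) *region {
	r, err := load(id)
	if err != nil {
		c.mu.RLock()
		old := c.mu.regions[id]
		c.mu.RUnlock()
		return old
	}
	c.update(r)
	return r
}
```

Running a concurrent variant under `go test -race` is the usual way to surface this kind of race; the sketch only shows where the `RLock`/`RUnlock` pair belongs.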
}

func (state *tryFollower) onSendSuccess(selector *replicaSelector) {
if !selector.region.switchWorkLeaderToPeer(selector.targetReplica().peer) {
The former naming and meaning of the switchWorkLeaderToPeer function are quite confusing; I don't understand its purpose.
Formerly, tryFollower was only used after accessKnownLeader failed. In that case, if one of the followers can serve the leader-read request, it must be the new leader, so the work leader is switched to this peer.
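The reasoning above can be sketched as a small model. This is hypothetical and not client-go's code: `regionState`, `peer`, and the `leaderRead` flag are assumptions. The sketch contrasts the two ways of entering the try-follower state. After `accessKnownLeader` fails, the request is still a leader read, so a follower that serves it successfully must be the new leader. After a busy leader falls back to followers, the request is a follower read, and a successful response says nothing about leadership.

```go
package main

// peer is a hypothetical replica identifier.
type peer struct{ storeID uint64 }

// regionState is a hypothetical view of one cached region.
type regionState struct {
	peers         []peer
	workLeaderIdx int
}

// switchWorkLeaderToPeer makes p the work leader if it belongs to the region.
// It returns false if p is not one of the region's peers.
func (r *regionState) switchWorkLeaderToPeer(p peer) bool {
	for i, candidate := range r.peers {
		if candidate == p {
			r.workLeaderIdx = i
			return true
		}
	}
	return false
}

// tryFollower is a hypothetical state. leaderRead is true when the state was
// entered after the known leader failed, and false when it was entered by
// falling back from a busy leader to a follower read.
type tryFollower struct {
	leaderRead bool
}

// onSendSuccess reports whether the work leader was switched to target.
func (s *tryFollower) onSendSuccess(r *regionState, target peer) bool {
	if !s.leaderRead {
		// Any replica can serve a follower read; this proves nothing about leadership.
		return false
	}
	// Only the leader can serve a leader read, so target must be the new leader.
	return r.switchWorkLeaderToPeer(target)
}
```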
@@ -888,6 +902,22 @@ func (s *replicaSelector) updateLeader(leader *metapb.Peer) {
s.region.invalidate(StoreNotFound)
}

// For some reason, the leader is unreachable by now, try followers instead.
func (s *replicaSelector) fallback2Follower(ctx *RPCContext) bool {
For now, is the stale read path (stale read -> fallback to leader -> fallback to replicas) the only situation in which this is used?
Yes. When falling back from the leader to a replica, the request is a follower read, not a stale read.
/cc @crazycs520 PTAL
Signed-off-by: you06 <you1474600@gmail.com>
LGTM
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Is it necessary to give it a hint or constraint on which follower to try first? For example, we may want it to do follower reads in its local zone as much as possible.
Signed-off-by: you06 <you1474600@gmail.com>
Implemented this strategy, PTAL.
Signed-off-by: you06 <you1474600@gmail.com>
if len(state.labels) > 0 {
idx, selectReplica := filterReplicas(func(selectReplica *replica) bool {
return selectReplica.store.IsLabelsMatch(state.labels)
Is the selectReplica.isExhausted(1) check missing here? How about putting it into the default filterReplicas and passing a nil checker function if there are no labels?
The replica may have been exhausted by data-is-not-ready errors, which do not affect follower reads.
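A minimal sketch of the local-zone preference discussed above, assuming hypothetical `replica`, `label`, `isLabelsMatch`, and `pickFollower` definitions (the real change uses `filterReplicas` and `IsLabelsMatch`, whose full signatures are not shown here). Followers whose labels match are tried first, then any follower. The `exhausted` flag is deliberately not checked, following the reply above: exhaustion caused by data-is-not-ready during the stale read phase does not prevent a follower read.

```go
package main

// label is a hypothetical store label such as zone=z1.
type label struct{ key, value string }

// replica is a hypothetical, simplified replica record.
type replica struct {
	storeID uint64
	labels  []label
	// exhausted marks replicas that already failed this request, for example
	// with data-is-not-ready during the stale read phase.
	exhausted bool
}

// isLabelsMatch reports whether every wanted label is present on the replica.
func (r *replica) isLabelsMatch(want []label) bool {
	for _, w := range want {
		found := false
		for _, l := range r.labels {
			if l == w {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// pickFollower returns the index of the follower to use after the leader was
// busy, or -1 if there is none. Label-matching followers (e.g. the local zone)
// win; otherwise the first follower is used. exhausted is intentionally ignored.
func pickFollower(replicas []*replica, leaderIdx int, want []label) int {
	if len(want) > 0 {
		for i, r := range replicas {
			if i != leaderIdx && r.isLabelsMatch(want) {
				return i
			}
		}
	}
	for i := range replicas {
		if i != leaderIdx {
			return i
		}
	}
	return -1
}
```

With the leader at index 0 in zone z1, a follower in z2 at index 1, and an exhausted follower in z1 at index 2, asking for zone z1 picks index 2, while asking with no labels picks index 1.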
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
* add comment
  Signed-off-by: you06 <you1474600@gmail.com>
* Update internal/locate/region_request.go
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* after fallback to replica read from leader, retry local follower first
  Signed-off-by: you06 <you1474600@gmail.com>
* address comment
  Signed-off-by: you06 <you1474600@gmail.com>

---------

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
  Signed-off-by: you06 <you1474600@gmail.com>
* reload region cache when store is resolved from invalid status (#843)
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: disksing <i@disksing.com>
* fallback to follower when leader is busy (#916) (#923)
  * fallback to follower when leader is busy
    Signed-off-by: you06 <you1474600@gmail.com>
    Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
    Co-authored-by: cfzjywxk <lsswxrxr@163.com>
* Resume max retry time check for stale read retry with leader option(#903) (#911)
  * Resume max retry time check for stale read retry with leader option
    Signed-off-by: cfzjywxk <lsswxrxr@163.com>
  * add cancel
    Signed-off-by: cfzjywxk <lsswxrxr@163.com>

  ---------

  Signed-off-by: cfzjywxk <lsswxrxr@163.com>
* add region cache state test & fix some issues of replica selector (#910)
  Signed-off-by: you06 <you1474600@gmail.com>
  remove duplicate code
  Signed-off-by: you06 <you1474600@gmail.com>
* enable workflow for tidb-7.1
  Signed-off-by: you06 <you1474600@gmail.com>
* update
  Signed-off-by: you06 <you1474600@gmail.com>
  update
  Signed-off-by: you06 <you1474600@gmail.com>
  fix test
  Signed-off-by: you06 <you1474600@gmail.com>
  fix test
  Signed-off-by: you06 <you1474600@gmail.com>
* lint
  Signed-off-by: you06 <you1474600@gmail.com>
* lint
  Signed-off-by: you06 <you1474600@gmail.com>
* fix flaky test
  Signed-off-by: you06 <you1474600@gmail.com>

---------

Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Fallback to follower when leader is busy.
Inject data-is-not-ready errors for stale reads and server-is-busy errors for the leader, so a stale read that falls back to the leader gets stuck retrying the busy leader.
With this patch, a server-is-busy error from the leader makes the request try followers instead.
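The test scenario above can be modeled as a small retry loop. This is a hypothetical simulation, not client-go's replica selector: replica 0 is the leader, `inject` hard-codes the injected errors, a follower read is assumed to succeed on a follower, and `maxAttempts` stands in for the backoff budget.

```go
package main

type readMode int

const (
	staleRead    readMode = iota // initial stale read on a follower
	leaderRead                   // stale read fell back to the leader
	followerRead                 // busy leader fell back to a follower read
)

type outcome int

const (
	success outcome = iota
	dataIsNotReady
	serverIsBusy
)

const leaderIdx = 0

// inject mimics the injected failures: the leader is always busy, followers
// reject stale reads with data-is-not-ready, and follower reads succeed.
func inject(replica int, mode readMode) outcome {
	if replica == leaderIdx {
		return serverIsBusy
	}
	if mode == staleRead {
		return dataIsNotReady
	}
	return success
}

// simulate returns the replicas tried for one stale read and whether it was served.
func simulate(withFallback bool, maxAttempts int) (tried []int, served bool) {
	mode, target := staleRead, 1 // start with a stale read on follower 1
	for attempt := 0; attempt < maxAttempts; attempt++ {
		tried = append(tried, target)
		switch inject(target, mode) {
		case success:
			return tried, true
		case dataIsNotReady:
			// Stale read fallback: retry on the leader as a leader read.
			mode, target = leaderRead, leaderIdx
		case serverIsBusy:
			if withFallback && mode == leaderRead {
				// The behavior added by this patch: try a follower instead.
				mode, target = followerRead, 1
			}
			// Otherwise keep retrying the busy leader.
		}
	}
	return tried, false
}
```

With the fallback enabled, the sketch tries follower 1 (data-is-not-ready), then the leader (server-is-busy), then follower 1 again as a follower read, which succeeds. Without it, every retry after the first lands on the busy leader until the attempts run out.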