fallback to follower when leader is busy #916

you06 · 2023-07-26T12:15:23Z

Fallback to follower when leader is busy.

When data-is-not-ready is injected for stale reads and server-is-busy for the leader, a request that falls back to the leader gets stuck.

With this patch, a server-is-busy error from the leader makes the request try the followers.

Signed-off-by: you06 <you1474600@gmail.com>
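
For illustration only, the retry chain described above can be sketched as follows. This is not the client-go implementation: the `replica` type, the `readKind` values, the error variables, and `readWithFallback` are hypothetical names invented for this sketch, and the real replica selector keeps considerably more state.

```go
package main

import "errors"

// Hypothetical stand-ins for the two region errors named in the description.
var (
	errDataIsNotReady = errors.New("data is not ready")
	errServerIsBusy   = errors.New("server is busy")
)

// readKind distinguishes the three kinds of read used in the chain.
type readKind int

const (
	staleRead readKind = iota
	leaderRead
	followerRead
)

// replica is a simplified replica; handle simulates sending a request to it.
type replica struct {
	name   string
	handle func(kind readKind) error
}

// readWithFallback tries a stale read on target, falls back to a leader read
// on data-is-not-ready, and falls back to follower reads when the leader
// answers server-is-busy. It returns the name of the replica that served
// the read.
func readWithFallback(target, leader *replica, followers []*replica) (string, error) {
	err := target.handle(staleRead)
	if err == nil {
		return target.name, nil
	}
	if !errors.Is(err, errDataIsNotReady) {
		return "", err
	}
	err = leader.handle(leaderRead)
	if err == nil {
		return leader.name, nil
	}
	if !errors.Is(err, errServerIsBusy) {
		return "", err
	}
	// Without this step the request has nowhere to go once the leader is busy.
	for _, f := range followers {
		if err = f.handle(followerRead); err == nil {
			return f.name, nil
		}
	}
	return "", err
}
```

Calling `readWithFallback` with a target that returns `errDataIsNotReady` for stale reads and a leader that always returns `errServerIsBusy` ends on the first follower that accepts a follower read.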

cfzjywxk · 2023-07-27T03:00:05Z

internal/locate/region_cache.go

@@ -1191,9 +1195,11 @@ func (c *RegionCache) reloadRegion(regionID uint64) {
 		// ignore error and use old region info.
 		logutil.Logger(bo.GetCtx()).Error("load region failure",
 			zap.Uint64("regionID", regionID), zap.Error(err))
+		c.mu.RLock()


Is this an extra fix?

Yes, it fixes a possible data race.
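
As a general illustration of the kind of race discussed here, the sketch below guards a shared map with a `sync.RWMutex`, taking the read lock before looking up existing entries. Only the field name `mu` comes from the diff; the `regionCache` type, its `regions` map, and the method names are assumptions made for this example and do not mirror the real `RegionCache`.

```go
package main

import "sync"

// regionCache is a hypothetical, heavily simplified cache.
type regionCache struct {
	mu      sync.RWMutex
	regions map[uint64]string // regionID -> simplified region info
}

// insert takes the write lock because it mutates the map.
func (c *regionCache) insert(id uint64, info string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.regions[id] = info
}

// lookup takes the read lock so that it cannot race with insert.
// Reading the map without it while another goroutine writes is a data race.
func (c *regionCache) lookup(id uint64) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	info, ok := c.regions[id]
	return info, ok
}
```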

cfzjywxk · 2023-07-27T03:06:17Z

internal/locate/region_request.go

 }

 func (state *tryFollower) onSendSuccess(selector *replicaSelector) {
-	if !selector.region.switchWorkLeaderToPeer(selector.targetReplica().peer) {


The former name and meaning of the switchWorkLeaderToPeer function are quite confusing; I don't understand its purpose.

Previously, tryFollower was used only after accessKnownLeader had failed. In that case, if one of the followers can serve the leader-read request, it must be the new leader, so the leader is switched to this peer.
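
A rough sketch of the distinction this reply draws: a follower that serves a leader read after the known leader failed is treated as the new leader, while a follower that serves a follower read because the leader was busy is not. The diff above only shows the removed call, so the `leaderFailed` flag and every type and function name below are assumptions for illustration, not the patch itself.

```go
package main

// peer and region are simplified, hypothetical stand-ins for the client's types.
type peer struct{ id uint64 }

type region struct {
	peers     []peer
	leaderIdx int
}

// switchLeaderToPeer records p as the leader if it belongs to the region.
func (r *region) switchLeaderToPeer(p peer) bool {
	for i, q := range r.peers {
		if q.id == p.id {
			r.leaderIdx = i
			return true
		}
	}
	return false
}

// onFollowerSuccess is called after a follower served the request.
// leaderFailed is true when followers were tried because the known leader
// could not serve the request as leader, and false when it was merely busy.
func onFollowerSuccess(r *region, served peer, leaderFailed bool) {
	if leaderFailed {
		// A follower that can serve a leader read must be the new leader.
		r.switchLeaderToPeer(served)
	}
	// Otherwise this was a follower read and leadership is unchanged.
}
```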

cfzjywxk · 2023-07-27T03:07:23Z

internal/locate/region_request.go

@@ -888,6 +902,22 @@ func (s *replicaSelector) updateLeader(leader *metapb.Peer) {
 	s.region.invalidate(StoreNotFound)
 }

+// For some reason, the leader is unreachable by now, try followers instead.
+func (s *replicaSelector) fallback2Follower(ctx *RPCContext) bool {


For now, is this used only in the chain stale read -> fallback to leader -> fallback to replicas?

Yes. When falling back from the leader to a replica, the request is a follower read, not a stale read.

cfzjywxk · 2023-07-27T03:10:03Z

/cc @crazycs520 PTAL

crazycs520

LGTM

ekexium

Is it necessary to give it a hint or constraint on which follower to try (first)? For example we may want it to do the follower read in its local zone as much as possible?

you06 · 2023-07-28T04:18:27Z

> Is it necessary to give it a hint or constraint on which follower to try (first)? For example we may want it to do the follower read in its local zone as much as possible?

Implemented this strategy, PTAL.

cfzjywxk · 2023-07-28T09:28:15Z

internal/locate/region_request.go

+
+	if len(state.labels) > 0 {
+		idx, selectReplica := filterReplicas(func(selectReplica *replica) bool {
+			return selectReplica.store.IsLabelsMatch(state.labels)


Is the selectReplica.isExhausted(1) check missing here? How about putting it into the default filterReplicas and passing a nil checker function if there are no labels?

The replica may have been exhausted by a data-is-not-ready error, which does not affect follower reads.
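
To make the exchange above concrete, here is a minimal sketch of a label-first follower pick. It assumes a simplified `replica` with a single `zone` label and an `attempts` counter; `firstMatch` and `pickFollower` are hypothetical helpers, not the `filterReplicas` closure from the patch. The label pass deliberately ignores `attempts`, following the reply that exhaustion caused by data-is-not-ready does not matter for follower reads, and the exhaustion check in the default pass is an assumption based on the reviewer's wording.

```go
package main

// replica is a hypothetical, simplified replica.
type replica struct {
	zone     string
	attempts int
}

// firstMatch returns the index of the first replica other than the leader
// for which pred holds, or -1 if there is none.
func firstMatch(replicas []*replica, leaderIdx int, pred func(*replica) bool) int {
	for i, r := range replicas {
		if i != leaderIdx && pred(r) {
			return i
		}
	}
	return -1
}

// pickFollower prefers a follower in localZone and otherwise falls back to
// the first follower that has not been tried yet.
func pickFollower(replicas []*replica, leaderIdx int, localZone string) int {
	if localZone != "" {
		// No exhaustion check here: an earlier stale-read failure on this
		// replica does not prevent it from serving a follower read.
		idx := firstMatch(replicas, leaderIdx, func(r *replica) bool {
			return r.zone == localZone
		})
		if idx >= 0 {
			return idx
		}
	}
	return firstMatch(replicas, leaderIdx, func(r *replica) bool {
		return r.attempts < 1
	})
}
```

With replicas in zones z1 (leader), z2 (already tried once), and z3, a client in z2 still picks the z2 follower, while a client without a zone label picks the untried z3 follower.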

* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
* add comment
  Signed-off-by: you06 <you1474600@gmail.com>
* Update internal/locate/region_request.go
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* after fallback to replica read from leader, retry local follower first
  Signed-off-by: you06 <you1474600@gmail.com>
* address comment
  Signed-off-by: you06 <you1474600@gmail.com>
---------
Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>

* reload region cache when store is resolved from invalid status (#843)
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: disksing <i@disksing.com>
* fallback to follower when leader is busy (#916) (#923)
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
  Co-authored-by: cfzjywxk <lsswxrxr@163.com>
* Resume max retry time check for stale read retry with leader option(#903) (#911)
* Resume max retry time check for stale read retry with leader option
  Signed-off-by: cfzjywxk <lsswxrxr@163.com>
* add cancel
  Signed-off-by: cfzjywxk <lsswxrxr@163.com>
---------
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
* add region cache state test & fix some issues of replica selector (#910)
  Signed-off-by: you06 <you1474600@gmail.com>
  remove duplicate code
  Signed-off-by: you06 <you1474600@gmail.com>
* enable workflow for tidb-7.1
  Signed-off-by: you06 <you1474600@gmail.com>
* update
  Signed-off-by: you06 <you1474600@gmail.com>
  update
  Signed-off-by: you06 <you1474600@gmail.com>
  fix test
  Signed-off-by: you06 <you1474600@gmail.com>
  fix test
  Signed-off-by: you06 <you1474600@gmail.com>
* lint
  Signed-off-by: you06 <you1474600@gmail.com>
* lint
  Signed-off-by: you06 <you1474600@gmail.com>
* fix flaky test
  Signed-off-by: you06 <you1474600@gmail.com>
---------
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>

fallback to follower when leader is busy

dff471a

Signed-off-by: you06 <you1474600@gmail.com>

cfzjywxk reviewed Jul 27, 2023

add comment

add9f79

Signed-off-by: you06 <you1474600@gmail.com>

crazycs520 approved these changes Jul 27, 2023

cfzjywxk requested review from zyguan and ekexium July 27, 2023 12:39

cfzjywxk reviewed Jul 27, 2023

Update internal/locate/region_request.go

ebc2437

Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>

ekexium reviewed Jul 27, 2023

after fallback to replica read from leader, retry local follower first

aac21b7

Signed-off-by: you06 <you1474600@gmail.com>

cfzjywxk reviewed Jul 28, 2023

address comment

7da51eb

Signed-off-by: you06 <you1474600@gmail.com>

cfzjywxk reviewed Jul 28, 2023

cfzjywxk approved these changes Jul 28, 2023

cfzjywxk requested a review from ekexium July 28, 2023 12:34

cfzjywxk merged commit 8ed240d into tikv:tidb-6.5 Jul 28, 2023
7 of 9 checks passed

you06 mentioned this pull request Aug 2, 2023

use tidb_kv_read_timeout as first kv request timeout #919

Merged

you06 added a commit to you06/client-go that referenced this pull request Aug 3, 2023

fallback to follower when leader is busy (tikv#916)

46bf08a

* fallback to follower when leader is busy Signed-off-by: you06 <you1474600@gmail.com> Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>

This was referenced Aug 3, 2023

fallback to follower when leader is busy (#916) #923

Merged

When leader is selected as target peer at first, no other replicas could be retied for stale read #906

Closed

Only fallback to leader read with data-is-not-ready error #907

Closed

cfzjywxk mentioned this pull request Jul 26, 2023

storage: backport the stale read enhancement and bug fix to release 6.5 pingcap/tidb#43481

Closed

you06 added a commit to you06/client-go that referenced this pull request Aug 7, 2023

fallback to follower when leader is busy (tikv#916)

cf84acf

* fallback to follower when leader is busy Signed-off-by: you06 <you1474600@gmail.com> Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
