Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fallback to follower when leader is busy #916

Merged
merged 5 commits into from
Jul 28, 2023

Conversation

you06
Copy link
Contributor

@you06 you06 commented Jul 26, 2023

Fallback to follower when leader is busy.

Inject data-is-not-ready for stale read and server-is-busy for leader, so fallback to leader will be stucked.

fail

With this patch, server-is-busy on leader will try followers.

success

Signed-off-by: you06 <you1474600@gmail.com>
@@ -1191,9 +1195,11 @@ func (c *RegionCache) reloadRegion(regionID uint64) {
// ignore error and use old region info.
logutil.Logger(bo.GetCtx()).Error("load region failure",
zap.Uint64("regionID", regionID), zap.Error(err))
c.mu.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an extra fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a possible data race.

internal/locate/region_request.go Show resolved Hide resolved
}

func (state *tryFollower) onSendSuccess(selector *replicaSelector) {
if !selector.region.switchWorkLeaderToPeer(selector.targetReplica().peer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former naming and meaning of the switchWorkLeaderToPeer function is quite confusing, I don't understand what's the purpose of it..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former usage of tryFollower is after the failure of accessKnownLeader, in this case, if one of the follower can serve the leader-read request, it's the new leader, so switch the leader to this peer.

@@ -888,6 +902,22 @@ func (s *replicaSelector) updateLeader(leader *metapb.Peer) {
s.region.invalidate(StoreNotFound)
}

// For some reason, the leader is unreachable by now, try followers instead.
func (s *replicaSelector) fallback2Follower(ctx *RPCContext) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By now is the only situation that would be used the stale read fallback -> leader -> fallback replicas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when fallbacking to replica from leader, it's a follower read request, not stale read.

@cfzjywxk
Copy link
Contributor

/cc @crazycs520 PTAL

Signed-off-by: you06 <you1474600@gmail.com>
Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cfzjywxk cfzjywxk requested review from zyguan and ekexium July 27, 2023 12:39
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Copy link
Contributor

@ekexium ekexium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to give it a hint or constraint on which follower to try (first)? For example we may want it to do the follower read in its local zone as much as possible?

@you06
Copy link
Contributor Author

you06 commented Jul 28, 2023

Is it necessary to give it a hint or constraint on which follower to try (first)? For example we may want it to do the follower read in its local zone as much as possible?

Implemented this strategy, PTAL.

Signed-off-by: you06 <you1474600@gmail.com>

if len(state.labels) > 0 {
idx, selectReplica := filterReplicas(func(selectReplica *replica) bool {
return selectReplica.store.IsLabelsMatch(state.labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the selectReplica.isExhausted(1) check missing here? How about putting it into the default filterReplicas and pass a nil checker function if there's no labels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replica may be exhausted by data-is-not-ready, which does not affect follower read.

@cfzjywxk cfzjywxk requested a review from ekexium July 28, 2023 12:34
@cfzjywxk cfzjywxk merged commit 8ed240d into tikv:tidb-6.5 Jul 28, 2023
7 of 9 checks passed
you06 added a commit to you06/client-go that referenced this pull request Aug 3, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>

* add comment

Signed-off-by: you06 <you1474600@gmail.com>

* Update internal/locate/region_request.go

Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>

* after fallback to replica read from leader, retry local follower first

Signed-off-by: you06 <you1474600@gmail.com>

* address comment

Signed-off-by: you06 <you1474600@gmail.com>

---------

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 3, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 7, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 11, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
cfzjywxk added a commit that referenced this pull request Aug 11, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 14, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 15, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 15, 2023
* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
cfzjywxk added a commit that referenced this pull request Sep 26, 2023
* reload region cache when store is resolved from invalid status (#843)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* fallback to follower when leader is busy (#916) (#923)

* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>

* Resume max retry time check for stale read retry with leader option(#903) (#911)

* Resume max retry time check for stale read retry with leader option

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* add cancel

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

---------

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* add region cache state test & fix some issues of replica selector (#910)

Signed-off-by: you06 <you1474600@gmail.com>

remove duplicate code

Signed-off-by: you06 <you1474600@gmail.com>

* enable workflow for tidb-7.1

Signed-off-by: you06 <you1474600@gmail.com>

* update

Signed-off-by: you06 <you1474600@gmail.com>

update

Signed-off-by: you06 <you1474600@gmail.com>

fix test

Signed-off-by: you06 <you1474600@gmail.com>

fix test

Signed-off-by: you06 <you1474600@gmail.com>

* lint

Signed-off-by: you06 <you1474600@gmail.com>

* lint

Signed-off-by: you06 <you1474600@gmail.com>

* fix flaky test

Signed-off-by: you06 <you1474600@gmail.com>

---------

Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants