avoid dataIsNotReady error while retrying stale read on the leader #765
Conversation
internal/locate/region_request.go
Outdated
```go
if retryTimes > 0 && s.replicaSelector != nil && s.replicaSelector.regionStore != nil &&
	s.replicaSelector.targetIdx == s.replicaSelector.regionStore.workTiKVIdx {
	// retry on the leader should not use stale read to avoid possible DataIsNotReady error as it always can serve any read
	req.StaleRead = false
```
The region state may not be up-to-date in the client; maybe set req.ReplicaRead to false to avoid the request being served by a follower replica.
@you06 could you please explain what req.ReplicaRead=false is supposed to do? My understanding is that even if the request lands on a replica, it will do a ReadIndex and return data without danger of a repeated DataIsNotReady.
Sorry, req.ReplicaRead is set to false for stale reads, so that comment is not correct. My idea is to avoid follower read when retrying, and I'm inclined to read from the leader after DataIsNotReady is returned once.
In your implementation the request may get multiple DataIsNotReady errors before it chooses the leader replica and switches off stale read. If the resolved ts is blocked by write locks (e.g. by large transactions), another replica node will very likely return DataIsNotReady as well.
Maybe you can check out this commit to see my suggestion.
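To make the suggested policy concrete, here is a minimal, self-contained sketch of the idea; the `replica` type and `pickTarget` function are hypothetical stand-ins, not the real client-go selector. After the first DataIsNotReady the request is pinned to the leader and the stale flag is dropped, so a request can observe at most one DataIsNotReady.

```go
package main

import "fmt"

// replica is a hypothetical stand-in for a client-go replica; isLeader marks
// the region's leader peer.
type replica struct {
	id       int
	isLeader bool
}

// pickTarget models the suggested policy: the first attempt may be a stale
// read on any replica, but once DataIsNotReady has been seen, the request is
// routed to the leader with the stale flag cleared.
func pickTarget(replicas []replica, preferred int, sawDataIsNotReady bool) (replica, bool) {
	if sawDataIsNotReady {
		for _, r := range replicas {
			if r.isLeader {
				// A non-stale leader read cannot return DataIsNotReady again.
				return r, false
			}
		}
	}
	// First attempt: stale read on the preferred (e.g. local) replica.
	return replicas[preferred], true
}

func main() {
	replicas := []replica{{id: 1, isLeader: true}, {id: 2}, {id: 3}}
	target, stale := pickTarget(replicas, 2, false)
	fmt.Printf("attempt 1: replica %d, staleRead=%v\n", target.id, stale)
	target, stale = pickTarget(replicas, 2, true)
	fmt.Printf("attempt 2: replica %d, staleRead=%v\n", target.id, stale)
}
```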
> I'm inclined to read from the leader after DataIsNotReady is returned once.
> The request may get multiple DataIsNotReady errors before it chooses the leader replica

Thanks @you06 for the explanation. I believe this should choose the leader after the first DataIsNotReady.
Your commit does it at a different level. The big difference is that my commit applies this logic only to global stale reads, while your commit does it for all stale reads. I'm OK to go ahead with your commit, but then we will need to remove this code along with it as it becomes irrelevant.
@you06 I moved my code to the place you recommended, but kept the approach of relying on storeSelectorOp.leaderOnly instead of resetting the selector state completely, as the latter might have larger consequences than expected (e.g. result in more retries).
I've also run the integration tests with --with-tikv and they all passed locally:

```
PASS
ok      integration_tests       220.231s
```
This implementation is correct, but I can see some region misses with it: from the metrics, 1/3 of the DataIsNotReady errors also suffer a region miss.
I've carefully debugged it and the problem may happen here: when the stale read is served by the leader on the first attempt, it may receive a DataIsNotReady error and leader.attempts is increased, so the leader.isExhausted(1) check below fails. The response then carries a region error and is retried later, which introduces unexpected latency.
client-go/internal/locate/region_request.go, lines 595 to 599 in 92db9f7:

```go
if leader.isEpochStale() || leader.isExhausted(1) {
	metrics.TiKVReplicaSelectorFailureCounter.WithLabelValues("exhausted").Inc()
	selector.invalidateRegion()
	return nil, nil
}
```
I think we have 2 options (see the sketch below):
- Reset leader.attempts in the path with state.option.leaderOnly.
- Disable stale read if the leader replica is chosen to serve the read on the first try.
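A minimal, runnable model of the exhaustion problem and of option 1. The leaderReplica type and nextLeaderOnly function are hypothetical stand-ins that mirror the leader.attempts / isExhausted(1) names from the snippet above; they are not the real selector code.

```go
package main

import "fmt"

// leaderReplica is a hypothetical stand-in for the leader's accessed-replica
// bookkeeping in the selector; only the attempt counter matters here.
type leaderReplica struct{ attempts int }

func (l *leaderReplica) isExhausted(n int) bool { return l.attempts >= n }

// nextLeaderOnly models option 1: when the retry is pinned to the leader,
// forget the attempt that was spent on the failed stale read so that the
// isExhausted(1) check does not invalidate the region and cause a region miss.
func nextLeaderOnly(leader *leaderReplica, resetAttempts bool) bool {
	if resetAttempts {
		leader.attempts = 0
	}
	if leader.isExhausted(1) {
		return false // region invalidated, request answered with a region error
	}
	leader.attempts++
	return true // leader selected for the retry
}

func main() {
	// The first stale read was served by the leader and failed with
	// DataIsNotReady, so one attempt has already been recorded.
	fmt.Println("without reset:", nextLeaderOnly(&leaderReplica{attempts: 1}, false)) // false: region miss
	fmt.Println("with reset:   ", nextLeaderOnly(&leaderReplica{attempts: 1}, true))  // true: retried on leader
}
```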
The leaderOnly flag in the previous implementation of accessFollower does not function as intended, which can be considered a bug.
To implement the "retry leader peer without stale read flag" strategy, an alternative approach is to modify accessFollower.next to return the corresponding leader peer's RPCContext. Then, in SendReqCtx, we can remove the StaleRead flag from the request based on information provided by the returned RPCContext. This ensures that the peer selection strategy remains unified and centralized within the selector.
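A hedged sketch of this alternative, using hypothetical stand-ins for RPCContext, the request, and SendReqCtx rather than the real client-go types; it only illustrates how the sender could clear the flag based on what the selector returned.

```go
package main

import "fmt"

// rpcContext is a hypothetical stand-in for locate.RPCContext, reduced to the
// single piece of information this sketch needs.
type rpcContext struct{ peerIsLeader bool }

// request mirrors only the StaleRead flag of a tikvrpc request.
type request struct{ StaleRead bool }

// sendReqCtx models the suggested flow: the selector alone picks the peer
// (possibly falling back to the leader), and the sender merely adjusts the
// request flags based on the returned context.
func sendReqCtx(req *request, next func() *rpcContext) *rpcContext {
	rpcCtx := next() // peer selection stays centralized in the selector
	if req.StaleRead && rpcCtx.peerIsLeader {
		// The leader can serve any read, so drop the stale flag and avoid
		// another DataIsNotReady round trip.
		req.StaleRead = false
	}
	return rpcCtx
}

func main() {
	req := &request{StaleRead: true}
	sendReqCtx(req, func() *rpcContext { return &rpcContext{peerIsLeader: true} })
	fmt.Println("StaleRead after leader fallback:", req.StaleRead) // false
}
```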
The branch was force-pushed from b6dc236 to df222dc.
Signed-off-by: artem_danilov <artem_danilov@airbnb.com>
Great improvement by removing the backoffer!
@cfzjywxk PTAL
LGTM. Great improvement~
…ikv#765)

* avoid dataIsNotReady error while retrying stale read on the leader
* move StaleRead flag reset to retry section
* move all logic to #next and allow retry on the leader

Signed-off-by: artem_danilov <artem_danilov@airbnb.com>
Co-authored-by: artem_danilov <artem_danilov@airbnb.com>
…) (#789)

* avoid dataIsNotReady error while retrying stale read on the leader (#765)
* add context patcher for 65
* fmt

Signed-off-by: artem_danilov <artem_danilov@airbnb.com>
Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: Artem Danilov <329970+Tema@users.noreply.github.com>
Co-authored-by: artem_danilov <artem_danilov@airbnb.com>
When a Stale Read fails with DataIsNotReady, it retries on the leader as a Stale Read again and can still fail with DataIsNotReady if safe_ts is not advanced on the leader either. We can easily avoid this by sending the retry to the leader without the stale flag: that way it always succeeds if there is no active conflict. This should considerably help in cases when safe_ts is not advancing due to long commit transactions.
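As a rough illustration of the behavioral change described above (serve is a hypothetical stand-in for a TiKV replica, not a real API): the old path keeps retrying the stale read and keeps hitting DataIsNotReady while safe_ts is stuck, whereas the new path drops the stale flag on the retry and succeeds immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// errDataIsNotReady stands in for the DataIsNotReady region error.
var errDataIsNotReady = errors.New("DataIsNotReady")

// serve is a hypothetical stand-in for a replica: a stale read fails while
// safe_ts is stuck; a non-stale leader read succeeds (absent real conflicts).
func serve(staleRead, safeTsStuck bool) error {
	if staleRead && safeTsStuck {
		return errDataIsNotReady
	}
	return nil
}

func main() {
	const safeTsStuck = true // e.g. blocked by a transaction with a long commit

	// Old behavior: the retry on the leader is still a stale read and keeps failing.
	for attempt := 1; attempt <= 3; attempt++ {
		fmt.Printf("old, attempt %d: %v\n", attempt, serve(true, safeTsStuck))
	}

	// New behavior: after the first DataIsNotReady the retry goes to the leader
	// without the stale flag and succeeds immediately.
	fmt.Println("new, attempt 1:", serve(true, safeTsStuck))
	fmt.Println("new, attempt 2:", serve(false, safeTsStuck))
}
```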