reload region cache when store is resolved from invalid status #843

Merged on Jul 14, 2023 (14 commits)

Conversation

you06
Contributor

@you06 you06 commented Jun 16, 2023

close #841

Before this PR:
[two images, not preserved]

With this PR:
[two images, not preserved]

Signed-off-by: you06 <you1474600@gmail.com>
@you06 you06 changed the title reload region cache when store is resolved from invalid reload region cache when store is resolved from invalid status Jun 16, 2023
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
@@ -573,18 +573,22 @@ func (state *accessFollower) next(bo *retry.Backoffer, selector *replicaSelector
if state.option.preferLeader {
	state.lastIdx = state.leaderIdx
}
offset := rand.Intn(replicaSize)
Contributor

In PreferLeader mode, offset should be set to 0 so that the selection starts from leaderIdx.

Contributor Author

Yeah, I changed the code so that offset is not used in the first selection.
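
A minimal sketch of the behaviour discussed in this thread, assuming a simplified replica list: in prefer-leader mode the scan starts at the leader index with no random offset, otherwise it starts at a random position. The `replica` type, `pickReplica`, `pickReplicaRandom`, and the injected `intn` function are hypothetical names for illustration; this is not the client-go implementation.

```go
package main

import "math/rand"

// replica is a hypothetical, simplified stand-in for a selector's per-peer state.
type replica struct {
	healthy bool
}

// pickReplica returns the index of the first healthy replica found by a
// circular scan, or -1 if none is healthy. With preferLeader set, the scan
// starts exactly at leaderIdx (offset 0), so the leader is always tried first.
// Otherwise the starting point is intn(len(replicas)).
func pickReplica(replicas []replica, leaderIdx int, preferLeader bool, intn func(int) int) int {
	n := len(replicas)
	if n == 0 {
		return -1
	}
	start := leaderIdx
	if !preferLeader {
		start = intn(n)
	}
	for i := 0; i < n; i++ {
		idx := (start + i) % n
		if replicas[idx].healthy {
			return idx
		}
	}
	return -1
}

// pickReplicaRandom is the same selection backed by math/rand.
func pickReplicaRandom(replicas []replica, leaderIdx int, preferLeader bool) int {
	return pickReplica(replicas, leaderIdx, preferLeader, rand.Intn)
}
```

Passing the random source as a function keeps the selection deterministic under test; callers that do not need that can use `pickReplicaRandom`.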

Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
@cfzjywxk
Contributor

/cc @crazycs520

// as candidates to serve the Read request. Meanwhile, we should make the choice of next() meet Uniform Distribution.
for cnt := 0; cnt < replicaSize && !state.isCandidate(idx, selector.replicas[idx]); cnt++ {
idx = AccessIndex((int(idx) + rand.Intn(replicaSize)) % replicaSize)
var idx AccessIndex
Contributor

I still recommend reverting to the original version; the current implementation is a bit more complex and harder to read.
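
For reference, the uniform-distribution requirement mentioned in the code comment can also be met in a different, simpler way than the loop under review: collect the candidate indices first, then draw one of them. The sketch below shows that alternative technique only; `pickUniform`, `pickUniformRandom`, and the `isCandidate` and `intn` callbacks are hypothetical names and are not taken from this PR.

```go
package main

import "math/rand"

// pickUniform returns an index in [0, n) chosen uniformly among the indices
// for which isCandidate reports true, or -1 if there is no candidate.
// intn must behave like rand.Intn: it returns a value in [0, k).
func pickUniform(n int, isCandidate func(int) bool, intn func(int) int) int {
	candidates := make([]int, 0, n)
	for i := 0; i < n; i++ {
		if isCandidate(i) {
			candidates = append(candidates, i)
		}
	}
	if len(candidates) == 0 {
		return -1
	}
	// Every candidate has probability 1/len(candidates) of being chosen.
	return candidates[intn(len(candidates))]
}

// pickUniformRandom is the same selection backed by math/rand.
func pickUniformRandom(n int, isCandidate func(int) bool) int {
	return pickUniform(n, isCandidate, rand.Intn)
}
```

The trade-off is one small allocation per call in exchange for a selection whose distribution is easy to reason about.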

Signed-off-by: you06 <you1474600@gmail.com>
	// async reload triggered by other thread.
	return
}
go func() {
Contributor

It can be risky to spawn a reload task for each region, especially if many regions become invalid due to the related store being marked as such. To mitigate this risk, we should implement a refreshing strategy in the region cache-related workers and consider incorporating the store's "health checker."

@crazycs520 is currently considering a refactor of the region cache component and could give some advice about it.

Contributor

For the v6.5.x release version, a quick fix may still be needed.
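
One way to picture the concern raised in this thread is a bounded queue that a single worker drains, combined with a per-region flag so that the same region is never scheduled twice. The sketch below is an illustration under those assumptions only; `region`, `reloadQueue`, `schedule`, and `drain` are hypothetical names and do not describe what this PR or client-go actually does. It uses `atomic.CompareAndSwapInt32` rather than `atomic.Bool` so that it builds with Go 1.18.

```go
package main

import "sync/atomic"

// region is a hypothetical, simplified cache entry.
type region struct {
	id          uint64
	needsReload int32 // 1 while a reload is pending; accessed atomically
}

// reloadQueue funnels reload requests to one worker instead of spawning
// one goroutine per region.
type reloadQueue struct {
	pending chan *region
}

func newReloadQueue(capacity int) *reloadQueue {
	return &reloadQueue{pending: make(chan *region, capacity)}
}

// schedule enqueues r at most once. It returns false if a reload is already
// pending for r or if the queue is full.
func (q *reloadQueue) schedule(r *region) bool {
	if !atomic.CompareAndSwapInt32(&r.needsReload, 0, 1) {
		return false // reload already triggered elsewhere
	}
	select {
	case q.pending <- r:
		return true
	default:
		// Queue full: clear the flag so a later caller can retry.
		atomic.StoreInt32(&r.needsReload, 0)
		return false
	}
}

// drain reloads every queued region with the supplied function and returns
// how many were processed. A long-running client would call it periodically
// from a single background goroutine.
func (q *reloadQueue) drain(reload func(*region)) int {
	n := 0
	for {
		select {
		case r := <-q.pending:
			reload(r)
			atomic.StoreInt32(&r.needsReload, 0)
			n++
		default:
			return n
		}
	}
}
```

The queue capacity bounds the amount of pending work when many regions become invalid at once, which is the scenario the comment above warns about.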

@@ -573,18 +573,36 @@ func (state *accessFollower) next(bo *retry.Backoffer, selector *replicaSelector
if state.option.preferLeader {
	state.lastIdx = state.leaderIdx
}
var offset int
Contributor

Peer selection can become quite complex when the accessFollower is used for all follower read and stale read processing, with various configurations. To prevent any potential issues, it may be beneficial to separate them into different categories.

ReplicaSelector itself is quite complex too.

Contributor

@cfzjywxk cfzjywxk Jun 26, 2023


After discussing with @you06, we could do some refactoring on the master branch like

match access_type {
     leader_only =>
     follower_read_random =>
     follower_read_replica_only =>
     follower_read_closest =>
     stale_read_random =>
     stale_read_closest =>
     stale_read_retry =>
}

while keeping the change on release-6.5 as simple as possible to reduce the risk.
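
In Go, the dispatch outlined above could look roughly like the skeleton below. It is only a sketch: the `accessType` constants mirror the arms listed in the comment, but `selectPeer` and `errNotImplemented` are hypothetical names, and every arm except the leader-only one is left as a stub because the thread does not specify how each mode should behave.

```go
package main

import (
	"errors"
	"fmt"
)

// accessType enumerates the categories named in the review comment.
type accessType int

const (
	leaderOnly accessType = iota
	followerReadRandom
	followerReadReplicaOnly
	followerReadClosest
	staleReadRandom
	staleReadClosest
	staleReadRetry
)

// errNotImplemented marks arms whose behaviour this sketch does not define.
var errNotImplemented = errors.New("selection strategy not implemented in this sketch")

// selectPeer dispatches on the access type so that each category can get its
// own selection routine instead of sharing one accessFollower-style state.
func selectPeer(t accessType, leaderIdx int) (int, error) {
	switch t {
	case leaderOnly:
		return leaderIdx, nil
	case followerReadRandom, followerReadReplicaOnly, followerReadClosest:
		return -1, errNotImplemented // follower-read strategies would go here
	case staleReadRandom, staleReadClosest, staleReadRetry:
		return -1, errNotImplemented // stale-read strategies would go here
	default:
		return -1, fmt.Errorf("unknown access type %d", int(t))
	}
}
```

Keeping one arm per category makes it explicit which configurations exist, which is the readability benefit the refactoring proposal is after.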

Contributor Author

Maybe the refactor can be in another PR?

Contributor

OK, let's merge this first.

cfzjywxk pushed a commit that referenced this pull request Jun 28, 2023
#846)

* reload region cache when store is resolved from invalid

Signed-off-by: you06 <you1474600@gmail.com>

* reload region once

Signed-off-by: you06 <you1474600@gmail.com>

* build in go1.18

Signed-off-by: you06 <you1474600@gmail.com>

* build in go1.18

Signed-off-by: you06 <you1474600@gmail.com>

* handle region reload in resolve goroutine

Signed-off-by: you06 <you1474600@gmail.com>

* retest

Signed-off-by: you06 <you1474600@gmail.com>

* fix data race (#736)

Signed-off-by: Smityz <smityz@qq.com>
Co-authored-by: disksing <i@disksing.com>
Signed-off-by: you06 <you1474600@gmail.com>

* build with go 1.18

Signed-off-by: you06 <you1474600@gmail.com>

* fix integration test (#673)

Signed-off-by: disksing <i@disksing.com>
Signed-off-by: you06 <you1474600@gmail.com>

* Update internal/locate/region_cache.go

Co-authored-by: crazycs <crazycs520@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>

* address comment

Signed-off-by: you06 <you1474600@gmail.com>

* address comment

Signed-off-by: you06 <you1474600@gmail.com>

---------

Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: Smityz <smityz@qq.com>
Signed-off-by: disksing <i@disksing.com>
Co-authored-by: Smilencer <smityz@qq.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: crazycs <crazycs520@gmail.com>
internal/locate/region_cache.go (review thread, resolved)
internal/locate/region_cache.go (review thread, outdated, resolved)
@cfzjywxk cfzjywxk requested a review from ekexium June 29, 2023 07:18
Signed-off-by: you06 <you1474600@gmail.com>
@you06 you06 mentioned this pull request Jul 5, 2023
@disksing disksing merged commit 85fc8f3 into tikv:master Jul 14, 2023
iosmanthus added a commit that referenced this pull request Aug 11, 2023
* client-go: add some key range info to error when PD returned no region (#862)

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* *: refine non-global stale-read request retry logic (#863)

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* Fix the issue that primary pessimistic lock may be left not cleared after GC (#866)

* Fix the issue that primary pessimistic lock may be left not cleared after GC

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Fix mysteriously shown up thing that makes compilation failed

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Fix test effectiveness (forgot to set txn2 to pessimistic txn); add more strict checks

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Address comments

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

---------

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* add explicit request source type to label the external request like lightning/br (#868)

Signed-off-by: nolouch <nolouch@gmail.com>

* use '%d' instead of '%q' for some int values in error message (#875)

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* format key in error message in method `scanRegions` (#876)

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* make cop request timeout a config paramter (#865)

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

* update

Signed-off-by: Spade A <u6748471@anu.edu.au>

---------

Signed-off-by: Spade A <u6748471@anu.edu.au>

* region_cache: support check pending tiflash peer (#821)

Signed-off-by: guo-shaoge <shaoge1994@163.com>
Co-authored-by: disksing <i@disksing.com>

* *: add `SnapshotIterReverse` and make `iterReverse` supports `lowerBound` (#883)

Signed-off-by: Jason Mo <mohangjie1995@gmail.com>

* *: fix stale read ops metric (#878) (#889)

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* add gc options (#828)

Signed-off-by: weedge <weege007@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* reload region cache when store is resolved from invalid status (#843)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* ci: update setup-go action (#904)

Signed-off-by: disksing <i@disksing.com>

* fix unexpected slow query during GC running after stop 1 tikv-server (#899) (#909)

* fix unexpected slow query during GC running after stop 1 tikv-server

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* resource_manager: ignore ru metrics for background request (#872)

Signed-off-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: disksing <i@disksing.com>

* add more log for diagnose (#915)

* add more log for diagnose

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* add more log for diagnose

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* add more log

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* address comment

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* use context logger as much as possible (#908)

* use context logger as much as possible

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* refine

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* Resume max retry time check for stale read retry with leader option(#903) (#911)

* Resume max retry time check for stale read retry with leader option

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* add cancel

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

---------

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* request_source: remove default label (#890)

* request_source: remove default label

Signed-off-by: nolouch <nolouch@gmail.com>

* add a function to set request source task type (#925)

* add a function to set request source task type

Signed-off-by: glorv <glorvs@163.com>

* ci: update go version (#936)

* ci: update go version

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* use tidb_kv_read_timeout as first kv request timeout (#919)

* support tidb_kv_read_timeout as first round kv request timeout

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* fix ci

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* update comment

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* refine test

Signed-off-by: crazycs520 <crazycs520@gmail.com>

---------

Signed-off-by: crazycs520 <crazycs520@gmail.com>

* [pick] resource_control: bypass some internal urgent request (#938)

* resource_control: bypass some internal urgent request (#884)

Signed-off-by: nolouch <nolouch@gmail.com>

* resourcecontrol: fix nil pointer (#900)

Signed-off-by: nolouch <nolouch@gmail.com>

---------

Signed-off-by: nolouch <nolouch@gmail.com>

---------

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: Spade A <u6748471@anu.edu.au>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: Jason Mo <mohangjie1995@gmail.com>
Signed-off-by: weedge <weege007@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: disksing <i@disksing.com>
Signed-off-by: husharp <jinhao.hu@pingcap.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Signed-off-by: glorv <glorvs@163.com>
Signed-off-by: iosmanthus <myosmanthustree@gmail.com>
Co-authored-by: 王超 <cclcwangchao@hotmail.com>
Co-authored-by: crazycs <crazycs520@gmail.com>
Co-authored-by: MyonKeminta <9948422+MyonKeminta@users.noreply.github.com>
Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: Spade  A <71589810+SpadeA-Tang@users.noreply.github.com>
Co-authored-by: guo-shaoge <shaoge1994@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: Hangjie Mo <mohangjie1995@gmail.com>
Co-authored-by: weedge <weege007@gmail.com>
Co-authored-by: you06 <you1474600@gmail.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: glorv <glorvs@163.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 11, 2023
)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 15, 2023
)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>
you06 added a commit to you06/client-go that referenced this pull request Aug 15, 2023
)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>
cfzjywxk added a commit that referenced this pull request Sep 26, 2023
* reload region cache when store is resolved from invalid status (#843)

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: disksing <i@disksing.com>

* fallback to follower when leader is busy (#916) (#923)

* fallback to follower when leader is busy

Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>

* Resume max retry time check for stale read retry with leader option(#903) (#911)

* Resume max retry time check for stale read retry with leader option

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* add cancel

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

---------

Signed-off-by: cfzjywxk <lsswxrxr@163.com>

* add region cache state test & fix some issues of replica selector (#910)

Signed-off-by: you06 <you1474600@gmail.com>

remove duplicate code

Signed-off-by: you06 <you1474600@gmail.com>

* enable workflow for tidb-7.1

Signed-off-by: you06 <you1474600@gmail.com>

* update

Signed-off-by: you06 <you1474600@gmail.com>

update

Signed-off-by: you06 <you1474600@gmail.com>

fix test

Signed-off-by: you06 <you1474600@gmail.com>

fix test

Signed-off-by: you06 <you1474600@gmail.com>

* lint

Signed-off-by: you06 <you1474600@gmail.com>

* lint

Signed-off-by: you06 <you1474600@gmail.com>

* fix flaky test

Signed-off-by: you06 <you1474600@gmail.com>

---------

Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Development

Successfully merging this pull request may close these issues.

need to reload region cache when invalid store become health
5 participants