fix unexpected slow query during GC running after stop 1 tikv-server #899
Merged
Signed-off-by: crazycs520 <crazycs520@gmail.com>
This problem reminds me of the issue from the TiDB forum: https://ask.pingcap.com/t/every-10-minutes-my-in-flight-stale-reads-fail/518
cfzjywxk reviewed on Jul 24, 2023
Signed-off-by: crazycs520 <crazycs520@gmail.com>
cfzjywxk approved these changes on Jul 24, 2023
you06 approved these changes on Jul 24, 2023
MyonKeminta approved these changes on Jul 24, 2023
zyguan reviewed on Jul 24, 2023
Signed-off-by: crazycs520 <crazycs520@gmail.com>
zyguan approved these changes on Jul 24, 2023
/hold since the test failed.
crazycs520 added a commit to crazycs520/client-go that referenced this pull request on Aug 7, 2023:
…ikv#899) Signed-off-by: crazycs520 <crazycs520@gmail.com>
iosmanthus added a commit that referenced this pull request on Aug 11, 2023:
* client-go: add some key range info to error when PD returned no region (#862) Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
* *: refine non-global stale-read request retry logic (#863) Signed-off-by: crazycs520 <crazycs520@gmail.com>
* Fix the issue that primary pessimistic lock may be left not cleared after GC (#866)
  * Fix the issue that primary pessimistic lock may be left not cleared after GC Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
  * Fix mysteriously shown up thing that makes compilation failed Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
  * Fix test effectiveness (forgot to set txn2 to pessimistic txn); add more strict checks Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
  * Address comments Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
  * --------- Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com> Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
* add explicit request source type to label the external request like lightning/br (#868) Signed-off-by: nolouch <nolouch@gmail.com>
* use '%d' instead of '%q' for some int values in error message (#875) Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
* format key in error message in method `scanRegions` (#876) Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
* make cop request timeout a config paramter (#865)
  * update Signed-off-by: Spade A <u6748471@anu.edu.au>
  * update Signed-off-by: Spade A <u6748471@anu.edu.au>
  * update Signed-off-by: Spade A <u6748471@anu.edu.au>
  * update Signed-off-by: Spade A <u6748471@anu.edu.au>
  * --------- Signed-off-by: Spade A <u6748471@anu.edu.au>
* region_cache: support check pending tiflash peer (#821) Signed-off-by: guo-shaoge <shaoge1994@163.com> Co-authored-by: disksing <i@disksing.com>
* *: add `SnapshotIterReverse` and make `iterReverse` supports `lowerBound` (#883) Signed-off-by: Jason Mo <mohangjie1995@gmail.com>
* *: fix stale read ops metric (#878) (#889) Signed-off-by: crazycs520 <crazycs520@gmail.com> Co-authored-by: disksing <i@disksing.com>
* add gc options (#828) Signed-off-by: weedge <weege007@gmail.com> Co-authored-by: disksing <i@disksing.com>
* reload region cache when store is resolved from invalid status (#843) Signed-off-by: you06 <you1474600@gmail.com> Co-authored-by: disksing <i@disksing.com>
* ci: update setup-go action (#904) Signed-off-by: disksing <i@disksing.com>
* fix unexpected slow query during GC running after stop 1 tikv-server (#899) (#909)
  * fix unexpected slow query during GC running after stop 1 tikv-server Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix test Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * --------- Signed-off-by: crazycs520 <crazycs520@gmail.com>
* resource_manager: ignore ru metrics for background request (#872) Signed-off-by: husharp <jinhao.hu@pingcap.com> Co-authored-by: disksing <i@disksing.com>
* add more log for diagnose (#915)
  * add more log for diagnose Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * add more log for diagnose Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * add more log Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * address comment Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * --------- Signed-off-by: crazycs520 <crazycs520@gmail.com>
* use context logger as much as possible (#908)
  * use context logger as much as possible Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * refine Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * --------- Signed-off-by: crazycs520 <crazycs520@gmail.com>
* Resume max retry time check for stale read retry with leader option(#903) (#911)
  * Resume max retry time check for stale read retry with leader option Signed-off-by: cfzjywxk <lsswxrxr@163.com>
  * add cancel Signed-off-by: cfzjywxk <lsswxrxr@163.com>
  * --------- Signed-off-by: cfzjywxk <lsswxrxr@163.com>
* request_source: remove default label (#890)
  * request_source: remove default label Signed-off-by: nolouch <nolouch@gmail.com>
* add a function to set request source task type (#925)
  * add a function to set request source task type Signed-off-by: glorv <glorvs@163.com>
* ci: update go version (#936)
  * ci: update go version Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix test Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * --------- Signed-off-by: crazycs520 <crazycs520@gmail.com>
* use tidb_kv_read_timeout as first kv request timeout (#919)
  * support tidb_kv_read_timeout as first round kv request timeout Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix ci Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix ci Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix ci Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix ci Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * fix ci Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * update comment Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * refine test Signed-off-by: crazycs520 <crazycs520@gmail.com>
  * --------- Signed-off-by: crazycs520 <crazycs520@gmail.com>
* [pick] resource_control: bypass some internal urgent request (#938)
  * resource_control: bypass some internal urgent request (#884) Signed-off-by: nolouch <nolouch@gmail.com>
  * resourcecontrol: fix nil pointer (#900) Signed-off-by: nolouch <nolouch@gmail.com>
  * --------- Signed-off-by: nolouch <nolouch@gmail.com>

---------

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: Spade A <u6748471@anu.edu.au>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: Jason Mo <mohangjie1995@gmail.com>
Signed-off-by: weedge <weege007@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: disksing <i@disksing.com>
Signed-off-by: husharp <jinhao.hu@pingcap.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Signed-off-by: glorv <glorvs@163.com>
Signed-off-by: iosmanthus <myosmanthustree@gmail.com>
Co-authored-by: 王超 <cclcwangchao@hotmail.com>
Co-authored-by: crazycs <crazycs520@gmail.com>
Co-authored-by: MyonKeminta <9948422+MyonKeminta@users.noreply.github.com>
Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: Spade A <71589810+SpadeA-Tang@users.noreply.github.com>
Co-authored-by: guo-shaoge <shaoge1994@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: Hangjie Mo <mohangjie1995@gmail.com>
Co-authored-by: weedge <weege007@gmail.com>
Co-authored-by: you06 <you1474600@gmail.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: glorv <glorvs@163.com>
close #898
Why does issue #898 happen?
After one tikv-server is stopped, some region replicas are marked with `replica.isEpochStale` set to `true`, so `accessFollower` no longer chooses those replicas. But when the TiDB GC leader starts running GC, it reloads all regions, which updates the epoch of every region replica; as a result, every replica's `isEpochStale` changes back to `false`. `accessFollower` may then choose a replica on the down tikv-server. TiDB sends the kv request to the down tikv-server, receives a `context deadline exceeded` error, and then re-sends the kv request to the region leader. This is what causes the slow queries.

How does this fix work?
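In short, `accessFollower` needs to check the replica's `storeLivenessState` when choosing the target replica, instead of relying on `isEpochStale` alone.

(The images that followed the "Before this PR:" and "This PR:" labels were not preserved.)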
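The selection change described above can be illustrated with a small, simplified model. Everything in this sketch is hypothetical: the `replica` and `livenessState` types and the `pickFollowerBefore`, `pickFollowerAfter`, and `reloadRegion` functions are illustrative assumptions, not client-go's actual code. Only the idea comes from this PR: before the fix, a follower was skipped only when its epoch was stale; after the fix, its store must also be reachable. The real `accessFollower` weighs more factors and does not necessarily scan replicas in order.

```go
package main

import "fmt"

// livenessState is a hypothetical stand-in for a store's liveness state
// (not the real client-go type).
type livenessState int

const (
	reachable livenessState = iota
	unreachable
)

// replica is a hypothetical, simplified model of a region replica.
type replica struct {
	storeAddr     string
	epochStale    bool          // models replica.isEpochStale
	storeLiveness livenessState // models the liveness of the replica's store
}

// pickFollowerBefore models selection before the fix: only the epoch is checked.
func pickFollowerBefore(replicas []replica) (int, bool) {
	for i, r := range replicas {
		if !r.epochStale {
			return i, true
		}
	}
	return -1, false
}

// pickFollowerAfter models the fix: the replica's store must also be reachable.
func pickFollowerAfter(replicas []replica) (int, bool) {
	for i, r := range replicas {
		if !r.epochStale && r.storeLiveness == reachable {
			return i, true
		}
	}
	return -1, false
}

// reloadRegion models a region reload (such as the one triggered by GC):
// every replica's epoch is refreshed, so the stale mark is cleared.
func reloadRegion(replicas []replica) {
	for i := range replicas {
		replicas[i].epochStale = false
	}
}

func main() {
	replicas := []replica{
		{storeAddr: "tikv-1", epochStale: true, storeLiveness: unreachable}, // the stopped tikv-server
		{storeAddr: "tikv-2", epochStale: false, storeLiveness: reachable},
	}
	show := func(label string, pick func([]replica) (int, bool)) {
		if i, ok := pick(replicas); ok {
			fmt.Printf("%s: %s\n", label, replicas[i].storeAddr)
		} else {
			fmt.Printf("%s: no follower\n", label)
		}
	}
	show("before reload, old check", pickFollowerBefore)
	reloadRegion(replicas) // GC reloads the region; the stale mark is lost
	show("after reload, old check", pickFollowerBefore)
	show("after reload, new check", pickFollowerAfter)
}
```

Running it prints `tikv-2`, then `tikv-1`, then `tikv-2`: once the reload clears the stale mark, the epoch-only check picks the replica on the down store, while the liveness-aware check keeps skipping it.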