store/tikv: keepalive with pd #14118
Signed-off-by: nolouch <nolouch@gmail.com>
/rebuild |
@nolouch plugin CI will be fixed after PD repo merge |
@DanielZhangQD I tested with kill -9 but could not reproduce the same issue in k8s. Anyway, the keepalive is needed. |
LGTM
Have we already used keepalive for TiKV? |
LGTM |
LGTM
/run-all-tests |
LGTM |
/merge |
/run-all-tests |
cherry pick to release-2.1 failed |
cherry pick to release-3.0 failed |
It seems that, not for sure, we failed to cherry-pick this commit to release-2.1. Please comment '/run-cherry-picker' to try to trigger the cherry-picker if we did fail to cherry-pick this commit before. @nolouch PTAL. |
What problem does this PR solve?
After all 3 PD instances are killed in AWS (a k8s environment), it takes a long time (15 minutes) for TiDB server instances to reconnect to the new PD instances. We found stale TCP connections left over after all pod IPs had changed.
This is the same problem as #7099. The k8s CNI may be dropping all packets sent to the removed node (unconfirmed), which causes a stalled connection that persists until the kernel's TCP retransmission times out and closes the connection.
What is changed and how it works?
Update the PD client and enable keepalive in gRPC.
Check List
Tests
No stale connections appear.