TSO primary election took 14 minutes after PD(API) Pods being deleted at the same time and PD(API) leader being re-elected #6554
Comments
Maybe the etcd client still uses the old connection.
@lhy1024 Any progress on this issue?
I am trying to reproduce it.
This issue includes at least two parts, one of which is a watch issue on the secondary. Regarding whether the delete event was missed: "current leadership is deleted" appears after "required revision has been compacted", indicating that the delete event came later than the previous watch revision 1800213. So the earlier delete event was not missed. However, we cannot rule out a problem in the watcher's handling of the compact revision. The PD log shows that the compaction up to 2353567 was executed at 2:26, PD was restarted at 3:07, yet the TSO secondary only received the compaction from 1800213 to 2353567 at 3:21, about half an hour later than it should have.
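For reference, this is roughly how the "required revision has been compacted" condition surfaces on the client side. A minimal sketch assuming the etcd clientv3 API; the function and names are illustrative, not PD code:

```go
package example

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// checkCompacted illustrates what a watcher sees when it resumes from a
// revision (e.g. 1800213) that has already been compacted away (e.g. up to
// 2353567): the response is canceled, Err() reports the compaction, and
// CompactRevision tells the watcher the earliest revision it may restart from.
func checkCompacted(ctx context.Context, cli *clientv3.Client, key string, oldRev int64) {
	wch := cli.Watch(ctx, key, clientv3.WithRev(oldRev))
	for wresp := range wch {
		if wresp.CompactRevision != 0 {
			log.Printf("watch at rev %d failed: %v; must restart from rev %d",
				oldRev, wresp.Err(), wresp.CompactRevision)
			return
		}
	}
}
```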
The other part is about connections. I found a similar issue where the request failed for 15 minutes after the pod update and resumed thereafter, the comments show it is related to TCP_USER_TIMEOUT. Checking tcp_retries2 may not disconnect effectively when the keepalive mechanism is not triggered, the documentation says that the timeout may be between 13 and 30 minutes, and the documentation also says that TCP_USER_TIMEOUT can be be used to configure. According to https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md, after grpc/grpc-go#2307, TCP_USER_TIMEOUT can be configured via KeepaliveParams to configure TCP_USER_TIMEOUT. But unfortunately, my local attempts with iptable and tcpkill did not reproduce this timeout. |
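As a rough illustration of that mitigation: the etcd client's dial keepalive settings map to gRPC KeepaliveParams, and per gRFC A18 grpc-go applies the keepalive timeout as TCP_USER_TIMEOUT on the socket. A minimal sketch, with illustrative values rather than PD's actual configuration:

```go
package example

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newClient enables client-side gRPC keepalive on the etcd client. With
// keepalive on, grpc-go also sets TCP_USER_TIMEOUT to the keepalive timeout,
// so a half-open connection is torn down in seconds instead of the 13–30
// minutes the kernel's default tcp_retries2 retransmission schedule can take.
func newClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:            endpoints,
		DialKeepAliveTime:    10 * time.Second, // ping after 10s of inactivity (illustrative)
		DialKeepAliveTimeout: 3 * time.Second,  // close if no ack in 3s; also used as TCP_USER_TIMEOUT
	})
}
```

The trade-off is the usual one for aggressive keepalive: a very short timeout detects dead peers faster but risks dropping healthy connections under transient network pressure.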
We can consider temporarily using multi-endpoint + keepalive again to avoid this problem, but note that this will cause the pd-leader-io-hang case to always fail. In the meantime, we can also introduce withRequireLeader, handling of closeErr, and resuming with revision = wresp.Header.Revision + 1 (see the sketch below), before investigating the watch problem in the secondary.
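A minimal sketch of that resume pattern, assuming the etcd clientv3 API; the function and variable names are illustrative, not the actual PD implementation:

```go
package example

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchLoop keeps a watch alive across channel closes, compactions, and
// lost etcd leadership, always resuming from the last observed revision + 1.
func watchLoop(ctx context.Context, cli *clientv3.Client, key string, rev int64) {
	for ctx.Err() == nil {
		// WithRequireLeader cancels the watch with ErrNoLeader when the etcd
		// member serving it loses its leader, instead of hanging silently.
		wctx := clientv3.WithRequireLeader(ctx)
		wch := cli.Watch(wctx, key, clientv3.WithRev(rev))
		for wresp := range wch {
			if wresp.CompactRevision != 0 {
				// Our start revision was compacted away; restart from the
				// compact revision (after re-reading the current state).
				rev = wresp.CompactRevision
				break
			}
			if err := wresp.Err(); err != nil {
				break // includes ErrNoLeader; the outer loop re-creates the watch
			}
			for _, ev := range wresp.Events {
				_ = ev // handle PUT / DELETE of the watched key here
			}
			// Resume strictly after the last revision we have observed.
			rev = wresp.Header.Revision + 1
		}
		// The channel was closed (the closeErr case): fall through and re-watch from rev.
	}
}
```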
Maybe there is no problem with the handling of the compact revision after all; I checked the log again.
These logs show that at 13:53 the TSO leader key was last updated to tso-1, that updates to some other keys triggered multiple compactions, and that at 03:06 PD restarted while tso-1 was still in use, until the later re-election. To verify this guess, I implemented a simple unit test that updates other keys and then compacts them, to see whether the watcher receives any message.
This test proves that if only a compaction occurs, the watcher does not receive any notification until the key it is watching is itself updated (a rough sketch of the test is below).
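Roughly, the experiment looks like the following sketch; the endpoint, key names, and timeouts are illustrative and this is not the actual test code:

```go
package example

import (
	"context"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// TestCompactOnlyDoesNotWakeWatcher: watch one key, update *other* keys to
// advance the revision, compact, and check that the watcher stays silent
// until the watched key itself changes.
func TestCompactOnlyDoesNotWakeWatcher(t *testing.T) {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		t.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	resp, err := cli.Put(ctx, "/watched", "v0")
	if err != nil {
		t.Fatal(err)
	}
	wch := cli.Watch(ctx, "/watched", clientv3.WithRev(resp.Header.Revision+1))

	// Bump other keys to advance the store revision, then compact everything so far.
	for i := 0; i < 10; i++ {
		if _, err := cli.Put(ctx, "/other", "x"); err != nil {
			t.Fatal(err)
		}
	}
	cur, err := cli.Get(ctx, "/other")
	if err != nil {
		t.Fatal(err)
	}
	if _, err := cli.Compact(ctx, cur.Header.Revision); err != nil {
		t.Fatal(err)
	}

	// Expect silence: no event and no compaction error for the watched key.
	select {
	case wresp := <-wch:
		t.Fatalf("unexpected watch response: %+v (err: %v)", wresp, wresp.Err())
	case <-time.After(3 * time.Second):
		// The watcher only wakes up once /watched itself is updated.
	}
}
```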
I am also trying to reproduce it in k8s.
close #6554 Signed-off-by: lhy1024 <admin@liudos.us>
close tikv#6554 Signed-off-by: lhy1024 <admin@liudos.us>
Enhancement Task
What did you do?
In dev env,
What did you expect to see?
TSO primary election within 15 seconds
What did you see instead?
TSO primary election took 14 minutes
What version of PD are you using (pd-server -V)?
tidbcloud/pd-cse release-6.6-keyspace 9e1e2de
tikv/pd master