TSO primary election took 14 minutes after PD(API) Pods being deleted at the same time and PD(API) leader being re-elected #6554

Closed
binshi-bing opened this issue Jun 5, 2023 · 8 comments · Fixed by #6577

@binshi-bing
Contributor

binshi-bing commented Jun 5, 2023

Enhancement Task

What did you do?
In the dev environment:

  1. kubectl delete pod serverless-cluster-pd-0 serverless-cluster-pd-1 serverless-cluster-pd-2 -n tidb-serverless --force --grace-period=0
  2. tso_0 was a secondary:
[2023/06/02 03:06:31.612 +00:00] [ERROR] [register.go:82] ["keep alive failed"] [key=/ms/7187976276065784319/tso/registry/http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379]
**the following 4 lines repeat every 1s:**
[2023/06/02 03:06:46.624 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-bf54c660-20ab-44b5-ab7b-5bc8cbf44d8b/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/06/02 03:06:46.624 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/dc-location] [cost=10.000772311s] [error="context deadline exceeded"]
[2023/06/02 03:06:46.624 +00:00] [ERROR] [etcdutil.go:141] ["load from etcd meet error"] [key=/ms/7187976276065784319/tso/00000/dc-location] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:46.624 +00:00] [ERROR] [allocator_manager.go:848] ["get cluster dc-locations from etcd failed"] [] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
...
[2023/06/02 03:20:46.623 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/dc-location] [cost=10.000127145s] [error="context deadline exceeded"]
[2023/06/02 03:20:46.623 +00:00] [ERROR] [etcdutil.go:141] ["load from etcd meet error"] [key=/ms/7187976276065784319/tso/00000/dc-location] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:20:46.623 +00:00] [ERROR] [allocator_manager.go:848] ["get cluster dc-locations from etcd failed"] [] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:20:46.623 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-bf54c660-20ab-44b5-ab7b-5bc8cbf44d8b/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/06/02 03:21:46.624 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/dc-location] [cost=10.000692425s] [error="context deadline exceeded"]
[2023/06/02 03:21:46.624 +00:00] [ERROR] [etcdutil.go:141] ["load from etcd meet error"] [key=/ms/7187976276065784319/tso/00000/dc-location] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:21:46.624 +00:00] [ERROR] [allocator_manager.go:848] ["get cluster dc-locations from etcd failed"] [] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:21:46.624 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-bf54c660-20ab-44b5-ab7b-5bc8cbf44d8b/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
**[2023/06/02 03:21:53.550 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-bf54c660-20ab-44b5-ab7b-5bc8cbf44d8b/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = Unavailable desc = transport is closing"]**
[2023/06/02 03:21:53.564 +00:00] [WARN] [leadership.go:194] ["required revision has been compacted, use the compact revision"] [required-revision=1800213] [compact-revision=2353567]
[2023/06/02 03:21:53.804 +00:00] [INFO] [leadership.go:211] ["current leadership is deleted"] [leader-key=/ms/7187976276065784319/tso/00000/primary] [purpose="keyspace group primary election"]
[2023/06/02 03:21:53.804 +00:00] [INFO] [global_allocator.go:553] ["the tso primary has changed, try to re-campaign a primary"] []
[2023/06/02 03:21:53.804 +00:00] [INFO] [global_allocator.go:562] ["start to campaign the primary"] [] [campaign-tso-primary-name=http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379-00000]
[2023/06/02 03:21:53.806 +00:00] [INFO] [lease.go:66] ["lease granted"] [lease-id=7473592164536214571] [lease-timeout=3] [purpose="keyspace group primary election"]
[2023/06/02 03:21:53.809 +00:00] [INFO] [leadership.go:122] ["check campaign resp"] [resp="{\"header\":{\"cluster_id\":10920548605668718190,\"member_id\":13485048042260555703,\"revision\":2481213,\"raft_term\":23},\"succeeded\":true,\"responses\":[{\"Response\":{\"ResponsePut\":{\"header\":{\"revision\":2481213}}}}]}"]
[2023/06/02 03:21:53.809 +00:00] [INFO] [leadership.go:131] ["write leaderData to leaderPath ok"] [leaderPath=/ms/7187976276065784319/tso/00000/primary] [purpose="keyspace group primary election"]
[2023/06/02 03:21:53.809 +00:00] [INFO] [global_allocator.go:595] ["campaign tso primary ok"] [] [campaign-tso-primary-name=http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379-00000]
[2023/06/02 03:21:53.809 +00:00] [INFO] [global_allocator.go:606] ["initializing the global tso allocator"]
[2023/06/02 03:21:53.809 +00:00] [INFO] [lease.go:143] ["start lease keep alive worker"] [interval=1s] [purpose="keyspace group primary election"]
[2023/06/02 03:21:53.931 +00:00] [INFO] [tso.go:185] ["sync and save timestamp"] [last=2023/06/02 03:06:29.715 +00:00] [save=2023/06/02 03:21:56.918 +00:00] [next=2023/06/02 03:21:53.918 +00:00]
[2023/06/02 03:21:53.931 +00:00] [INFO] [global_allocator.go:625] ["tso primary is ready to serve"] [] [tso-primary-name=http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379-00000]
  3. tso_1 was the primary:
[2023/06/02 03:06:29.675 +00:00] [INFO] [lease.go:126] ["lease timeout"] [expire=2023/06/02 03:06:31.673 +00:00] [purpose="keyspace group primary election"]
[2023/06/02 03:06:29.675 +00:00] [WARN] [lease.go:158] ["lease keep alive failed"] [purpose="keyspace group primary election"] [error="context canceled"]
[2023/06/02 03:06:29.675 +00:00] [INFO] [lease.go:172] ["stop lease keep alive worker"] [purpose="keyspace group primary election"]
[2023/06/02 03:06:29.675 +00:00] [WARN] [lease.go:158] ["lease keep alive failed"] [purpose="keyspace group primary election"] [error="context canceled"]
[2023/06/02 03:06:29.675 +00:00] [WARN] [lease.go:158] ["lease keep alive failed"] [purpose="keyspace group primary election"] [error="context canceled"]
[2023/06/02 03:06:30.999 +00:00] [ERROR] [register.go:82] ["keep alive failed"] [key=/ms/7187976276065784319/tso/registry/http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379]
[2023/06/02 03:06:31.681 +00:00] [INFO] [global_allocator.go:636] ["no longer a primary because lease has expired, the tso primary will step down"] []
[2023/06/02 03:06:32.682 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-13874eab-e8c6-4950-b46f-f708fdd4ca9d/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/06/02 03:06:32.682 +00:00] [INFO] [tso.go:384] ["reset the timestamp in memory"]
[2023/06/02 03:06:39.716 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-13874eab-e8c6-4950-b46f-f708fdd4ca9d/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/06/02 03:06:39.716 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/pd/7187976276065784319/timestamp] [cost=10.00049442s] [error="context deadline exceeded"]
[2023/06/02 03:06:39.716 +00:00] [ERROR] [etcdutil.go:141] ["load from etcd meet error"] [key=/pd/7187976276065784319/timestamp] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:39.716 +00:00] [WARN] [tso.go:322] ["save timestamp failed"] [dc-location=global] [timestamp-path=timestamp] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:39.716 +00:00] [WARN] [allocator_manager.go:801] ["failed to update allocator's timestamp"] [] [dc-location=global] [name=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379-00000] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:39.716 +00:00] [INFO] [tso.go:384] ["reset the timestamp in memory"]
[2023/06/02 03:06:42.683 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-13874eab-e8c6-4950-b46f-f708fdd4ca9d/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/06/02 03:06:42.683 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/primary] [cost=10.001048119s] [error="context deadline exceeded"]
[2023/06/02 03:06:42.683 +00:00] [ERROR] [etcdutil.go:141] ["load from etcd meet error"] [key=/ms/7187976276065784319/tso/00000/primary] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:42.683 +00:00] [ERROR] [participant.go:213] ["getting the leader meets error"] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
...
the 4 log lines above repeated every 10 or 11 seconds.
[2023/06/02 03:06:52.885 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/primary] [cost=10.001082809s] [error="context deadline exceeded"]
[2023/06/02 03:06:52.885 +00:00] [ERROR] [etcdutil.go:141] ["load from etcd meet error"] [key=/ms/7187976276065784319/tso/00000/primary] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:52.885 +00:00] [ERROR] [participant.go:213] ["getting the leader meets error"] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]
[2023/06/02 03:06:52.885 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-13874eab-e8c6-4950-b46f-f708fdd4ca9d/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
...
[2023/06/02 03:22:00.220 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-13874eab-e8c6-4950-b46f-f708fdd4ca9d/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = Unavailable desc = transport is closing"]
[2023/06/02 03:22:00.220 +00:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-13874eab-e8c6-4950-b46f-f708fdd4ca9d/serverless-cluster-pd.tidb-serverless.svc:2379] [attempt=0] [error="rpc error: code = Unavailable desc = transport is closing"]
[2023/06/02 03:22:00.248 +00:00] [WARN] [etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/primary] [cost=9.450242803s] []
[2023/06/02 03:22:00.249 +00:00] [INFO] [global_allocator.go:547] ["start to watch the primary"] [] [campaign-tso-primary-name=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379-00000] [tso-primary="name:\"http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379-00000\" id:10272136419596665458 listen_urls:\"http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379\" "]

What did you expect to see?
The TSO primary election completes within 15 seconds.

What did you see instead?
TSO primary election took 14 minutes

What version of PD are you using (pd-server -V)?
tidbcloud/pd-cse release-6.6-keyspace 9e1e2de
tikv/pd master

@binshi-bing binshi-bing added the type/enhancement The issue or PR belongs to an enhancement. label Jun 5, 2023
@binshi-bing binshi-bing changed the title TSO primary election took 14 minutes after PD(API) Pods being deleted at the same time and PD(API) being re-elected TSO primary election took 14 minutes after PD(API) Pods being deleted at the same time and PD(API) leader being re-elected Jun 5, 2023
@lhy1024
Contributor

lhy1024 commented Jun 5, 2023

[etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/dc-location] [cost=10.000692425s]

Maybe the etcd client is still using the old connection.

@rleungx
Member

rleungx commented Jun 8, 2023

@lhy1024 Any progress on this issue?

@lhy1024
Contributor

lhy1024 commented Jun 8, 2023

@lhy1024 Any progress on this issue?

I am trying to reproduce it.

@lhy1024
Contributor

lhy1024 commented Jun 8, 2023

This issue includes at least two parts, one of which is a watch issue for the secondary.

Regarding whether the delete event was missed: "current leadership is deleted" appears after "required revision has been compacted", indicating that the delete event came after the previously watched revision 1800213. So the delete event was not missed.

However, we cannot rule out a problem in the watcher's handling of the compact revision. The PD log shows that the compaction to revision 2353567 was executed at 2:26, PD was restarted at 3:07, and yet the TSO secondary only learned about the compaction from 1800213 to 2353567 at 3:21. It should have received the compaction notification half an hour earlier.

@lhy1024
Contributor

lhy1024 commented Jun 8, 2023

The other part is about connections.

I found a similar issue where requests failed for 15 minutes after a pod update and then recovered; the comments there show it is related to TCP_USER_TIMEOUT.

With the default tcp_retries2, the connection may not be dropped effectively when the keepalive mechanism is not triggered; the documentation says the resulting timeout can be between 13 and 30 minutes, and that TCP_USER_TIMEOUT can be used to configure it.

According to https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md, since grpc/grpc-go#2307 TCP_USER_TIMEOUT can be configured via KeepaliveParams.

But unfortunately, my local attempts with iptables and tcpkill did not reproduce this timeout.
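
As a minimal sketch, assuming an etcd clientv3-based client like the one PD uses (the helper name and durations are illustrative, not PD's real configuration): DialKeepAliveTime/DialKeepAliveTimeout turn on gRPC keepalive pings, and per the A18 proposal the keepalive timeout is also applied as TCP_USER_TIMEOUT, so a dead peer is detected within seconds instead of waiting out the kernel's tcp_retries2 window.

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newEtcdClient is a hypothetical helper; the values are illustrative only.
func newEtcdClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 3 * time.Second,
		// Send a keepalive ping every 10s when the connection is idle.
		DialKeepAliveTime: 10 * time.Second,
		// Drop the connection if a ping gets no response within 3s; per
		// gRPC proposal A18 this timeout is also set as TCP_USER_TIMEOUT.
		DialKeepAliveTimeout: 3 * time.Second,
	})
}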

@lhy1024
Contributor

lhy1024 commented Jun 8, 2023

We can consider temporarily using multi-endpoint + keepalive again to avoid this problem, but note that this will cause pd-leader-io-hang to always fail.

In the meantime, before digging further into the watch problem in the secondary, we can also introduce WithRequireLeader, handle closeErr, and resume watches from revision = wresp.Header.Revision + 1; a rough sketch follows.
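
A rough sketch of that watch-loop hardening (this is not PD's actual leadership watcher; the key name, helper name, and backoff interval are illustrative): require an etcd leader on the watch stream, resume from the compact revision when the required revision has been compacted, and otherwise restart from the last observed revision + 1.

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchLeadership is a hypothetical watcher loop illustrating the points above.
func watchLeadership(ctx context.Context, client *clientv3.Client, key string, revision int64) {
	for {
		// WithRequireLeader makes the watch fail fast when the connected member
		// loses the etcd leader instead of silently hanging.
		watchCtx, cancel := context.WithCancel(clientv3.WithRequireLeader(ctx))
		wch := client.Watch(watchCtx, key, clientv3.WithRev(revision))
		for wresp := range wch {
			if wresp.CompactRevision != 0 {
				// The required revision has been compacted: resume from the
				// compact revision rather than retrying a revision that is gone.
				log.Printf("required revision compacted, resuming from %d", wresp.CompactRevision)
				revision = wresp.CompactRevision
				break
			}
			if err := wresp.Err(); err != nil {
				// Covers closeErr-style stream failures; re-create the watch.
				log.Printf("watch stream broken: %v", err)
				break
			}
			for _, ev := range wresp.Events {
				if ev.Type == clientv3.EventTypeDelete {
					log.Printf("leadership key %s deleted", key)
					cancel()
					return
				}
			}
			// Advance past events already seen so a re-created watch does not
			// replay them.
			revision = wresp.Header.Revision + 1
		}
		cancel()
		select {
		case <-ctx.Done():
			return
		case <-time.After(time.Second): // brief backoff before re-watching
		}
	}
}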

@lhy1024
Contributor

lhy1024 commented Jun 13, 2023

This issue includes at least two parts, one of which is a watch issue for the secondary.

Regarding whether the delete event was missed: "current leadership is deleted" appears after "required revision has been compacted", indicating that the delete event came after the previously watched revision 1800213. So the delete event was not missed.

However, we cannot rule out a problem in the watcher's handling of the compact revision. The PD log shows that the compaction to revision 2353567 was executed at 2:26, PD was restarted at 3:07, and yet the TSO secondary only learned about the compaction from 1800213 to 2353567 at 3:21. It should have received the compaction notification half an hour earlier.

Maybe there is no problem with the handling of the compact revision after all; I checked the logs again.

[2023/06/01 13:53:24.673 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379]
[2023/06/01 13:53:24.673 +00:00] [INFO] [leadership.go:122] ["check campaign resp"] [resp="{\"header\":{\"cluster_id\":10920548605668718190,\"member_id\":13485048042260555703,\"revision\":1800213,\"raft_term\":20},\"succeeded\":true,\"responses\":[{\"Response\":{\"ResponsePut\":{\"header\":{\"revision\":1800213}}}}]}"]

**{... no "update tso primary" entries in the log}**

[2023/06/01 15:26:50.423 +00:00] [INFO] [kvstore_compaction.go:56] ["finished scheduled compaction"] [compact-revision=1825171] [took=1.166314444s]

**{... some logs about compaction}**

[2023/06/02 02:26:50.588 +00:00] [INFO] [kvstore_compaction.go:56] ["finished scheduled compaction"] [compact-revision=2353567] [took=1.179783116s]

[2023/06/02 03:06:35.932 +00:00] [INFO] [versioninfo.go:89] ["Welcome to Placement Driver (API SERVICE)"](pd-0)
[2023/06/02 03:06:36.507 +00:00] [INFO] [versioninfo.go:89] ["Welcome to Placement Driver (API SERVICE)"](pd-2)
[2023/06/02 03:06:42.576 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379](pd-0)
[2023/06/02 03:06:49.009 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379](pd-2)


[2023/06/02 03:21:53.564 +00:00] [WARN] [leadership.go:194] ["required revision has been compacted, use the compact revision"] [required-revision=1800213] [compact-revision=2353567]

[2023/06/02 03:21:53.808 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379]

These logs show that the TSO primary was last updated to tso-1 at 13:53, that updates to other keys triggered multiple compactions afterwards, and that after PD restarted at 03:06 it still saw tso-1 as primary until the re-election happened much later.

To verify this guess, I wrote a simple unit test that updates other keys and then compacts, to see whether the watcher receives any message.


// The watcher is watching "TestWatcherBreak".
// Some other operations happen first.
suite.put("TestWatcherBreak", "3")

// Put a batch of other keys to advance the revision.
for i := 0; i < 1000; i++ {
	suite.put(fmt.Sprintf("TestWatcherBreak/%d", i), fmt.Sprintf("4_%d", i))
}

// The revision is now greater than 1000.
resp, err := EtcdKVGet(suite.client, "TestWatcherBreak")
suite.NoError(err)
suite.Greater(int(resp.Header.Revision), 1000)

// Compact; the watcher does not receive any message.
revision := resp.Header.Revision
resp2, err := suite.etcd.Server.Compact(suite.ctx, &etcdserverpb.CompactionRequest{Revision: revision})
suite.NoError(err)
suite.Equal(revision, resp2.Header.Revision)

// Put the watched key again and the watcher updates its revision.
time.Sleep(time.Second * 10)
suite.put("TestWatcherBreak", "5")

This test shows that after a compaction alone, the watcher does not receive anything until the key it is watching is updated again.

@lhy1024
Contributor

lhy1024 commented Jun 25, 2023

I also tried to reproduce it in k8s:

  1. Non-graceful shutdown followed by an immediate restart: QPS recovered within 2 minutes.
    [QPS screenshot]
  2. Wait 10 minutes before restarting, then check QPS after 2 minutes: QPS dropped by 27% only once, and the TiDB 8027 error occurred twice.
    [QPS screenshot]

ti-chi-bot bot pushed a commit that referenced this issue Jul 20, 2023
close #6554

Signed-off-by: lhy1024 <admin@liudos.us>
rleungx pushed a commit to rleungx/pd that referenced this issue Aug 2, 2023
close tikv#6554

Signed-off-by: lhy1024 <admin@liudos.us>