TSO primary election took 14 minutes after PD(API) Pods being deleted at the same time and PD(API) leader being re-elected #6554
Comments
Maybe the etcd client still uses the old connection.
@lhy1024 Any progress on this issue?
I am trying to reproduce it.
This issue includes at least two parts, one of which is a watch issue on the secondary. Regarding whether the delete event was missed: "current leadership is deleted" appears after "required revision has been compacted", indicating that the delete event came later than the previous watch revision 1800213. So the earlier delete event was not missed. However, we cannot rule out a problem in the watcher's handling of the compact revision. The PD log shows that the compaction up to 2353567 was executed at 2:26, PD was restarted at 3:07, yet the TSO secondary only received the compaction from 1800213 to 2353567 at 3:21, about half an hour later than it should have.
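For reference, this is roughly how the "required revision has been compacted" condition surfaces on the client side. A minimal sketch assuming the etcd clientv3 API; the function and names are illustrative, not PD code:

```go
package example

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// checkCompacted illustrates what a watcher sees when it resumes from a
// revision (e.g. 1800213) that has already been compacted away (e.g. up to
// 2353567): the response is canceled, Err() reports the compaction, and
// CompactRevision tells the watcher the earliest revision it may restart from.
func checkCompacted(ctx context.Context, cli *clientv3.Client, key string, oldRev int64) {
	wch := cli.Watch(ctx, key, clientv3.WithRev(oldRev))
	for wresp := range wch {
		if wresp.CompactRevision != 0 {
			log.Printf("watch at rev %d failed: %v; must restart from rev %d",
				oldRev, wresp.Err(), wresp.CompactRevision)
			return
		}
	}
}
```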
The other part is about connections. I found a similar issue where the request failed for 15 minutes after the pod update and resumed thereafter, the comments show it is related to TCP_USER_TIMEOUT. Checking tcp_retries2 may not disconnect effectively when the keepalive mechanism is not triggered, the documentation says that the timeout may be between 13 and 30 minutes, and the documentation also says that TCP_USER_TIMEOUT can be be used to configure. According to https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md, after grpc/grpc-go#2307, TCP_USER_TIMEOUT can be configured via KeepaliveParams to configure TCP_USER_TIMEOUT. But unfortunately, my local attempts with iptable and tcpkill did not reproduce this timeout. |
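As a rough illustration of that mitigation: the etcd client's dial keepalive settings map to gRPC KeepaliveParams, and per gRFC A18 grpc-go applies the keepalive timeout as TCP_USER_TIMEOUT on the socket. A minimal sketch, with illustrative values rather than PD's actual configuration:

```go
package example

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newClient enables client-side gRPC keepalive on the etcd client. With
// keepalive on, grpc-go also sets TCP_USER_TIMEOUT to the keepalive timeout,
// so a half-open connection is torn down in seconds instead of the 13–30
// minutes the kernel's default tcp_retries2 retransmission schedule can take.
func newClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:            endpoints,
		DialKeepAliveTime:    10 * time.Second, // ping after 10s of inactivity (illustrative)
		DialKeepAliveTimeout: 3 * time.Second,  // close if no ack in 3s; also used as TCP_USER_TIMEOUT
	})
}
```

The trade-off is the usual one for aggressive keepalive: a very short timeout detects dead peers faster but risks dropping healthy connections under transient network pressure.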
We can consider temporarily using multi-endpoint + keepalive again to avoid this problem, but note that this will cause the pd-leader-io-hang case to always fail. In the meantime, we can also introduce withRequireLeader, handling of closeErr, and resuming with revision = wresp.Header.Revision + 1 (see the sketch below), before investigating the watch problem in the secondary.
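A minimal sketch of that resume pattern, assuming the etcd clientv3 API; the function and variable names are illustrative, not the actual PD implementation:

```go
package example

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchLoop keeps a watch alive across channel closes, compactions, and
// lost etcd leadership, always resuming from the last observed revision + 1.
func watchLoop(ctx context.Context, cli *clientv3.Client, key string, rev int64) {
	for ctx.Err() == nil {
		// WithRequireLeader cancels the watch with ErrNoLeader when the etcd
		// member serving it loses its leader, instead of hanging silently.
		wctx := clientv3.WithRequireLeader(ctx)
		wch := cli.Watch(wctx, key, clientv3.WithRev(rev))
		for wresp := range wch {
			if wresp.CompactRevision != 0 {
				// Our start revision was compacted away; restart from the
				// compact revision (after re-reading the current state).
				rev = wresp.CompactRevision
				break
			}
			if err := wresp.Err(); err != nil {
				break // includes ErrNoLeader; the outer loop re-creates the watch
			}
			for _, ev := range wresp.Events {
				_ = ev // handle PUT / DELETE of the watched key here
			}
			// Resume strictly after the last revision we have observed.
			rev = wresp.Header.Revision + 1
		}
		// The channel was closed (the closeErr case): fall through and re-watch from rev.
	}
}
```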
Maybe there is no problem with the handling of the compact revision after all; I checked the log again.
These logs show that at 13:53 the TSO leader key was last updated to tso-1, that updates to some other keys triggered multiple compactions, and that at 03:06 PD restarted while tso-1 was still in use, until the later re-election. To verify this guess, I implemented a simple unit test that updates other keys and then compacts them, to see whether the watcher receives any message.
This test proves that if only a compaction occurs, the watcher does not receive any notification until the key it is watching is itself updated (a rough sketch of the test is below).
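Roughly, the experiment looks like the following sketch; the endpoint, key names, and timeouts are illustrative and this is not the actual test code:

```go
package example

import (
	"context"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// TestCompactOnlyDoesNotWakeWatcher: watch one key, update *other* keys to
// advance the revision, compact, and check that the watcher stays silent
// until the watched key itself changes.
func TestCompactOnlyDoesNotWakeWatcher(t *testing.T) {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		t.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	resp, err := cli.Put(ctx, "/watched", "v0")
	if err != nil {
		t.Fatal(err)
	}
	wch := cli.Watch(ctx, "/watched", clientv3.WithRev(resp.Header.Revision+1))

	// Bump other keys to advance the store revision, then compact everything so far.
	for i := 0; i < 10; i++ {
		if _, err := cli.Put(ctx, "/other", "x"); err != nil {
			t.Fatal(err)
		}
	}
	cur, err := cli.Get(ctx, "/other")
	if err != nil {
		t.Fatal(err)
	}
	if _, err := cli.Compact(ctx, cur.Header.Revision); err != nil {
		t.Fatal(err)
	}

	// Expect silence: no event and no compaction error for the watched key.
	select {
	case wresp := <-wch:
		t.Fatalf("unexpected watch response: %+v (err: %v)", wresp, wresp.Err())
	case <-time.After(3 * time.Second):
		// The watcher only wakes up once /watched itself is updated.
	}
}
```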
I am also trying to reproduce it in k8s.
close #6554 Signed-off-by: lhy1024 <admin@liudos.us>
close tikv#6554 Signed-off-by: lhy1024 <admin@liudos.us>
Enhancement Task
What did you do?
In dev env,
What did you expect to see?
TSO primary election within 15 seconds
What did you see instead?
TSO primary election took 14 minutes
What version of PD are you using (pd-server -V)?
tidbcloud/pd-cse release-6.6-keyspace 9e1e2de
tikv/pd master