
changefeed restarted and lag >25 min when upgrading PD #8868

Closed
fubinzh opened this issue Apr 27, 2023 · 3 comments
Labels
area/ticdc (Issues or PRs related to TiCDC) · may-affects-6.1 · may-affects-6.5 · may-affects-7.1 · severity/major · type/bug (The issue is confirmed as a bug)

Comments


fubinzh commented Apr 27, 2023

What did you do?

  1. TiDB Operator cluster with 36 TiKV, 3 CDC, and 3 PD nodes; create a changefeed to sync to Kafka
  2. Run a workload with throughput of ~50 MB/s
  3. Upgrade the cluster
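For context, the changefeed in step 1 can be sketched with the `cdc cli`; a minimal, hedged example (the server address, Kafka broker, topic, and changefeed ID below are placeholders, not values from the cluster in this report):

```shell
# Create a changefeed replicating to a Kafka sink.
# All addresses and names here are illustrative placeholders.
cdc cli changefeed create \
  --server=http://127.0.0.1:8300 \
  --sink-uri="kafka://127.0.0.1:9092/cdc-topic?protocol=open-protocol" \
  --changefeed-id="kafka-changefeed"
```
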

What did you expect to see?

Lag should be less than 10 s.

What did you see instead?

Lag > 25 min during the PD upgrade; a CDC capture restart was seen due to an etcd context cancel:

[2023/04/25 13:31:22.129 +00:00] [INFO] [client.go:235] ["WatchWithChan exited"] [role=processor]
[2023/04/25 13:31:22.129 +00:00] [INFO] [capture.go:353] ["processor routine exited"] [captureID=f799bd5d-5cae-448e-b621-7b0ee517ef03] [error="[CDC:ErrPDEtcdAPIError]etcd api call error: context canceled"] [errorVerbose="[CDC:ErrPDEtcdAPIError]etcd api call error: context canceled\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/errors.WrapError\n\tgithub.com/pingcap/tiflow/pkg/errors/helper.go:34\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).runEtcdWorker\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:524\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).run.func4\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:352\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1598"]
...
 [2023/04/25 13:31:28.136 +00:00] [INFO] [capture.go:240] ["capture initialized"] [capture="{\"id\":\"b20df9b5-93a9-4746-b705-c5fbc495981d\",\"address\":\"upstream-ticdc-0.upstream-ticdc-peer.cdc-kafka-big-cluster-tps-1686552-1-956.svc:8301\",\"version\":\"v7.1.0-rc.0\"}"]
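For reference, lag like the above can be observed from the changefeed checkpoint; a sketch using the `cdc cli` (server address and changefeed ID are placeholders for this environment):

```shell
# List changefeeds and query one; a checkpoint_time far behind the
# current time indicates replication lag. Addresses/IDs are placeholders.
cdc cli changefeed list --server=http://127.0.0.1:8300
cdc cli changefeed query --server=http://127.0.0.1:8300 \
  --changefeed-id="kafka-changefeed"
```
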


Versions of the cluster

[root@upstream-ticdc-0 /]# /cdc version
Release Version: v7.1.0-rc.0
Git Commit Hash: 4d633bf
Git Branch: heads/refs/tags/v7.1.0-rc.0
UTC Build Time: 2023-04-21 10:14:01
Go Version: go version go1.20.3 linux/amd64
Failpoint Build: false

@fubinzh fubinzh added area/ticdc Issues or PRs related to TiCDC. type/bug The issue is confirmed as a bug. labels Apr 27, 2023

fubinzh commented Apr 27, 2023

A capture restart was not seen during another test run; however, lag was still >10 s (~40 s).


fubinzh commented Apr 27, 2023

/severity major

asddongmen (Contributor) commented

Fixed by #8884
