
CDC cloud: CDC OOM when upstream has historical data and has 60K tables #2553

Closed

Tammyxia opened this issue Aug 17, 2021 · 3 comments

Labels: area/ticdc Issues or PRs related to TiCDC. component/replica-model Replication model component. component/sorter Sorter component. difficulty/hard Hard task. severity/major type/bug The issue is confirmed as a bug.

Comments

@Tammyxia

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
  • There are 8 CDC servers, each with 36 vCPUs and 72 GB of memory, 1 changefeed, and 60K tables.
  • Stop the changefeed, then run a workload on the upstream TiDB; the data size is less than 40 GB.
  • Resume the changefeed while the upstream workload is still running (see the CLI sketch after the version details below).
  2. What did you expect to see?
  • No errors.
  3. What did you see instead?
  • CDC OOMs persistently.
  4. Versions of the cluster

    • Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

      4.0.14
      
    • TiCDC version (execute cdc version):

```
Release Version: v4.0.14-42-gc24e30f
Git Commit Hash: c24e30f
Git Branch: HEAD
UTC Build Time: 2021-08-11 06:03:32
Go Version: go version go1.16.4 linux/amd64
Failpoint Build: false
```
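
For reference, here is a minimal sketch of the pause-and-resume sequence from the reproduction steps, using the standard `cdc cli changefeed` subcommands; the PD address and changefeed ID below are placeholders, not values taken from this report:

```
# Pause the changefeed before loading historical data into the upstream TiDB
# (the PD address and changefeed ID are placeholders).
cdc cli changefeed pause --pd=http://<pd-host>:2379 --changefeed-id=<changefeed-id>

# ... load < 40 GB of data into the upstream cluster ...

# Resume the changefeed while the upstream workload keeps running;
# this is the point at which the CDC servers start to OOM.
cdc cli changefeed resume --pd=http://<pd-host>:2379 --changefeed-id=<changefeed-id>

# Inspect the changefeed state and checkpoint lag.
cdc cli changefeed query --pd=http://<pd-host>:2379 --changefeed-id=<changefeed-id>
```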

@Tammyxia Tammyxia added type/bug The issue is confirmed as a bug. severity/major labels Aug 17, 2021
@amyangfei amyangfei changed the title CDC cloud: CDC OOM when upstream has historical data and has 6w tables CDC cloud: CDC OOM when upstream has historical data and has 60K tables Aug 17, 2021
@asddongmen asddongmen added component/sorter Sorter component. component/replica-model Replication model component. difficulty/hard Hard task. labels Aug 23, 2021
@Tammyxia (Author) commented Sep 1, 2021

When TiKV scales in, the checkpoint lag reaches 12 hours and the CDC owner OOMs.

@Tammyxia (Author) commented Sep 1, 2021

(screenshot attached)

@overvenus (Member)

We have confirmed this issue is fixed in release-5.4.
