title | Summary |
---|---|
Sink to MySQL |
Learn how to create a changefeed to stream data from TiDB Cloud to MySQL. |
This document describes how to stream data from TiDB Cloud to MySQL using the Sink to MySQL changefeed.
Note:
To use the Changefeed feature, make sure that your TiDB cluster version is v6.4.0 or later and the TiKV node size is at least 8 vCPU and 16 GiB.
Currently, TiDB Cloud only allows up to 10 changefeeds per cluster.
For Serverless Tier clusters, the changefeed feature is unavailable.
Make sure that your TiDB Cluster can connect to the MySQL service.
If your MySQL service is in an AWS VPC that has no public internet access, take the following steps:
-
Set up a VPC peering connection between the VPC of the MySQL service and your TiDB cluster.
-
Modify the inbound rules of the security group that the MySQL service is associated with.
You must add the CIDR of the region where your TiDB Cloud cluster is located to the inbound rules. Doing so allows the traffic to flow from your TiDB Cluster to the MySQL instance.
-
If the MySQL URL contains a hostname, you need to allow TiDB Cloud to be able to resolve the DNS hostname of the MySQL service.
- Follow the steps in Enable DNS resolution for a VPC peering connection.
- Enable the Accepter DNS resolution option.
If your MySQL service is in a GCP VPC that has no public internet access, take the following steps:
-
If your MySQL service is Google Cloud SQL, you must expose a MySQL endpoint in the associated VPC of the Google Cloud SQL instance. You may need to use the Cloud SQL Auth proxy which is developed by Google.
-
Set up a VPC peering connection between the VPC of the MySQL service and your TiDB cluster.
-
Modify the ingress firewall rules of the VPC where MySQL is located.
You must add the CIDR of the region where your TiDB Cloud cluster is located to the ingress firewall rules. Doing so allows the traffic to flow from your TiDB Cluster to the MySQL endpoint.
The Sink to MySQL connector can only sink incremental data from your TiDB cluster to MySQL after a certain timestamp. If you already have data in your TiDB cluster, you must export and load the full load data of your TiDB cluster into MySQL before enabling Sink to MySQL:
-
Extend the tidb_gc_life_time to be longer than the total time of the following two operations, so that historical data during the time is not garbage collected by TiDB.
- The time to export and import the full load data
- The time to create Sink to MySQL
For example:
{{< copyable "sql" >}}
SET GLOBAL tidb_gc_life_time = '720h';
-
Use Dumpling to export data from your TiDB cluster, then use community tools such as mydumper/myloader to load data to the MySQL service.
-
From the exported files of Dumpling, get the start position of MySQL sink from the metadata file:
The following is a part of an example metadata file. The
Pos
ofSHOW MASTER STATUS
is the TSO of the full load data, which is also the start position of MySQL sink.Started dump at: 2020-11-10 10:40:19 SHOW MASTER STATUS: Log: tidb-binlog Pos: 420747102018863124 Finished dump at: 2020-11-10 10:40:20
After completing the prerequisites, you can sink your data to MySQL.
-
Navigate to the cluster overview page of the target TiDB cluster, and then click Changefeed in the left navigation pane.
-
Click Create Changefeed, and select MySQL as Target Type.
-
Fill in the MySQL endpoints, user name, and password in MySQL Connection.
-
Click Next to test whether TiDB can connect to MySQL successfully:
- If yes, you are directed to the next step of configuration.
- If not, a connectivity error is displayed, and you need to handle the error. After the error is resolved, click Next again.
-
Customize Table Filter to filter the tables that you want to replicate. For the rule syntax, refer to table filter rules.
- Add filter rules: you can set filter rules in this column. By default, there is a rule
*. *
, which stands for replicating all tables. When you add a new rule, TiDB Cloud queries all the tables in TiDB and displays only the tables that match the rules in the box on the right. - Tables to be replicated: this column shows the tables to be replicated. But it does not show the new tables to be replicated in the future or the schemas to be fully replicated.
- Tables without valid keys: this column shows tables without unique and primary keys. For these tables, because no unique identifier can be used by the downstream system to handle duplicate events, their data might be inconsistent during replication. To avoid such issues, it is recommended that you add unique keys or primary keys to these tables before the replication, or set filter rules to filter out these tables. For example, you can filter out the table
test.tbl1
using "!test.tbl1".
- Add filter rules: you can set filter rules in this column. By default, there is a rule
-
In Start Position, configure the starting position for your MySQL sink.
- If you have performed full load data using Dumpling, select Start replication from a specific TSO and fill in the TSO that you get from Dumpling exported metadata files.
- If you do not have any data in the upstream TiDB cluster, select Start replication from now on.
- Otherwise, you can customize the start time point by choosing Start replication from a specific time.
-
Click Next to review the Changefeed configuration.
If you confirm all configurations are correct, check the compliance of cross-region replication, and click Create.
If you want to modify some configurations, click Previous to go back to the previous configuration page.
-
The sink starts soon, and you can see the status of the sink changes from "Creating" to "Running".
Click the Sink to MySQL card, and you can see the Changfeed running status in a pop-up window, including checkpoint, replication latency, and other metrics.
-
If you have performed full load data using Dumpling, you need to restore the GC time to its original value (the default value is
10m
) after the sink is created:
{{< copyable "sql" >}}
SET GLOBAL tidb_gc_life_time = '10m';
- For each TiDB Cloud cluster, you can create up to 10 changefeeds.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same restrictions as TiCDC.