diff --git a/export-or-backup-using-dumpling.md b/export-or-backup-using-dumpling.md
index dc5b2f86b8b13..d96f03a2ff1df 100644
--- a/export-or-backup-using-dumpling.md
+++ b/export-or-backup-using-dumpling.md
@@ -8,11 +8,19 @@ aliases: ['/docs/v3.0/export-or-backup-using-dumpling/']
 
 This document introduces how to use the [Dumpling](https://github.com/pingcap/dumpling) tool to export or backup data in TiDB. Dumpling exports data stored in TiDB as SQL or CSV data files and can be used to make a logical full backup or export.
 
+For detailed usage of Dumpling, use the `--help` command or refer to [Dumpling User Guide](https://github.com/pingcap/dumpling/blob/master/docs/en/user-guide.md).
+
 When using Dumpling, you need to execute the export command on a running cluster. This document assumes that there is a TiDB instance on the `127.0.0.1:4000` host and that this TiDB instance has a root user without a password.
 
+## Download Dumpling
+
+To download the latest version of Dumpling, click the [download link](https://download.pingcap.org/dumpling-nightly-linux-amd64.tar.gz).
+
 ## Export data from TiDB
 
-Export data using the following command:
+### Export to SQL files
+
+Dumpling exports data to SQL files by default. You can also export data to SQL files by adding the `--filetype sql` flag:
 
 {{< copyable "shell-regular" >}}
 
@@ -20,16 +28,18 @@ Export data using the following command:
 dumpling \
   -u root \
   -P 4000 \
-  -H 127.0.0.1 \
+  -h 127.0.0.1 \
   --filetype sql \
   --threads 32 \
   -o /tmp/test \
-  -F $(( 1024 * 1024 * 256 ))
+  -F 256
 ```
 
-In the above command, `-H`, `-P` and `-u` mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with `-p $YOUR_SECRET_PASSWORD`.
+In the above command, `-h`, `-P` and `-u` mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with `-p $YOUR_SECRET_PASSWORD`.
+
+### Export to CSV files
 
-Dumpling exports all tables (except for system tables) in the entire database by default. You can use `--where ` to select the records to be exported. If the exported data is in CSV format (CSV files can be exported using `--filetype csv`), you can also use `--sql ` to export records selected by the specified SQL statement.
+If Dumpling exports data to CSV files (use `--filetype csv` to export to CSV files), you can also use `--sql ` to export the records selected by the specified SQL statement.
 
 For example, you can export all records that match `id < 100` in `test.sbtest1` using the following command:
 
@@ -39,17 +49,23 @@ For example, you can export all records that match `id < 100` in `test.sbtest1`
 ./dumpling \
   -u root \
   -P 4000 \
-  -H 127.0.0.1 \
+  -h 127.0.0.1 \
   -o /tmp/test \
   --filetype csv \
-  --sql "select * from `test`.`sbtest1` where id < 100"
+  --sql 'select * from `test`.`sbtest1` where id < 100'
 ```
 
-Note that the `--sql` option can be used only for exporting CSV files for now. However, you can use `--where` to filter the rows to be exported, and use the following command to export all rows with `id < 100`:
-
 > **Note:**
 >
-> You need to execute the `select * from where id < 100` statement on all tables to be exported. If any table does not have the specified field, then the export fails.
+> - Currently, the `--sql` option can be used only for exporting to CSV files.
+>
+> - Here you need to execute the `select * from where id <100` statement on all tables to be exported. If some tables do not have specified fields, the export fails.
+
+### Filter the exported data
+
+#### Use the `--where` command to filter data
+
+By default, Dumpling exports the tables of the entire database except the tables in the system databases. You can use `--where ` to select the records to be exported.
 
 {{< copyable "shell-regular" >}}
 
@@ -57,24 +73,63 @@ Note that the `--sql` option can be used only for exporting CSV files for now. H
 ./dumpling \
   -u root \
   -P 4000 \
-  -H 127.0.0.1 \
+  -h 127.0.0.1 \
   -o /tmp/test \
   --where "id < 100"
 ```
 
+The above command exports the data that matches `id < 100` from each table.
+
+#### Use the `--filter` command to filter data
+
+Dumpling can filter specific databases or tables by specifying the table filter with the `--filter` command. The syntax of table filters is similar to that of `.gitignore`. For details, see [Table Filter](/table-filter.md).
+
+{{< copyable "shell-regular" >}}
+
+```shell
+./dumpling \
+  -u root \
+  -P 4000 \
+  -h 127.0.0.1 \
+  -o /tmp/test \
+  --filter "employees.*" \
+  --filter "*.WorkOrder"
+```
+
+The above command exports all the tables in the `employees` database and the `WorkOrder` tables in all databases.
+
+#### Use the `-B` or `-T` command to filter data
+
+Dumpling can also export specific databases with the `-B` command or specific tables with the `-T` command.
+
 > **Note:**
 >
-> Currently, Dumpling does not support exporting only certain tables specified by users (i.e. `-T` flag, see [this issue](https://github.com/pingcap/dumpling/issues/76)). If you do need this feature, you can use [MyDumper](/backup-and-restore-using-mydumper-lightning.md) instead.
+> - The `--filter` command and the `-T` command cannot be used at the same time.
+>
+> - The `-T` command can only accept a complete form of inputs like `database-name.table-name`, and inputs with only the table name are not accepted. Example: Dumpling cannot recognize `-T WorkOrder`.
+
+Examples:
+
+-`-B employees` exports the `employees` database
+-`-T employees.WorkOrder` exports the `employees.WorkOrder` table
+
+### Improve export efficiency through concurrency
 
 The exported file is stored in the `./export-` directory by default. Commonly used parameters are as follows:
 
 - `-o` is used to select the directory where the exported files are stored.
-- `-F` option is used to specify the maximum size of a single file (the unit here is byte, different from MyDumper).
-- `-r` option is used to specify the maximum number of records (or the number of rows in the database) for a single file.
+- `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable).
+- `-r` option is used to specify the maximum number of records (or the number of rows in the database) for a single file. When it is enabled, Dumpling enables concurrency in the table to improve the speed of exporting large tables.
 
-You can use the above parameters to provide Dumpling with a higher degree of parallelism.
+You can use the above parameters to provide Dumpling with a higher degree of concurrency.
+
+### Adjust Dumpling's data consistency options
+
+> **Note:**
+>
+> In most scenarios, you do not need to adjust the default data consistency options of Dumpling.
 
-Another flag that is not mentioned above is `--consistency `, which controls the way in which data is exported for "consistency assurance". For TiDB, consistency is ensured by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to be backed up. You can also use the following levels of consistency:
+Dumpling uses the `--consistency ` option to control the way in which data is exported for "consistency assurance". For TiDB, data consistency is guaranteed by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to be backed up. You can also use the following levels of consistency:
 
 - `flush`: Use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency.
 - `snapshot`: Get a consistent snapshot of the specified timestamp and export it.