Skip to content

Commit

Permalink
cherry pick #3454 to release-3.1 (#3459)
Browse files Browse the repository at this point in the history
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
  • Loading branch information
ti-srebot and yikeke authored Jul 28, 2020
1 parent 697a355 commit 13c4192
Showing 1 changed file with 71 additions and 16 deletions.
87 changes: 71 additions & 16 deletions export-or-backup-using-dumpling.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,28 +10,38 @@ This document introduces how to use the [Dumpling](https://github.com/pingcap/du

For backups of SST files (KV pairs) or backups of incremental data that are not sensitive to latency, refer to [BR](/br/backup-and-restore-tool.md).

For detailed usage of Dumpling, use the `--help` command or refer to [Dumpling User Guide](https://github.com/pingcap/dumpling/blob/master/docs/en/user-guide.md).

When using Dumpling, you need to execute the export command on a running cluster. This document assumes that there is a TiDB instance on the `127.0.0.1:4000` host and that this TiDB instance has a root user without a password.

## Download Dumpling

To download the latest version of Dumpling, click the [download link](https://download.pingcap.org/dumpling-nightly-linux-amd64.tar.gz).

## Export data from TiDB

Export data using the following command:
### Export to SQL files

Dumpling exports data to SQL files by default. You can also export data to SQL files by adding the `--filetype sql` flag:

{{< copyable "shell-regular" >}}

```shell
dumpling \
-u root \
-P 4000 \
-H 127.0.0.1 \
-h 127.0.0.1 \
--filetype sql \
--threads 32 \
-o /tmp/test \
-F $(( 1024 * 1024 * 256 ))
-F 256
```

In the above command, `-H`, `-P` and `-u` mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with `-p $YOUR_SECRET_PASSWORD`.
In the above command, `-h`, `-P` and `-u` mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with `-p $YOUR_SECRET_PASSWORD`.

### Export to CSV files

Dumpling exports all tables (except for system tables) in the entire database by default. You can use `--where <SQL where expression>` to select the records to be exported. If the exported data is in CSV format (CSV files can be exported using `--filetype csv`), you can also use `--sql <SQL>` to export records selected by the specified SQL statement.
If Dumpling exports data to CSV files (use `--filetype csv` to export to CSV files), you can also use `--sql <SQL>` to export the records selected by the specified SQL statement.

For example, you can export all records that match `id < 100` in `test.sbtest1` using the following command:

Expand All @@ -41,42 +51,87 @@ For example, you can export all records that match `id < 100` in `test.sbtest1`
./dumpling \
-u root \
-P 4000 \
-H 127.0.0.1 \
-h 127.0.0.1 \
-o /tmp/test \
--filetype csv \
--sql "select * from `test`.`sbtest1` where id < 100"
--sql 'select * from `test`.`sbtest1` where id < 100'
```

Note that the `--sql` option can be used only for exporting CSV files for now. However, you can use `--where` to filter the rows to be exported, and use the following command to export all rows with `id < 100`:

> **Note:**
>
> You need to execute the `select * from <table-name> where id < 100` statement on all tables to be exported. If any table does not have the specified field, then the export fails.
> - Currently, the `--sql` option can be used only for exporting to CSV files.
>
> - Here you need to execute the `select * from <table-name> where id <100` statement on all tables to be exported. If some tables do not have specified fields, the export fails.
### Filter the exported data

#### Use the `--where` command to filter data

By default, Dumpling exports the tables of the entire database except the tables in the system databases. You can use `--where <SQL where expression>` to select the records to be exported.

{{< copyable "shell-regular" >}}

```shell
./dumpling \
-u root \
-P 4000 \
-H 127.0.0.1 \
-h 127.0.0.1 \
-o /tmp/test \
--where "id < 100"
```

The above command exports the data that matches `id < 100` from each table.

#### Use the `--filter` command to filter data

Dumpling can filter specific databases or tables by specifying the table filter with the `--filter` command. The syntax of table filters is similar to that of `.gitignore`. For details, see [Table Filter](/table-filter.md).

{{< copyable "shell-regular" >}}

```shell
./dumpling \
-u root \
-P 4000 \
-h 127.0.0.1 \
-o /tmp/test \
--filter "employees.*"
--filter "*.WorkOrder"
```

The above command exports all the tables in the `employees` database and the `WorkOrder` tables in all databases.

#### Use the `-B` or `-T` command to filter data

Dumpling can also export specific databases with the `-B` command or specific tables with the `-T` command.

> **Note:**
>
> Currently, Dumpling does not support exporting only certain tables specified by users (i.e. `-T` flag, see [this issue](https://github.com/pingcap/dumpling/issues/76)). If you do need this feature, you can use [MyDumper](/backup-and-restore-using-mydumper-lightning.md) instead.
> - The `--filter` command and the `-T` command cannot be used at the same time.
>
> - The `-T` command can only accept a complete form of inputs like `database-name.table-name`, and inputs with only the table name are not accepted. Example: Dumpling cannot recognize `-T WorkOrder`.
Examples:

-`-B employees` exports the `employees` database
-`-T employees.WorkOrder` exports the `employees.WorkOrder` table

### Improve export efficiency through concurrency

The exported file is stored in the `./export-<current local time>` directory by default. Commonly used parameters are as follows:

- `-o` is used to select the directory where the exported files are stored.
- `-F` option is used to specify the maximum size of a single file (the unit here is byte, different from MyDumper).
- `-r` option is used to specify the maximum number of records (or the number of rows in the database) for a single file.
- `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable).
- `-r` option is used to specify the maximum number of records (or the number of rows in the database) for a single file. When it is enabled, Dumpling enables concurrency in the table to improve the speed of exporting large tables.

You can use the above parameters to provide Dumpling with a higher degree of parallelism.
You can use the above parameters to provide Dumpling with a higher degree of concurrency.

### Adjust Dumpling's data consistency options

> **Note:**
>
> In most scenarios, you do not need to adjust the default data consistency options of Dumpling.
Another flag that is not mentioned above is `--consistency <consistency level>`, which controls the way in which data is exported for "consistency assurance". For TiDB, consistency is ensured by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to be backed up. You can also use the following levels of consistency:
Dumpling uses the `--consistency <consistency level>` option to control the way in which data is exported for "consistency assurance". For TiDB, data consistency is guaranteed by getting a snapshot of a certain timestamp by default (i.e. `--consistency snapshot`). When using snapshot for consistency, you can use the `--snapshot` parameter to specify the timestamp to be backed up. You can also use the following levels of consistency:

- `flush`: Use [`FLUSH TABLES WITH READ LOCK`](https://dev.mysql.com/doc/refman/8.0/en/flush.html#flush-tables-with-read-lock) to ensure consistency.
- `snapshot`: Get a consistent snapshot of the specified timestamp and export it.
Expand Down

0 comments on commit 13c4192

Please sign in to comment.