tidb-lightning,br: replaced black-white-list by table-filter (#3065) #3140

Merged 3 commits on Jul 3, 2020
3 changes: 2 additions & 1 deletion TOC.md
@@ -159,7 +159,7 @@
+ [Configure](/tidb-lightning/tidb-lightning-configuration.md)
+ Key Features
+ [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md)
+ [Table Filter](/tidb-lightning/tidb-lightning-table-filter.md)
+ [Table Filter](/table-filter.md)
+ [CSV Support](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md)
+ [TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md)
+ [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md)
@@ -430,6 +430,7 @@
+ [Error Codes](/error-codes.md)
+ [TiCDC Overview](/ticdc/ticdc-overview.md)
+ [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md)
+ [Table Filter](/table-filter.md)
+ FAQs
+ [TiDB FAQs](/faq/tidb-faq.md)
+ [TiDB Lightning FAQs](/tidb-lightning/tidb-lightning-faq.md)
37 changes: 37 additions & 0 deletions br/backup-and-restore-tool.md
@@ -269,6 +269,25 @@ For descriptions of other options, see [Back up all cluster data](#back-up-all-t

A progress bar is displayed in the terminal during the backup operation. When the progress bar advances to 100%, the backup is complete. BR then checks the backup data to ensure data safety.

### Back up with table filter

To back up multiple tables with more complex criteria, execute the `br backup full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.

**Usage example:**

The following command backs up the data of all tables in the form `db*.tbl*` to the `/tmp/backup` path on each TiKV node and writes the `backupmeta` file to this path.

{{< copyable "shell-regular" >}}

```shell
br backup full \
--pd "${PDIP}:2379" \
--filter 'db*.tbl*' \
--storage "local:///tmp/backup" \
--ratelimit 120 \
--log-file backupfull.log
```

### Back up data to Amazon S3 backend

If you back up the data to the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3.
@@ -443,6 +462,24 @@ br restore table \

In the above command, `--table` specifies the name of the table to be restored. For descriptions of other options, see [Restore all backup data](#restore-all-the-backup-data) and [Restore a database](#restore-a-database).

### Restore with table filter

To restore multiple tables with more complex criteria, execute the `br restore full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.

**Usage example:**

The following command restores a subset of tables backed up in the `/tmp/backup` path to the cluster.

{{< copyable "shell-regular" >}}

```shell
br restore full \
--pd "${PDIP}:2379" \
--filter 'db*.tbl*' \
--storage "local:///tmp/backup" \
--log-file restorefull.log
```

### Restore data from Amazon S3 backend

If you restore data from the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3.
252 changes: 252 additions & 0 deletions table-filter.md
@@ -0,0 +1,252 @@
---
title: Table Filter
summary: Usage of table filter feature in TiDB tools.
category: reference
aliases: ['/docs/stable/tidb-lightning/tidb-lightning-table-filter/','/docs/stable/reference/tools/tidb-lightning/table-filter/','/tidb/stable/tidb-lightning-table-filter/','/tidb/v4.0/tidb-lightning-table-filter/']
---

# Table Filter

By default, the TiDB ecosystem tools operate on all databases, but often only a subset is needed. For example, you may want to work only with schemas matching `foo*` and `bar*` and nothing else.

Since TiDB 4.0, all TiDB ecosystem tools share a common filter syntax to define subsets. This document describes how to use the table filter feature.

## Usage

### CLI

Table filters can be applied to the tools using multiple `-f` or `--filter` command line parameters. Each filter is in the form of `db.table`, where each part can be a wildcard (further explained in the [next section](#wildcards)). The following lists the example usage in each tool.

* [BR](/br/backup-and-restore-tool.md):

{{< copyable "shell-regular" >}}

```shell
./br backup full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup'
# ^~~~~~~~~~~~~~~~~~~~~~~
./br restore full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup'
# ^~~~~~~~~~~~~~~~~~~~~~~
```

* [Dumpling](/export-or-backup-using-dumpling.md):

{{< copyable "shell-regular" >}}

```shell
./dumpling -f 'foo*.*' -f 'bar*.*' -P 3306 -o /tmp/data/
# ^~~~~~~~~~~~~~~~~~~~~~~
```

* [Lightning](/tidb-lightning/tidb-lightning-overview.md):

{{< copyable "shell-regular" >}}

```shell
./tidb-lightning -f 'foo*.*' -f 'bar*.*' -d /tmp/data/ --backend tidb
# ^~~~~~~~~~~~~~~~~~~~~~~
```

### TOML configuration files

Table filters in TOML files are specified as [array of strings](https://toml.io/en/v1.0.0-rc.1#section-15). The following lists the example usage in each tool.

* Lightning:

```toml
[mydumper]
filter = ['foo*.*', 'bar*.*']
```

* [TiCDC](/ticdc/ticdc-overview.md):

```toml
[filter]
rules = ['foo*.*', 'bar*.*']

[[sink.dispatchers]]
matcher = ['db1.*', 'db2.*', 'db3.*']
dispatcher = 'ts'
```

## Syntax

### Plain table names

Each table filter rule consists of a "schema pattern" and a "table pattern", separated by a dot (`.`). Tables whose fully-qualified name matches the rules are accepted.

```
db1.tbl1
db2.tbl2
db3.tbl3
```

A plain name must only consist of valid [identifier characters](/schema-object-names.md), such as:

* digits (`0` to `9`)
* letters (`a` to `z`, `A` to `Z`)
* `$`
* `_`
* non-ASCII characters (U+0080 to U+10FFFF)

All other ASCII characters are reserved. Some punctuation characters have special meanings, as described in the following sections.

### Wildcards

Each part of the name can be a wildcard symbol described in [fnmatch(3)](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13):

* `*` — matches zero or more characters
* `?` — matches one character
* `[a-z]` — matches one character between "a" and "z" inclusive
* `[!a-z]` — matches one character *not* between "a" and "z"

```
db[0-9].tbl[0-9a-f][0-9a-f]
data.*
*.backup_*
```

"Character" here means a Unicode code point, such as:

* U+00E9 (é) is 1 character.
* U+0065 U+0301 (é) are 2 characters.
* U+1F926 U+1F3FF U+200D U+2640 U+FE0F (🤦🏿‍♀️) are 5 characters.
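The wildcard semantics above follow fnmatch conventions, so they can be sketched with Python's standard `fnmatch` module. This is an illustrative sketch only, not the tools' actual implementation; the `matches` helper name is hypothetical, and the sketch ignores escaping, quoting, and regex rules described later.

```python
import fnmatch

def matches(rule: str, schema: str, table: str) -> bool:
    """Check whether a fully-qualified table name matches a plain
    `db.table` wildcard rule. The schema pattern and table pattern
    are matched independently."""
    # Sketch only: assumes the rule contains exactly one separating dot.
    schema_pat, table_pat = rule.split(".", 1)
    # fnmatchcase supports the same *, ?, [a-z], and [!a-z] wildcards.
    return (fnmatch.fnmatchcase(schema, schema_pat)
            and fnmatch.fnmatchcase(table, table_pat))

matches("db[0-9].tbl[0-9a-f][0-9a-f]", "db1", "tbl2f")  # True
matches("data.*", "data", "anything")                   # True
matches("*.backup_*", "prod", "backup_2020")            # True
```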

### File import

To import a file as the filter rule, include an `@` at the beginning of the rule to specify the file name. The table filter parser treats each line of the imported file as additional filter rules.

For example, if a file `config/filter.txt` has the following content:

```
employees.*
*.WorkOrder
```

the following two invocations are equivalent:

```bash
./dumpling -f '@config/filter.txt'
./dumpling -f 'employees.*' -f '*.WorkOrder'
```

A filter file cannot further import another file.

### Comments and blank lines

Inside a filter file, leading and trailing whitespace on every line is trimmed. Lines that are blank after trimming are ignored.

A leading `#` marks the whole line as a comment and is ignored. A `#` that does not appear at the start of a line is a syntax error.

```
# this line is a comment
db.table # but this part is not a comment and causes an error
```
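Under the rules above, expanding an `@file` rule can be sketched as follows. This is illustrative Python with a hypothetical `load_filter_file` helper, not the parser the tools actually use.

```python
def load_filter_file(path: str) -> list:
    """Expand an @file rule into its constituent filter rules (sketch).
    Each line is trimmed; blank lines and whole-line comments are
    skipped; nested @imports are rejected per the documented rule."""
    rules = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            if line.startswith("@"):
                raise ValueError("a filter file cannot import another file")
            rules.append(line)
    return rules
```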

### Exclusion

An `!` at the beginning of the rule means the pattern after it is used to exclude tables from being processed. This effectively turns the filter into a block list.

```
*.*
#^ note: must add the *.* to include all tables first
!*.Password
!employees.salaries
```

### Escape character

To turn a special character into an identifier character, precede it with a backslash `\`.

```
db\.with\.dots.*
```

For simplicity and future compatibility, the following sequences are prohibited:

* `\` at the end of the line after trimming whitespaces (use `[ ]` to match a literal whitespace at the end).
* `\` followed by any ASCII alphanumeric character (`[0-9a-zA-Z]`). In particular, C-like escape sequences like `\0`, `\r`, `\n` and `\t` currently are meaningless.

### Quoted identifier

Besides `\`, special characters can also be suppressed by quoting using `"` or `` ` ``.

```
"db.with.dots"."tbl\1"
`db.with.dots`.`tbl\2`
```

A quotation mark can be included within a quoted identifier by doubling it.

```
"foo""bar".`foo``bar`
# equivalent to:
foo\"bar.foo\`bar
```

Quoted identifiers cannot span multiple lines.

It is invalid to partially quote an identifier:

```
"this is "invalid*.*
```
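The doubling rule can be undone mechanically. The following is a minimal sketch with a hypothetical `unquote` helper; it assumes the input is a single, fully quoted identifier.

```python
def unquote(quoted: str) -> str:
    """Undo quoting for one identifier (sketch): strip the surrounding
    quote character and collapse each doubled quote back to one."""
    q = quoted[0]
    assert q in ('"', '`') and quoted.endswith(q)
    return quoted[1:-1].replace(q * 2, q)

unquote('"foo""bar"')  # foo"bar
unquote('`foo``bar`')  # foo`bar
```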

### Regular expression

When very complex rules are needed, each pattern can be written as a regular expression delimited by `/`:

```
/^db\d{2,}$/./^tbl\d{2,}$/
```

These regular expressions use the [Go dialect](https://pkg.go.dev/regexp/syntax?tab=doc). The pattern is matched if the identifier contains a substring matching the regular expression. For instance, `/b/` matches `db01`.

> **Note:**
>
> Every `/` in the regular expression must be escaped as `\/`, including inside `[…]`. You cannot place an unescaped `/` between `\Q…\E`.
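The substring-matching behavior can be illustrated in Python, whose regex syntax agrees with the Go dialect for simple patterns like these. The helper name is hypothetical and this is not the tools' code.

```python
import re

def regex_part_matches(pattern: str, identifier: str) -> bool:
    """A /regex/ pattern matches if the identifier contains a substring
    matching it, i.e. search semantics rather than full-match."""
    return re.search(pattern, identifier) is not None

regex_part_matches(r"^db\d{2,}$", "db01")  # True: anchored full match
regex_part_matches(r"b", "db01")           # True: substring match suffices
```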

## Multiple rules

When a table name matches none of the rules in the filter list, the default behavior is to ignore such unmatched tables.

To build a block list, an explicit `*.*` must be used as the first rule, otherwise all tables will be excluded.

```bash
# every table will be filtered out
./dumpling -f '!*.Password'

# only the "Password" table is filtered out, the rest are included.
./dumpling -f '*.*' -f '!*.Password'
```

In a filter list, if a table name matches multiple patterns, the last match decides the outcome. For instance:

```
# rule 1
employees.*
# rule 2
!*.dep*
# rule 3
*.departments
```

The filtered outcome is as follows:

| Table name | Rule 1 | Rule 2 | Rule 3 | Outcome |
|-----------------------|--------|--------|--------|------------------|
| irrelevant.table | | | | Default (reject) |
| employees.employees | ✓ | | | Rule 1 (accept) |
| employees.dept_emp | ✓ | ✓ | | Rule 2 (reject) |
| employees.departments | ✓ | ✓ | ✓ | Rule 3 (accept) |
| else.departments | | ✓ | ✓ | Rule 3 (accept) |
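The last-match-wins evaluation above can be sketched in Python. This is illustrative only: `apply_filter` is a hypothetical name, and the sketch handles only plain wildcard rules and `!` exclusion rules.

```python
import fnmatch

def apply_filter(rules: list, schema: str, table: str) -> bool:
    """Decide whether a table is accepted by a filter list (sketch).
    The last matching rule wins; unmatched tables are rejected."""
    accepted = False  # default outcome: reject
    for rule in rules:
        exclude = rule.startswith("!")
        pattern = rule[1:] if exclude else rule
        schema_pat, table_pat = pattern.split(".", 1)
        if (fnmatch.fnmatchcase(schema, schema_pat)
                and fnmatch.fnmatchcase(table, table_pat)):
            accepted = not exclude  # later rules override earlier ones
    return accepted

rules = ["employees.*", "!*.dep*", "*.departments"]
apply_filter(rules, "employees", "employees")    # True  (rule 1)
apply_filter(rules, "employees", "dept_emp")     # False (rule 2)
apply_filter(rules, "employees", "departments")  # True  (rule 3)
apply_filter(rules, "irrelevant", "table")       # False (default)
```

Each call above reproduces the corresponding row of the outcome table.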

> **Note:**
>
> In TiDB tools, the system schemas are always excluded regardless of the table filter settings. The system schemas are:
>
> * `INFORMATION_SCHEMA`
> * `PERFORMANCE_SCHEMA`
> * `METRICS_SCHEMA`
> * `INSPECTION_SCHEMA`
> * `mysql`
> * `sys`
8 changes: 4 additions & 4 deletions tidb-lightning/tidb-lightning-configuration.md
@@ -164,6 +164,9 @@ strict-format = false
# parallel. max-region-size is the maximum size of each chunk after splitting.
# max-region-size = 268_435_456 # Byte (default = 256 MB)

# Only import tables if these wildcard rules are matched. See the corresponding section for details.
filter = ['*.*']

# Configures how CSV files are parsed.
[mydumper.csv]
# Separator between fields, should be an ASCII character.
@@ -256,10 +259,6 @@ analyze = true
switch-mode = "5m"
# Duration between which an import progress is printed to the log.
log-progress = "5m"

# Table filter options. See the corresponding section for details.
#[black-white-list]
# ...
```

### TiKV Importer
@@ -351,6 +350,7 @@ min-available-ratio = 0.05
| -V | Prints program version | |
| -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` |
| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
| -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` |
| --backend *backend* | [Delivery backend](/tidb-lightning/tidb-lightning-tidb-backend.md) (`importer` or `tidb`) | `tikv-importer.backend` |
| --log-file *file* | Log file path (default = a temporary file in `/tmp`) | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
16 changes: 10 additions & 6 deletions tidb-lightning/tidb-lightning-glossary.md
@@ -35,12 +35,6 @@ Back end is the destination where TiDB Lightning sends the parsed result. Also s

See [TiDB Lightning TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md) for details.

### Black-white list

A configuration list that specifies which tables to be imported and which should be excluded.

See [TiDB Lightning Table Filter](/tidb-lightning/tidb-lightning-table-filter.md) for details.

<!-- C -->

## C
@@ -103,6 +97,16 @@ Engines use TiKV Importer's `import-dir` as temporary storage, which are sometim

See also [data engine](/tidb-lightning/tidb-lightning-glossary.md#data-engine) and [index engine](/tidb-lightning/tidb-lightning-glossary.md#index-engine).

<!-- F -->

## F

### Filter

A configuration list that specifies which tables are to be imported and which are to be excluded.

See [Table Filter](/table-filter.md) for details.

<!-- I -->

## I