Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC(dm): Add enhanced pre-check rfc #3674

Merged
merged 39 commits into from
Dec 23, 2021
Merged
Changes from 8 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
f657869
split actions paths
okJiang Oct 29, 2021
ea863f9
Merge branch 'master' of https://github.com/pingcap/ticdc into pingca…
okJiang Nov 22, 2021
ceec7f3
Merge branch 'pingcap-master'
okJiang Nov 22, 2021
0640fc6
Merge branch 'pingcap:master' into master
okJiang Nov 30, 2021
ce51331
add rfc
okJiang Nov 30, 2021
549da3f
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 1, 2021
ad3c49f
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 1, 2021
d3b42e5
format markdown
okJiang Dec 1, 2021
f5b3240
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 2, 2021
e6bce30
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 2, 2021
a003e1f
address comment
okJiang Dec 2, 2021
ebbaee8
add some description
okJiang Dec 3, 2021
4c32dd1
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 6, 2021
ce60241
update
okJiang Dec 6, 2021
129659f
Merge branch 'add-rfc-pre-check' of github.com:okJiang/ticdc into add…
okJiang Dec 6, 2021
ef05cfd
Apply suggestions from code review
okJiang Dec 6, 2021
c43fe21
Merge branch 'master' into add-rfc-pre-check
glorv Dec 6, 2021
bd824ef
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 7, 2021
eabe327
address comment
okJiang Dec 7, 2021
9444c89
Merge branch 'add-rfc-pre-check' of github.com:okJiang/ticdc into add…
okJiang Dec 7, 2021
66d1ebd
update auto increment id
okJiang Dec 7, 2021
9844a03
add omissive version check
okJiang Dec 8, 2021
89267d9
Apply suggestions from code review
okJiang Dec 10, 2021
ee361ab
address comment
okJiang Dec 10, 2021
9b1456f
Merge branch 'master' into add-rfc-pre-check
okJiang Dec 10, 2021
12c9fed
address comment
okJiang Dec 14, 2021
f90c8ac
adjust version checker
okJiang Dec 15, 2021
320b009
Apply suggestions from code review
okJiang Dec 21, 2021
cc13850
address comment
okJiang Dec 21, 2021
f7992f7
address comment
okJiang Dec 21, 2021
1f83940
address comment
okJiang Dec 22, 2021
a8dc9da
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 22, 2021
93ae002
update
okJiang Dec 22, 2021
aff6168
Merge branch 'add-rfc-pre-check' of github.com:okJiang/ticdc into add…
okJiang Dec 22, 2021
7c2f253
Merge branch 'master' into add-rfc-pre-check
okJiang Dec 22, 2021
d5f8d75
remove LOCK TABLES from flush consistency
okJiang Dec 22, 2021
2ea1a63
add Binlog_Do_DB
okJiang Dec 23, 2021
0b6d70d
Merge branch 'master' into add-rfc-pre-check
okJiang Dec 23, 2021
bfddc6a
Merge branch 'master' into add-rfc-pre-check
ti-chi-bot Dec 23, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 80 additions & 0 deletions dm/docs/RFCS/20211130_enhanced_pre_checker.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Enhanced pre-check Design

## Background

Before the DM’s task starts, we should check some items to avoid start-task failures. You can see the details in our [document](https://docs.pingcap.com/zh/tidb-data-migration/stable/precheck#%E5%85%B3%E9%97%AD%E6%A3%80%E6%9F%A5%E9%A1%B9). In order to allow some users to use DM normally under certain circumstances, we can also set ignore-check-items in task config to ignore some items that you don’t want to check. Now, we find some shortcomings in regard to check-items.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd better change all the doc links to its English version since this doc is written in English.


### Bad user habits

We allow users to ignore all check-items, in which case the user's authority is too large to perform unexpected operations.
okJiang marked this conversation as resolved.
Show resolved Hide resolved

### Too much time overhead

If we have a large number of tables in source, we will take too much time in checking table schema, sharding table consistency and sharding table auto increment key.
okJiang marked this conversation as resolved.
Show resolved Hide resolved

### Inadequate check

* If downstream creates tables manually and the new downstream’s auto increment ID is not the same as the upstream, we shouldn’t check **auto_increment_ID** for errors. Users should be responsible for what they set.
* Dump privilege only checks RELOAD and SELECT. However, Dumpling supports different [consistency configurations](https://docs.pingcap.com/zh/tidb/stable/dumpling-overview#%E8%B0%83%E6%95%B4-dumpling-%E7%9A%84%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7%E9%80%89%E9%A1%B9), which need more privilege.
okJiang marked this conversation as resolved.
Show resolved Hide resolved
* If online-ddl is set by true and a DDL is in online-ddl stage, DM will have a problem in all mode. Specifically, ghost table has been created, is executing the DDL, but is not renamed yet. In this case, DM will report an error when the ghost table is renamed after the dump phase. You can learn more about online-ddl [here](https://docs.pingcap.com/zh/tidb-data-migration/stable/feature-online-ddl).
* For schema_of_shard_tables, whatever pessimistic task and optimistic task, we all check it by comparing all sharding tables’ structures for consistency simply. For optimistic mode, we can do better.
okJiang marked this conversation as resolved.
Show resolved Hide resolved

## Proposal

### Restrict user usage
1. Remove the following settings from the [document](https://docs.pingcap.com/zh/tidb-data-migration/stable/precheck#%E5%85%B3%E9%97%AD%E6%A3%80%E6%9F%A5%E9%A1%B9). If the following items are detected to be set in the configuration, a warning will be reported.
okJiang marked this conversation as resolved.
Show resolved Hide resolved
- all
- dump_privilege
- replication_privilege
- server_id
- binlog_enable
- binlog_format
- binlog_row_image
2. If task is full/all mode, the following items will be forced to check (correspondingly, it will not be check in increment mode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the ignore rule for the items not listed in step 2 and 3

Copy link
Member Author

@okJiang okJiang Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The items not listed in step 2 and 3 will be checked if user don't set in ignore-check-items. If user set it, dm will ignore them.

It is same as now.

Add related description in pingcap/ticdc@a003e1f.

Copy link
Contributor

@lance6716 lance6716 Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain why these checking items can be disabled? my expectation is that user can disable nothing. if we can't make sure a checking we just let it raise warning.

Copy link
Member Author

@okJiang okJiang Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm. L80-82 is our efforts.

So, we should deprecate all ignore_check_items. cc @sunzhaoyang

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with lance6716. Critical check items should not be allowed to be ignored, which would potentially lead to data loss or unnecessary oncall.

Before ignore the check because there is a performance problem, if the performance problem is solved, I can not yet think of any cases where there is a reason to skip @okJiang

- dump_privilege
3. If task is increment/all mode, the following items will be forced to check (correspondingly, it will not be check in full mode):
- replication_privilege
- server_id
- binlog_enable
- binlog_format
- binlog_row_image

### Speed ​​up check

1. Support concurrent check
- table_schema
- schema_of_shard_tables
- auto_increment_ID
2. Use mydumper.threads as **source_connection_concurrency**, which should update in our document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if this is an incremental task? mydumper.threads is not relevant to incremental.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should we add a new config item? @sunzhaoyang

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add a new config item for checks. What about syncers.worker-count or adjust by table numbers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latter is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we adjust by table numbers, how many tables do we cover each connection is better? @lichunzhu


#### How to speed up?

Since every checker is concurrent, we can split tables to **source_connection_concurrency** part, and create a checker for every part.
lance6716 marked this conversation as resolved.
Show resolved Hide resolved

### Optimize some check

1. If downstream creates tables manually and the new downstream’s auto increment ID is not the same as the upstream, we shouldn’t check **auto_increment_ID**.
okJiang marked this conversation as resolved.
Show resolved Hide resolved
2. Dump_privilege will check different privileges according to different [consistency](https://docs.pingcap.com/zh/tidb/stable/dumpling-overview#%E8%B0%83%E6%95%B4-dumpling-%E7%9A%84%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7%E9%80%89%E9%A1%B9) and downstream on source.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there should also be SELECT privilege on some system tables 🤔

cc @lichunzhu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For MySQL, we need SELECT privilege to INFORMATION_SCHEMA.TABLES
For TiDB, we need SELECT privilege to INFORMATION_SCHEMA.TIKV_REGION_STATUS, INFORMATION_SCHEMA.PLACEMENT_RULES, INFORMATION_SCHEMA.CLUSTER_INFO, INFORMATION_SCHEMA.TIDB_SERVERS_INFO, INFORMATION_SCHEMA.PARTITIONS.

- For all consistency, we will check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- For all consistency, we will check
For all consistency, we will check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will break the markdown format.

- REPLICATION CLIENT (global)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used in incremental phase, I think dumpling don't need it.

Also, all consistency options should be aligned with https://github.com/pingcap/ticdc/pull/3674#discussion_r762777211

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptal @lichunzhu

Why dumpling need REPLICATION CLIENT privilege?

Copy link
Contributor

@lichunzhu lichunzhu Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SHOW MASTER STATUS needs REPLICATION CLIENT privilege

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DM has a test of revoking REPLICATION CLIENT and start a full mode task. So I expect dumpling will not fail the dump without this privilege and we only require it for incremental task

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After test, dump phase does not need REPLICATION CLIENT and REPLICATION SLAVE. I will remove them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check this privilege according to the task mode.
For all/incremental task we need these two privileges. For full mode it's not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we still have ReplicationPrivilegeChecker, which can check them in all/incremental task.

- SELECT (only dump table)
- For flush consistency:
- RELOAD (global)
- For flush/lock consistency:
- LOCK TABLES (only dump table)
okJiang marked this conversation as resolved.
Show resolved Hide resolved
- For TiDB downstream:
okJiang marked this conversation as resolved.
Show resolved Hide resolved
- PROCESS (global)
3. Add OnlineddlChecker to check if a ddl exists in online-ddl stage when DM task is all mode and online-ddl is true. It will be forced to check in all mode and not check in increment mode.
okJiang marked this conversation as resolved.
Show resolved Hide resolved
4. Enhance schema_of_shard_tables.
- At first, if a machine exits the DM’s checkpoint, the DM’s subsequent task starts/resumes at the checkpoint. So we think the checkpoint guarantees consistency.
okJiang marked this conversation as resolved.
Show resolved Hide resolved
- If not exit checkpoint:
lance6716 marked this conversation as resolved.
Show resolved Hide resolved
- For all/full mode (pessimistic task): we keep the original check;
- For all/full mode (optimistic task): we check whether the shard tables schema meets the definition of [Optimistic Schema Compatibility](20191209_optimistic_ddl.md). If that meets, we can create tables by the compatible schema in the dump stage.
- For incremental mode: not check the sharding tables’ schema, because the table schema obtained from show create table is not the schema at the point of binlog.
5. Make the fail state more gentle, which is from `StateFailure` to `StateWarning`.
- checkAutoIncrementKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkAutoIncrementKey is duplicated with auto_increment_ID?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd better use one name for it.

Copy link
Member Author

@okJiang okJiang Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can think of checkAutoIncrementKey and checkPK/UK as a description with a smaller granularity than ignore_check_items's auto_increment_ID.

Because checkPK/UK is only one item in table_schema checker.

- checkPK/UK

### Remove checker from tidb-tools to DM

After this change, checker is deeply coupled to DM, both with dump Privilege and optimistic pessimistic coordination. And checker is only used by DM (TiCDC and TiDB all don't use it). So removing checker from tidb-tools to DM is more convenient for development work。
okJiang marked this conversation as resolved.
Show resolved Hide resolved