RFC(dm): Add enhanced pre-check rfc #3674
@@ -0,0 +1,80 @@
# Enhanced pre-check Design

## Background

Before a DM task starts, DM checks a set of items to avoid start-task failures. You can see the details in our [document](https://docs.pingcap.com/zh/tidb-data-migration/stable/precheck#%E5%85%B3%E9%97%AD%E6%A3%80%E6%9F%A5%E9%A1%B9). So that DM can still be used normally under certain circumstances, users can set ignore-check-items in the task config to skip items they don't want checked. We have now found several shortcomings in the check items.

### Bad user habits

We allow users to ignore all check items, which gives them too much latitude and can lead to unexpected operations.

### Too much time overhead

If the source has a large number of tables, checking the table schemas, the sharding table consistency, and the sharding table auto-increment keys takes too much time.

### Inadequate check

* If the downstream tables are created manually and the downstream's auto-increment ID differs from the upstream's, we shouldn't report errors for **auto_increment_ID**. Users should be responsible for what they set.
* The dump privilege check only covers RELOAD and SELECT. However, Dumpling supports different [consistency configurations](https://docs.pingcap.com/zh/tidb/stable/dumpling-overview#%E8%B0%83%E6%95%B4-dumpling-%E7%9A%84%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7%E9%80%89%E9%A1%B9), which need more privileges.
* If online-ddl is set to true and a DDL is in the online-ddl stage, DM will have a problem in all mode. Specifically, the ghost table has been created and is executing the DDL, but has not been renamed yet. In this case, DM will report an error when the ghost table is renamed after the dump phase. You can learn more about online-ddl [here](https://docs.pingcap.com/zh/tidb-data-migration/stable/feature-online-ddl).
* For schema_of_shard_tables, in both pessimistic and optimistic tasks we simply compare the structures of all sharding tables for consistency. For optimistic mode, we can do better.

## Proposal

### Restrict user usage

1. Remove the following settings from the [document](https://docs.pingcap.com/zh/tidb-data-migration/stable/precheck#%E5%85%B3%E9%97%AD%E6%A3%80%E6%9F%A5%E9%A1%B9). If any of the following items is detected in the configuration, a warning will be reported.
    - all
    - dump_privilege
    - replication_privilege
    - server_id
    - binlog_enable
    - binlog_format
    - binlog_row_image
2. If the task is in full/all mode, the following items are always checked (correspondingly, they are not checked in increment mode):

    > what's the ignore rule for the items not listed in step 2 and 3
    >
    > The items not listed in step 2 and 3 will be checked if user don't set in It is same as now. Add related description in pingcap/ticdc@a003e1f.
    >
    > could you explain why these checking items can be disabled? my expectation is that user can disable nothing. if we can't make sure a checking we just let it raise warning.
    >
    > sgtm. L80-82 is our efforts. So, we should deprecate all
    >
    > Agree with lance6716. Critical check items should not be allowed to be ignored, which would potentially lead to data loss or unnecessary oncall. Before ignore the check because there is a performance problem, if the performance problem is solved, I can not yet think of any cases where there is a reason to skip @okJiang

    - dump_privilege
3. If the task is in increment/all mode, the following items are always checked (correspondingly, they are not checked in full mode):
    - replication_privilege
    - server_id
    - binlog_enable
    - binlog_format
    - binlog_row_image
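
The three rules above can be sketched in Go as follows. This is only an illustration, not DM's implementation: the function names, the constant names, and the mode strings (the text says "increment"; the sketch assumes the value `incremental`) are all assumptions made for this example.

```go
package main

// Task modes named in the text. The string values are assumptions of this sketch.
const (
	modeFull        = "full"
	modeIncremental = "incremental"
	modeAll         = "all"
)

// deprecatedIgnoreItems holds the ignore-check-items values from step 1.
// Setting one of them would only produce a warning instead of disabling a check.
var deprecatedIgnoreItems = map[string]bool{
	"all":                   true,
	"dump_privilege":        true,
	"replication_privilege": true,
	"server_id":             true,
	"binlog_enable":         true,
	"binlog_format":         true,
	"binlog_row_image":      true,
}

// forcedCheckItems returns the items that are always checked for a task mode
// (steps 2 and 3). Items belonging only to the other mode are not returned.
func forcedCheckItems(mode string) []string {
	var items []string
	if mode == modeFull || mode == modeAll {
		items = append(items, "dump_privilege")
	}
	if mode == modeIncremental || mode == modeAll {
		items = append(items,
			"replication_privilege", "server_id",
			"binlog_enable", "binlog_format", "binlog_row_image")
	}
	return items
}

// splitIgnoreItems separates the user's ignore list into items that are still
// honored and deprecated items that should be reported as warnings.
func splitIgnoreItems(ignore []string) (honored, warned []string) {
	for _, item := range ignore {
		if deprecatedIgnoreItems[item] {
			warned = append(warned, item)
		} else {
			honored = append(honored, item)
		}
	}
	return honored, warned
}
```

With this shape, a full-mode task would force only `dump_privilege`, while an all-mode task would force all six items.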

### Speed up check

1. Support concurrent check
    - table_schema
    - schema_of_shard_tables
    - auto_increment_ID
2. Use mydumper.threads as **source_connection_concurrency**, which should be updated in our document.

    > what if this is an incremental task?
    >
    > yes, this is a problem.
    >
    > So, should we add a new config item? @sunzhaoyang
    >
    > I don't think we should add a new config item for checks. What about
    >
    > I think the latter is better.
    >
    > If we adjust by table numbers, how many tables do we cover each connection is better? @lichunzhu

#### How to speed up?

Since every checker can run concurrently, we can split the tables into **source_connection_concurrency** parts and create a checker for each part.
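
A minimal sketch of this splitting, assuming a per-table check function: `splitTables`, `checkConcurrently`, and the `check` callback are hypothetical names introduced here, not the checker's real API. The tables are cut into contiguous, near-equal parts and each part is checked in its own goroutine.

```go
package main

import "sync"

// splitTables partitions tables into at most n contiguous, near-equal parts.
// n plays the role of source_connection_concurrency.
func splitTables(tables []string, n int) [][]string {
	if n < 1 {
		n = 1
	}
	if n > len(tables) {
		n = len(tables)
	}
	parts := make([][]string, 0, n)
	for i := 0; i < n; i++ {
		start := i * len(tables) / n
		end := (i + 1) * len(tables) / n
		parts = append(parts, tables[start:end])
	}
	return parts
}

// checkConcurrently runs check for every table, one goroutine per part,
// and returns the errors keyed by table name.
func checkConcurrently(tables []string, concurrency int, check func(table string) error) map[string]error {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	failed := make(map[string]error)
	for _, part := range splitTables(tables, concurrency) {
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			for _, table := range part {
				if err := check(table); err != nil {
					mu.Lock()
					failed[table] = err
					mu.Unlock()
				}
			}
		}(part)
	}
	wg.Wait()
	return failed
}
```

Contiguous splitting keeps the sketch simple. If table sizes vary widely, a shared work queue would balance the load better.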
### Optimize some check

1. If the downstream tables are created manually and the downstream's auto-increment ID differs from the upstream's, we shouldn't check **auto_increment_ID**.
2. Dump_privilege will check different privileges on the source according to the [consistency](https://docs.pingcap.com/zh/tidb/stable/dumpling-overview#%E8%B0%83%E6%95%B4-dumpling-%E7%9A%84%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7%E9%80%89%E9%A1%B9) setting and the downstream.

    > I remember there should also be SELECT privilege on some system tables 🤔 cc @lichunzhu
    >
    > For MySQL, we need SELECT privilege to

    - For all consistency settings, we will check:

        > Suggested change
        >
        > This change will break the markdown format.

        - REPLICATION CLIENT (global)

            > this is used in incremental phase, I think dumpling don't need it. Also, all consistency options should be aligned with https://github.com/pingcap/ticdc/pull/3674#discussion_r762777211
            >
            > ptal @lichunzhu Why dumpling need REPLICATION CLIENT privilege?
            >
            > Because
            >
            > also
            >
            > DM has a test of revoking REPLICATION CLIENT and start a full mode task. So I expect dumpling will not fail the dump without this privilege and we only require it for incremental task
            >
            > After test, dump phase does not need REPLICATION CLIENT and REPLICATION SLAVE. I will remove them.
            >
            > Maybe we should check this privilege according to the task mode.
            >
            > Yes, we still have ReplicationPrivilegeChecker, which can check them in all/incremental task.

        - SELECT (only dump table)
    - For flush consistency:
        - RELOAD (global)
    - For flush/lock consistency:
        - LOCK TABLES (only dump table)
    - For TiDB downstream:
        - PROCESS (global)
3. Add OnlineddlChecker to check whether a DDL is in the online-ddl stage when the DM task is in all mode and online-ddl is true. This check is always run in all mode and never run in increment mode.
4. Enhance schema_of_shard_tables.
    - First, if a DM checkpoint exists, the task subsequently starts/resumes from that checkpoint. So we consider consistency to be guaranteed by the checkpoint.
    - If no checkpoint exists:
        - For all/full mode (pessimistic task): we keep the original check;
        - For all/full mode (optimistic task): we check whether the shard tables' schemas meet the definition of [Optimistic Schema Compatibility](20191209_optimistic_ddl.md). If they do, we can create tables using the compatible schema in the dump stage.
        - For incremental mode: we do not check the sharding tables' schemas, because the table schema obtained from show create table is not the schema at the binlog position.
5. Soften the failure state of the following checks from `StateFailure` to `StateWarning`.
    - checkAutoIncrementKey

        > Yes they are the same thing.
        >
        > we'd better use one name for it.
        >
        > You can think of Because

    - checkPK/UK
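
The privilege list in item 2 can be sketched as a small lookup. This is an assumption-laden illustration, not the real DumpPrivilegeChecker: `requiredDumpPrivileges` and `missingPrivileges` are hypothetical helpers, and privilege scope (global versus per dumped table) is ignored. Following the outcome of the review thread, the sketch leaves REPLICATION CLIENT to the replication privilege check instead of requiring it for the dump phase.

```go
package main

// requiredDumpPrivileges returns the privileges the dump phase would need on
// the source for a given Dumpling consistency setting.
// Scope (global vs. per dumped table) is deliberately not modeled here.
func requiredDumpPrivileges(consistency string, downstreamIsTiDB bool) []string {
	// SELECT on the dumped tables is needed for every consistency setting.
	privs := []string{"SELECT"}
	switch consistency {
	case "flush":
		privs = append(privs, "RELOAD", "LOCK TABLES")
	case "lock":
		privs = append(privs, "LOCK TABLES")
	}
	if downstreamIsTiDB {
		// Listed in the text under "For TiDB downstream".
		privs = append(privs, "PROCESS")
	}
	return privs
}

// missingPrivileges lists the required privileges that are not granted.
func missingPrivileges(required, granted []string) []string {
	have := make(map[string]bool, len(granted))
	for _, g := range granted {
		have[g] = true
	}
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	return missing
}
```

A checker built this way could report each missing privilege individually rather than failing on the first one.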
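
The decision in item 4 (enhancing schema_of_shard_tables) can be written as one function. The type, constants, and string values below are hypothetical and exist only for this sketch. It also assumes that "the checkpoint guarantees consistency" means the shard schema check is skipped when a checkpoint exists, which the text implies but does not state outright.

```go
package main

// shardSchemaCheck names the kind of shard-table schema check to run.
type shardSchemaCheck int

const (
	skipShardSchemaCheck         shardSchemaCheck = iota // no check is run
	strictShardSchemaCheck                               // original check: all structures must be consistent
	optimisticCompatibilityCheck                         // check against Optimistic Schema Compatibility
)

// chooseShardSchemaCheck picks the check from the checkpoint state, the task
// mode ("full", "incremental", "all") and the shard mode ("pessimistic",
// "optimistic"). All string values are assumptions of this sketch.
func chooseShardSchemaCheck(hasCheckpoint bool, taskMode, shardMode string) shardSchemaCheck {
	if hasCheckpoint {
		// The task resumes from the checkpoint, which is taken to guarantee consistency.
		return skipShardSchemaCheck
	}
	if taskMode == "incremental" {
		// The current schema is not the schema at the binlog position.
		return skipShardSchemaCheck
	}
	switch shardMode {
	case "pessimistic":
		return strictShardSchemaCheck
	case "optimistic":
		return optimisticCompatibilityCheck
	default:
		// Not a shard-merge task in this sketch.
		return skipShardSchemaCheck
	}
}
```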

### Move checker from tidb-tools to DM

After this change, the checker is deeply coupled to DM, both through the dump privilege check and through optimistic/pessimistic coordination. The checker is also used only by DM (neither TiCDC nor TiDB uses it). So moving the checker from tidb-tools to DM is more convenient for development work.

> I think we'd better change all the doc links to its English version since this doc is written in English.