-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC(dm): Add enhanced pre-check rfc #3674
Changes from 37 commits
f657869
ea863f9
ceec7f3
0640fc6
ce51331
549da3f
ad3c49f
d3b42e5
f5b3240
e6bce30
a003e1f
ebbaee8
4c32dd1
ce60241
129659f
ef05cfd
c43fe21
bd824ef
eabe327
9444c89
66d1ebd
9844a03
89267d9
ee361ab
9b1456f
12c9fed
f90c8ac
320b009
cc13850
f7992f7
1f83940
a8dc9da
93ae002
aff6168
7c2f253
d5f8d75
2ea1a63
0b6d70d
bfddc6a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,87 @@ | ||||||
# Enhanced pre-check Design | ||||||
|
||||||
## Background | ||||||
|
||||||
Before the DM’s task starts, we should check some items to avoid start-task failures. You can see the details in our [document](https://docs.pingcap.com/tidb-data-migration/stable/precheck#disable-checking-items). In order to allow some users to use DM normally under certain circumstances, we can also set ignore-check-items in task config to ignore some items that you don’t want to check. Now, we find some shortcomings in regard to check-items. | ||||||
|
||||||
### Bad user habits | ||||||
|
||||||
We allow users to ignore all check-items, in which case the user's privilege is too large to perform unexpected operations. | ||||||
|
||||||
### Too much time overhead | ||||||
|
||||||
If we have a large number of tables in source, we will take too much time in checking table schema, sharding table consistency and sharding table auto increment ID. | ||||||
|
||||||
### Inadequate check | ||||||
|
||||||
* Now we check auto increment ID by `column-mapping` which is deprecated. If we don't set the `column-mapping` and don't ignore the **auto_increment_ID**, the pre-check will report an error. PS: **auto_increment_ID** is only checked when **schema_of_shard_tables** is true. | ||||||
* Dump privilege only checks RELOAD and SELECT. However, Dumpling supports different [consistency configurations](https://docs.pingcap.com/tidb/stable/dumpling-overview#adjust-dumplings-data-consistency-options), which need more privileges. | ||||||
* If online-ddl is set by true and a DDL is in online-ddl stage, DM will have a problem in all mode. Specifically, when ghost table has been created but not been renamed, DM will report an error when the ghost table is renamed during the incremental phase. You can learn more about online-ddl [here](https://docs.pingcap.com/tidb-data-migration/stable/feature-online-ddl). | ||||||
* For schema_of_shard_tables, whether it's a pessimistic task or an optimistic task, we all simply check consistency of schema by comparing all sharding tables’ structures. For optimistic mode, we can do better. | ||||||
* Version checker checks that MySQLVersion >= 5.6.0 and MariadbVersion >= 10.1.2 before. However, we find more and more incompatibility problems in MySQLVersion >= 8.0.0 and Mariadb. In view of supporting MySQL 8.0 and Mariadb is experimental yet, checker will report a warning for them. | ||||||
|
||||||
## Proposal | ||||||
|
||||||
### Speed up check | ||||||
|
||||||
1. Support concurrent check | ||||||
- table_schema | ||||||
- schema_of_shard_tables | ||||||
- auto_increment_ID | ||||||
2. We can adjust the concurrency by table numbers. | ||||||
|
||||||
### Optimize some check | ||||||
|
||||||
1. Auto_increment_ID only be checked in sharding mode. | ||||||
- If table exists auto increment ID, report a warning to user and tell them the method that can resolve the PK/UK conflict; | ||||||
1. If you set PK to AUTO INCREMENT, you must make sure that the primary key in sharding tables is not duplicated; | ||||||
2. If sharding tables have duplicated PK, you can refer to [document](https://docs.pingcap.com/tidb-data-migration/stable/shard-merge-best-practices#handle-conflicts-of-auto-increment-primary-key). | ||||||
- And if they have finished the resolving, such as manually created the table and removed AUTO INCREMENT, we should not report the warning. | ||||||
2. Dump_privilege will check different privileges according to different [consistency](https://docs.pingcap.com/tidb/stable/dumpling-overview#adjust-dumplings-data-consistency-options) on source. | ||||||
- For all consistency, we will check | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This change will break the markdown format. |
||||||
- SELECT (only INFORMATION_SCHEMA's tables and dump tables) | ||||||
- For flush consistency: | ||||||
- RELOAD (global) | ||||||
- For lock consistency: | ||||||
- LOCK TABLES (only dump tables) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After test, found that
just remove it in flush consistency. In lock consistency, used |
||||||
3. Add OnlineDDLChecker to check if a DDL of tables in allow list exists in online-ddl stage when DM task is all mode and online-ddl is true. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. naming styles are inconsistent, schema_of_shard_tables vs OnlineDDLChecker There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. schema_of_shard_tables is a name left over from history because of But we will deprecate OnlineDDLChecker is its It is true that there is no uniformity in this rfc. After this feature completed, there is no such problem. All |
||||||
4. Enhance schema_of_shard_tables. | ||||||
- If a task has passed the pre-checking when starting and exited, DM should keep the consistency during the task running. So we **don't check it** when restart the task. | ||||||
- If there does not exist checkpoints: | ||||||
- For all/full mode (pessimistic task): we keep **the original check**; | ||||||
- For all/full mode (optimistic task): we check whether the shard tables schema meets **the definition of [Optimistic Schema Compatibility](20191209_optimistic_ddl.md)**. If that meets, we can create tables by the compatible schema in the dump stage. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @GMHDBJD said, we needn't help user to create tables. Because we can't guarantee eventual consistency. For example, sourceA has a table(column a, b), sourceB has a table(column a, c), we don't know the generated table is (a, b, c) or (a, c, d). If the column order is incorrect, our sync will be wrong. @lance6716 @glorv @lichunzhu There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you mean "(a, b, c) or (a, c, b)"? columns can be unordered I think |
||||||
- For incremental mode: **not check** the sharding tables’ schema, because the table schema obtained from show create table is not the schema at the point of binlog. | ||||||
5. Version checker will report a warning in the following cases: | ||||||
- MySQL < 5.6.0 | ||||||
- MySQL >= 8.0.0 | ||||||
- Mariadb | ||||||
|
||||||
### Restrict user usage | ||||||
1. Remove all `ignore_check_items` settings from the [document](https://docs.pingcap.com/tidb-data-migration/stable/precheck#disable-checking-items). If the following items are detected to be set in the configuration, a warning will be reported. | ||||||
2. If task is full/all mode, the following items will be forced to check (correspondingly, it will not be check in increment mode): | ||||||
- dump_privilege | ||||||
- schema_of_shard_tables(only for sharding mode) | ||||||
3. If task is increment/all mode, the following items will be forced to check (correspondingly, it will not be check in full mode): | ||||||
- replication_privilege | ||||||
- server_id | ||||||
- binlog_enable | ||||||
- binlog_format | ||||||
- binlog_row_image | ||||||
- online_ddl(new added) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. need check There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add it in 2ea1a63 |
||||||
- binlog_do_db(new added) | ||||||
4. The following items will always be forced to check: | ||||||
- version | ||||||
- table_schema | ||||||
- auto_increment_ID(only for sharding mode) | ||||||
5. Make the fail state more gentle, which is from `StateFailure` to `StateWarning`. | ||||||
- VersionChecker(same as version) | ||||||
- AutoIncrementKeyChecker(same as auto_increment_ID) | ||||||
- PK/UKChecker | ||||||
|
||||||
### Move checker from [tidb-tools](https://github.com/pingcap/tidb-tools/tree/master/pkg/check) to DM | ||||||
|
||||||
After this change, checker is deeply coupled to DM, both with dump privilege checking and optimistic pessimistic coordination. And checker is only used by DM (TiCDC and TiDB all don’t use it). So removing checkers from tidb-tools to DM is more convenient for development work。 | ||||||
|
||||||
In detail, we do not take the initiative to submit pr to the tidb-tools repository. Instead, we will replace the checker in tidb-tools step by step during the development of this feature. | ||||||
|
||||||
So at last we will have two checker components in DM and tidb-tools. But DM will completely get rid of tidb-tools checker's dependence or wrap our own checker layer on top of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also can we batch the request by multi-statement or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just run mutil-thread to execute
show create table
. Can't batch the request. The draftThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use INFORMATION_SCHEMA to get the table structure? @lichunzhu PTAL.
when the number of table is large and we have a high latency network
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the test result from https://github.com/pingcap/tiflow/pull/3975/files#diff-d6ce4d888f567f277d6f1304165d1b3cca440f6563ecb742d7c3a80f7d411b8f
4 threads 10000 tables
I intend to keep the delay under a minute(@sunzhaoyang 's thought).
In other words, if number of table is greater than 5000 and less than 10000, we use 16 threads. If number of table is greater than 10000 and less than 20000, we use 32 threads. The spend time is almost under a minute.
If number is greater than 20000, maybe user can afford the spend time(Will grow linear growth). Or we can use more threads? 64, 128...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we incr check thread, do we also adjust mysql_max_connections to 16/32 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beacause there is limited underlying connection count behiend
sql.DB
more thread count than this number(underlying connection count) can not speed up check processThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we adjust mysql_max_connection to 16/32, we seem can't guarantee no other connection with this MySQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, so my point is maybe we should just use
mysql.MaxConnectionCount
as concurrent count when need check table num lagre thanmysql.MaxConnectionCount