Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC(dm): Add enhanced pre-check rfc #3674

Merged
merged 39 commits into from
Dec 23, 2021
Merged
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
f657869
split actions paths
okJiang Oct 29, 2021
ea863f9
Merge branch 'master' of https://github.com/pingcap/ticdc into pingca…
okJiang Nov 22, 2021
ceec7f3
Merge branch 'pingcap-master'
okJiang Nov 22, 2021
0640fc6
Merge branch 'pingcap:master' into master
okJiang Nov 30, 2021
ce51331
add rfc
okJiang Nov 30, 2021
549da3f
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 1, 2021
ad3c49f
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 1, 2021
d3b42e5
format markdown
okJiang Dec 1, 2021
f5b3240
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 2, 2021
e6bce30
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 2, 2021
a003e1f
address comment
okJiang Dec 2, 2021
ebbaee8
add some description
okJiang Dec 3, 2021
4c32dd1
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 6, 2021
ce60241
update
okJiang Dec 6, 2021
129659f
Merge branch 'add-rfc-pre-check' of github.com:okJiang/ticdc into add…
okJiang Dec 6, 2021
ef05cfd
Apply suggestions from code review
okJiang Dec 6, 2021
c43fe21
Merge branch 'master' into add-rfc-pre-check
glorv Dec 6, 2021
bd824ef
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 7, 2021
eabe327
address comment
okJiang Dec 7, 2021
9444c89
Merge branch 'add-rfc-pre-check' of github.com:okJiang/ticdc into add…
okJiang Dec 7, 2021
66d1ebd
update auto increment id
okJiang Dec 7, 2021
9844a03
add omissive version check
okJiang Dec 8, 2021
89267d9
Apply suggestions from code review
okJiang Dec 10, 2021
ee361ab
address comment
okJiang Dec 10, 2021
9b1456f
Merge branch 'master' into add-rfc-pre-check
okJiang Dec 10, 2021
12c9fed
address comment
okJiang Dec 14, 2021
f90c8ac
adjust version checker
okJiang Dec 15, 2021
320b009
Apply suggestions from code review
okJiang Dec 21, 2021
cc13850
address comment
okJiang Dec 21, 2021
f7992f7
address comment
okJiang Dec 21, 2021
1f83940
address comment
okJiang Dec 22, 2021
a8dc9da
Update dm/docs/RFCS/20211130_enhanced_pre_checker.md
okJiang Dec 22, 2021
93ae002
update
okJiang Dec 22, 2021
aff6168
Merge branch 'add-rfc-pre-check' of github.com:okJiang/ticdc into add…
okJiang Dec 22, 2021
7c2f253
Merge branch 'master' into add-rfc-pre-check
okJiang Dec 22, 2021
d5f8d75
remove LOCK TABLES from flush consistency
okJiang Dec 22, 2021
2ea1a63
add Binlog_Do_DB
okJiang Dec 23, 2021
0b6d70d
Merge branch 'master' into add-rfc-pre-check
okJiang Dec 23, 2021
bfddc6a
Merge branch 'master' into add-rfc-pre-check
ti-chi-bot Dec 23, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions dm/docs/RFCS/20211130_enhanced_pre_checker.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# Enhanced pre-check Design

## Background

Before the DM’s task starts, we should check some items to avoid start-task failures. You can see the details in our [document](https://docs.pingcap.com/tidb-data-migration/stable/precheck#disable-checking-items). In order to allow some users to use DM normally under certain circumstances, we can also set ignore-check-items in task config to ignore some items that you don’t want to check. Now, we find some shortcomings in regard to check-items.

### Bad user habits

We allow users to ignore all check-items, in which case the user's privilege is too large to perform unexpected operations.

### Too much time overhead

If we have a large number of tables in source, we will take too much time in checking table schema, sharding table consistency and sharding table auto increment ID.

### Inadequate check

* Now we check auto increment ID by `column-mapping` which is deprecated. If we don't set the `column-mapping` and don't ignore the **auto_increment_ID**, the pre-check will report an error. PS: **auto_increment_ID** is only checked when **schema_of_shard_tables** is true.
* Dump privilege only checks RELOAD and SELECT. However, Dumpling supports different [consistency configurations](https://docs.pingcap.com/tidb/stable/dumpling-overview#adjust-dumplings-data-consistency-options), which need more privileges.
* If online-ddl is set by true and a DDL is in online-ddl stage, DM will have a problem in all mode. Specifically, when ghost table has been created but not been renamed, DM will report an error when the ghost table is renamed during the incremental phase. You can learn more about online-ddl [here](https://docs.pingcap.com/tidb-data-migration/stable/feature-online-ddl).
* For schema_of_shard_tables, whether it's a pessimistic task or an optimistic task, we all simply check consistency of schema by comparing all sharding tables’ structures. For optimistic mode, we can do better.
* Version checker checks that MySQLVersion >= 5.6.0 and MariadbVersion >= 10.1.2 before. However, we find more and more incompatibility problems in MySQLVersion >= 8.0.0 and Mariadb. In view of supporting MySQL 8.0 and Mariadb is experimental yet, checker will report a warning for them.

## Proposal

### Speed ​​up check

1. Support concurrent check
- table_schema
- schema_of_shard_tables
- auto_increment_ID
2. We can adjust the concurrency by table numbers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can we batch the request by multi-statement or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just run mutil-thread to execute show create table. Can't batch the request. The draft

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use INFORMATION_SCHEMA to get the table structure? @lichunzhu PTAL.

when the number of table is large and we have a high latency network

Copy link
Member Author

@okJiang okJiang Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test result from https://github.com/pingcap/tiflow/pull/3975/files#diff-d6ce4d888f567f277d6f1304165d1b3cca440f6563ecb742d7c3a80f7d411b8f


4 threads 10000 tables

test mechine added delay(simulate network latency) spend time
local 0 ms 3.9 s
125 0 ms 8.5 s
125 10 ms 31.7 s
106 0 ms 11.2 s
106 10 ms 29.6 s
106 20 ms 54.5 s
106 50 ms 128.5 s
106 100 ms 253 s

I intend to keep the delay under a minute(@sunzhaoyang 's thought).

In other words, if number of table is greater than 5000 and less than 10000, we use 16 threads. If number of table is greater than 10000 and less than 20000, we use 32 threads. The spend time is almost under a minute.

If number is greater than 20000, maybe user can afford the spend time(Will grow linear growth). Or we can use more threads? 64, 128...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we incr check thread, do we also adjust mysql_max_connections to 16/32 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beacause there is limited underlying connection count behiend sql.DB more thread count than this number(underlying connection count) can not speed up check process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we adjust mysql_max_connection to 16/32, we seem can't guarantee no other connection with this MySQL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, so my point is maybe we should just use mysql.MaxConnectionCount as concurrent count when need check table num lagre than mysql.MaxConnectionCount


### Optimize some check

1. Auto_increment_ID only be checked in sharding mode.
- If table exists auto increment ID, report a warning to user and tell them the method that can resolve the PK/UK conflict;
1. If you set PK to AUTO INCREMENT, you must make sure that the primary key in sharding tables is not duplicated;
2. If sharding tables have duplicated PK, you can refer to [document](https://docs.pingcap.com/tidb-data-migration/stable/shard-merge-best-practices#handle-conflicts-of-auto-increment-primary-key).
- And if they have finished the resolving, such as manually created the table and removed AUTO INCREMENT, we should not report the warning.
2. Dump_privilege will check different privileges according to different [consistency](https://docs.pingcap.com/tidb/stable/dumpling-overview#adjust-dumplings-data-consistency-options) on source.
- For all consistency, we will check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- For all consistency, we will check
For all consistency, we will check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will break the markdown format.

- SELECT (only INFORMATION_SCHEMA's tables and dump tables)
- For flush consistency:
- RELOAD (global)
- For lock consistency:
- LOCK TABLES (only dump tables)
Copy link
Member Author

@okJiang okJiang Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After test, found that flush tables with read lock does not need this privilege. We juse used flush tables with read lock.😭

remove it.

just remove it in flush consistency. In lock consistency, used lock tables yet.

3. Add OnlineDDLChecker to check if a DDL of tables in allow list exists in online-ddl stage when DM task is all mode and online-ddl is true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming styles are inconsistent, schema_of_shard_tables vs OnlineDDLChecker

Copy link
Member Author

@okJiang okJiang Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_of_shard_tables is a name left over from history because of ignore_checking_items.

But we will deprecate ignore_checking_items. So I don't have to name OnlineDDLChecker for ignore_checking_items.

OnlineDDLChecker is its Checker name.

It is true that there is no uniformity in this rfc. After this feature completed, there is no such problem. All ignore_checking_items will be instead of its Checker name.(will indicate its correspondence with the previous)

4. Enhance schema_of_shard_tables.
- If a task has passed the pre-checking when starting and exited, DM should keep the consistency during the task running. So we **don't check it** when restart the task.
- If there does not exist checkpoints:
- For all/full mode (pessimistic task): we keep **the original check**;
- For all/full mode (optimistic task): we check whether the shard tables schema meets **the definition of [Optimistic Schema Compatibility](20191209_optimistic_ddl.md)**. If that meets, we can create tables by the compatible schema in the dump stage.
Copy link
Member Author

@okJiang okJiang Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GMHDBJD said, we needn't help user to create tables. Because we can't guarantee eventual consistency. For example, sourceA has a table(column a, b), sourceB has a table(column a, c), we don't know the generated table is (a, b, c) or (a, c, d). If the column order is incorrect, our sync will be wrong. @lance6716 @glorv @lichunzhu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean "(a, b, c) or (a, c, b)"?

columns can be unordered I think

- For incremental mode: **not check** the sharding tables’ schema, because the table schema obtained from show create table is not the schema at the point of binlog.
5. Version checker will report a warning in the following cases:
- MySQL < 5.6.0
- MySQL >= 8.0.0
- Mariadb

### Restrict user usage
1. Remove all `ignore_check_items` settings from the [document](https://docs.pingcap.com/tidb-data-migration/stable/precheck#disable-checking-items). If the following items are detected to be set in the configuration, a warning will be reported.
2. If task is full/all mode, the following items will be forced to check (correspondingly, it will not be check in increment mode):
- dump_privilege
- schema_of_shard_tables(only for sharding mode)
3. If task is increment/all mode, the following items will be forced to check (correspondingly, it will not be check in full mode):
- replication_privilege
- server_id
- binlog_enable
- binlog_format
- binlog_row_image
- online_ddl(new added)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need check Binlog_Do_DB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add it in 2ea1a63

- binlog_do_db(new added)
4. The following items will always be forced to check:
- version
- table_schema
- auto_increment_ID(only for sharding mode)
5. Make the fail state more gentle, which is from `StateFailure` to `StateWarning`.
- VersionChecker(same as version)
- AutoIncrementKeyChecker(same as auto_increment_ID)
- PK/UKChecker

### Move checker from [tidb-tools](https://github.com/pingcap/tidb-tools/tree/master/pkg/check) to DM

After this change, checker is deeply coupled to DM, both with dump privilege checking and optimistic pessimistic coordination. And checker is only used by DM (TiCDC and TiDB all don’t use it). So removing checkers from tidb-tools to DM is more convenient for development work。

In detail, we do not take the initiative to submit pr to the tidb-tools repository. Instead, we will replace the checker in tidb-tools step by step during the development of this feature.

So at last we will have two checker components in DM and tidb-tools. But DM will completely get rid of tidb-tools checker's ​​dependence or wrap our own checker layer on top of it.