RFC(dm): Add enhanced pre-check rfc #3674
…p-master Conflicts: .github/workflows/dm_binlog_999999.yaml .github/workflows/dm_chaos.yaml .github/workflows/dm_upstream_switch.yaml .github/workflows/ticdc_chaos.yaml .github/workflows/ticdc_integration.yaml .github/workflows/upgrade_dm_via_tiup.yaml
fix conflict
Codecov Report
@@ Coverage Diff @@
## master #3674 +/- ##
================================================
- Coverage 57.0741% 55.1687% -1.9055%
================================================
Files 478 486 +8
Lines 56551 59947 +3396
================================================
+ Hits 32276 33072 +796
- Misses 20978 23536 +2558
- Partials 3297 3339 +42
maybe you should format this markdown file with format-tools😉
Co-authored-by: Ehco <zh19960202@gmail.com>
- table_schema
- schema_of_shard_tables
- auto_increment_ID
2. Use mydumper.threads as **source_connection_concurrency**, which should update in our document.
What if this is an incremental task? `mydumper.threads` is not relevant to incremental.
yes, this is a problem.
So, should we add a new config item? @sunzhaoyang
I don't think we should add a new config item for checks. What about using `syncers.worker-count`, or adjusting by the number of tables?
I think the latter is better.
If we adjust by the number of tables, how many tables should each connection cover? @lichunzhu
- binlog_enable
- binlog_format
- binlog_row_image
2. If the task is in full/all mode, the following items will be forced to check (correspondingly, they will not be checked in incremental mode):
What's the ignore rule for the items not listed in steps 2 and 3?
The items not listed in steps 2 and 3 will be checked unless the user sets them in `ignore-check-items`. If the user sets them, DM will ignore them.
This is the same as the current behavior.
Added a related description in pingcap/ticdc@a003e1f.
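The rule described in this reply can be sketched roughly as below. This is only an illustration, not DM's implementation: the function name `effectiveChecks`, the split into `forced` and `optional` slices, and the sample item values in `main` are assumptions made for the example.

```go
package main

import "fmt"

// effectiveChecks returns the check items that would actually run.
// Forced items always run; optional items run unless the user listed
// them in ignore-check-items. The function and its parameters are
// hypothetical and only illustrate the rule discussed in this thread.
func effectiveChecks(forced, optional, ignored []string) []string {
	skip := make(map[string]struct{}, len(ignored))
	for _, item := range ignored {
		skip[item] = struct{}{}
	}

	result := append([]string{}, forced...)
	for _, item := range optional {
		if _, ok := skip[item]; !ok {
			result = append(result, item)
		}
	}
	return result
}

func main() {
	// Sample values only; which items are forced depends on the task mode.
	forced := []string{"binlog_enable", "binlog_format", "binlog_row_image"}
	optional := []string{"version", "server_id"}
	ignored := []string{"binlog_format", "server_id"}

	fmt.Println(effectiveChecks(forced, optional, ignored))
}
```

In this sketch, listing a forced item in `ignored` has no effect, which matches the idea that forced items cannot be skipped.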
Could you explain why these check items can be disabled? My expectation is that the user can disable nothing. If we can't be sure about a check, we just let it raise a warning.
sgtm. L80-82 reflect our efforts on this.
So, we should deprecate all `ignore_check_items`. cc @sunzhaoyang
Agree with lance6716. Critical check items should not be allowed to be ignored, since ignoring them could lead to data loss or unnecessary oncall.
Previously, checks were ignored because of a performance problem. If the performance problem is solved, I can't yet think of any case where there is a reason to skip them. @okJiang
### Optimize some check

1. If downstream creates tables manually and the new downstream’s auto increment ID is not the same as the upstream, we shouldn’t check **auto_increment_ID**.
2. Dump_privilege will check different privileges according to different [consistency](https://docs.pingcap.com/zh/tidb/stable/dumpling-overview#%E8%B0%83%E6%95%B4-dumpling-%E7%9A%84%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7%E9%80%89%E9%A1%B9) and downstream on source.
I remember there should also be SELECT privilege on some system tables 🤔
cc @lichunzhu
For MySQL, we need SELECT privilege on `INFORMATION_SCHEMA.TABLES`.
For TiDB, we need SELECT privilege on `INFORMATION_SCHEMA.TIKV_REGION_STATUS`, `INFORMATION_SCHEMA.PLACEMENT_RULES`, `INFORMATION_SCHEMA.CLUSTER_INFO`, `INFORMATION_SCHEMA.TIDB_SERVERS_INFO`, and `INFORMATION_SCHEMA.PARTITIONS`.
- For all/full mode (optimistic task): we check whether the shard tables schema meets the definition of [Optimistic Schema Compatibility](20191209_optimistic_ddl.md). If that meets, we can create tables by the compatible schema in the dump stage.
- For incremental mode: not check the sharding tables’ schema, because the table schema obtained from show create table is not the schema at the point of binlog.
5. Make the fail state more gentle, which is from `StateFailure` to `StateWarning`.
- checkAutoIncrementKey
Is `checkAutoIncrementKey` a duplicate of `auto_increment_ID`?
Yes they are the same thing.
we'd better use one name for it.
You can think of `checkAutoIncrementKey` and `checkPK/UK` as descriptions at a finer granularity than the `auto_increment_ID` item of `ignore_check_items`, because `checkPK/UK` is only one item in the `table_schema` checker.
Co-authored-by: lance6716 <lance6716@gmail.com>
rest lgtm
- table_schema
- schema_of_shard_tables
- auto_increment_ID
2. We can adjust the concurrency by table numbers.
also can we batch the request by multi-statement or something?
We just run multiple threads to execute `show create table`. The requests can't be batched. See the draft.
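A minimal sketch of the multi-threaded approach mentioned here, assuming one statement per table and a fixed number of workers. The `fetchAll` helper and the `fetch` callback are hypothetical; `fetch` stands in for running `show create table` on a connection and is not taken from the draft.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll calls fetch once per table, using at most `workers` goroutines.
// fetch is a stand-in for executing `show create table` over a connection;
// both names are hypothetical and exist only for this illustration.
func fetchAll(tables []string, workers int, fetch func(table string) (string, error)) (map[string]string, error) {
	if workers < 1 {
		workers = 1
	}

	jobs := make(chan string)
	results := make(map[string]string, len(tables))
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for table := range jobs {
				ddl, err := fetch(table)
				mu.Lock()
				if err != nil {
					// Remember only the first error; keep draining jobs.
					if firstErr == nil {
						firstErr = err
					}
				} else {
					results[table] = ddl
				}
				mu.Unlock()
			}
		}()
	}

	for _, table := range tables {
		jobs <- table
	}
	close(jobs)
	wg.Wait()

	return results, firstErr
}

func main() {
	tables := []string{"db.t1", "db.t2", "db.t3"}
	res, err := fetchAll(tables, 2, func(table string) (string, error) {
		// A real implementation would query the upstream here.
		return "CREATE TABLE " + table + " (...)", nil
	})
	fmt.Println(len(res), err)
}
```

Each statement still costs one round trip, so with this pattern the total time is expected to shrink as workers are added rather than through batching.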
Can we use INFORMATION_SCHEMA to get the table structure when the number of tables is large and we have a high-latency network? @lichunzhu PTAL.
This is the test result from https://github.com/pingcap/tiflow/pull/3975/files#diff-d6ce4d888f567f277d6f1304165d1b3cca440f6563ecb742d7c3a80f7d411b8f
Setup: 4 threads, 10000 tables.

| Test machine | Added delay (simulated network latency) | Time spent |
|---|---|---|
| local | 0 ms | 3.9 s |
| 125 | 0 ms | 8.5 s |
| 125 | 10 ms | 31.7 s |
| 106 | 0 ms | 11.2 s |
| 106 | 10 ms | 29.6 s |
| 106 | 20 ms | 54.5 s |
| 106 | 50 ms | 128.5 s |
| 106 | 100 ms | 253 s |
I intend to keep the time spent under a minute (@sunzhaoyang's thought).
In other words, if the number of tables is greater than 5000 and less than 10000, we use 16 threads. If the number of tables is greater than 10000 and less than 20000, we use 32 threads. The time spent is then almost always under a minute.
If the number is greater than 20000, maybe the user can afford the time spent (it grows linearly). Or we can use more threads: 64, 128...
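The thresholds proposed in this comment could be expressed as a small helper like the one below. It is a sketch under assumptions: the name `checkThreads` is hypothetical, the default of 4 threads for smaller table counts is borrowed from the test setup, and keeping 32 threads above 20000 tables is a placeholder because that case is still an open question in the thread.

```go
package main

import "fmt"

// checkThreads picks a worker count from the number of tables to check.
// Thresholds follow the proposal in this comment. The default of 4 and
// the behavior above 20000 tables are assumptions, not decisions.
func checkThreads(tableCount int) int {
	switch {
	case tableCount > 10000:
		// Covers the 10000 to 20000 range; larger counts stay at 32 here.
		return 32
	case tableCount > 5000:
		return 16
	default:
		return 4
	}
}

func main() {
	for _, n := range []int{1000, 8000, 15000} {
		fmt.Printf("%d tables -> %d threads\n", n, checkThreads(n))
	}
}
```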
When we increase the number of check threads, do we also adjust mysql_max_connections to 16/32?
Because there is a limited number of underlying connections behind `sql.DB`, using more threads than that number (the underlying connection count) cannot speed up the check process.
Even if we adjust mysql_max_connection to 16/32, it seems we can't guarantee that there are no other connections to this MySQL.
Yes, so my point is that maybe we should just use `mysql.MaxConnectionCount` as the concurrency when the number of tables to check is larger than `mysql.MaxConnectionCount`.
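The suggestion in this comment amounts to capping the worker count at the connection limit. A rough sketch follows; `checkConcurrency` and its `maxConn` parameter are hypothetical names, with `maxConn` standing in for a limit such as the `mysql.MaxConnectionCount` mentioned here. Using one worker per table below the limit is an assumption added for completeness.

```go
package main

import "fmt"

// checkConcurrency caps the number of check workers at the connection limit.
// maxConn stands in for a connection limit; both names are hypothetical.
func checkConcurrency(tableCount, maxConn int) int {
	if maxConn < 1 {
		maxConn = 1
	}
	if tableCount > maxConn {
		// More workers than connections would only wait on the pool.
		return maxConn
	}
	if tableCount < 1 {
		return 1
	}
	// Assumption: below the limit, one worker per table is enough.
	return tableCount
}

func main() {
	fmt.Println(checkConcurrency(10000, 16))
	fmt.Println(checkConcurrency(3, 16))
}
```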
Co-authored-by: lance6716 <lance6716@gmail.com>
…-rfc-pre-check merge
- For flush consistency:
  - RELOAD (global)
- For flush/lock consistency:
  - LOCK TABLES (only dump tables)
After testing, I found that `flush tables with read lock` does not need this privilege, and flush consistency only uses `flush tables with read lock`. 😭
Removed it.
It is only removed for flush consistency. Lock consistency still uses `lock tables`.
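After this change, the privileges named in this thread could be mapped per consistency level roughly as follows. This is a sketch, not Dumpling's or DM's code: the `privilege` type and `extraDumpPrivileges` function are hypothetical, and only the two consistency levels discussed here are covered.

```go
package main

import "fmt"

// privilege describes one required privilege and where it must be granted.
// The type is hypothetical and exists only for this illustration.
type privilege struct {
	Name  string
	Scope string
}

// extraDumpPrivileges returns the consistency-specific privileges discussed
// in this thread. Other consistency values are not covered here and yield nil.
func extraDumpPrivileges(consistency string) []privilege {
	switch consistency {
	case "flush":
		// LOCK TABLES was dropped for flush consistency after testing.
		return []privilege{{Name: "RELOAD", Scope: "global"}}
	case "lock":
		return []privilege{{Name: "LOCK TABLES", Scope: "dump tables"}}
	default:
		return nil
	}
}

func main() {
	for _, c := range []string{"flush", "lock"} {
		fmt.Println(c, extraDumpPrivileges(c))
	}
}
```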
lgtm, waiting for other reviewers
rest lgtm
- binlog_enable
- binlog_format
- binlog_row_image
- online_ddl(new added)
We also need to check `Binlog_Do_DB`.
Added it in 2ea1a63.
LGTM
Please help me merge this PR. Thanks~ @lance6716
/merge
This pull request has been accepted and is ready to merge. Commit hash: 0b6d70d
/run-verify
What problem does this PR solve?
close https://github.com/pingcap/ticdc/issues/3703
What is changed and how it works?
As title.
Check List
Tests
Release note