Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC(dm): Add enhanced pre-check rfc #3674

Merged
merged 39 commits into from
Dec 23, 2021

Conversation

okJiang
Copy link
Member

@okJiang okJiang commented Nov 30, 2021

What problem does this PR solve?

close https://github.com/pingcap/ticdc/issues/3703

What is changed and how it works?

As title.

Check List

Tests

  • No code

Release note

`None`.

…p-master

Conflicts:
	.github/workflows/dm_binlog_999999.yaml
	.github/workflows/dm_chaos.yaml
	.github/workflows/dm_upstream_switch.yaml
	.github/workflows/ticdc_chaos.yaml
	.github/workflows/ticdc_integration.yaml
	.github/workflows/upgrade_dm_via_tiup.yaml
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Nov 30, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Ehco1996
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 30, 2021
@okJiang
Copy link
Member Author

okJiang commented Nov 30, 2021

@codecov-commenter
Copy link

codecov-commenter commented Nov 30, 2021

Codecov Report

Merging #3674 (0b6d70d) into master (3873d39) will decrease coverage by 1.9054%.
The diff coverage is 71.1202%.

❗ Current head 0b6d70d differs from pull request most recent head bfddc6a. Consider uploading reports for the commit bfddc6a to get more accurate results

Flag Coverage Δ
cdc 58.6117% <71.1202%> (+0.3751%) ⬆️
dm 52.2614% <ø> (-3.7732%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #3674        +/-   ##
================================================
- Coverage   57.0741%   55.1687%   -1.9055%     
================================================
  Files           478        486         +8     
  Lines         56551      59947      +3396     
================================================
+ Hits          32276      33072       +796     
- Misses        20978      23536      +2558     
- Partials       3297       3339        +42     

@Ehco1996 Ehco1996 added the area/dm Issues or PRs related to DM. label Dec 1, 2021
Copy link
Contributor

@Ehco1996 Ehco1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you should format this markdown file with format-tools😉

dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
okJiang and others added 3 commits December 1, 2021 12:03
Co-authored-by: Ehco <zh19960202@gmail.com>
Co-authored-by: Ehco <zh19960202@gmail.com>
- table_schema
- schema_of_shard_tables
- auto_increment_ID
2. Use mydumper.threads as **source_connection_concurrency**, which should update in our document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if this is an incremental task? mydumper.threads is not relevant to incremental.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should we add a new config item? @sunzhaoyang

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add a new config item for checks. What about syncers.worker-count or adjust by table numbers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latter is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we adjust by table numbers, how many tables do we cover each connection is better? @lichunzhu

- binlog_enable
- binlog_format
- binlog_row_image
2. If task is full/all mode, the following items will be forced to check (correspondingly, it will not be check in increment mode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the ignore rule for the items not listed in step 2 and 3

Copy link
Member Author

@okJiang okJiang Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The items not listed in step 2 and 3 will be checked if user don't set in ignore-check-items. If user set it, dm will ignore them.

It is same as now.

Add related description in pingcap/ticdc@a003e1f.

Copy link
Contributor

@lance6716 lance6716 Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain why these checking items can be disabled? my expectation is that user can disable nothing. if we can't make sure a checking we just let it raise warning.

Copy link
Member Author

@okJiang okJiang Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm. L80-82 is our efforts.

So, we should deprecate all ignore_check_items. cc @sunzhaoyang

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with lance6716. Critical check items should not be allowed to be ignored, which would potentially lead to data loss or unnecessary oncall.

Before ignore the check because there is a performance problem, if the performance problem is solved, I can not yet think of any cases where there is a reason to skip @okJiang

dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
### Optimize some check

1. If downstream creates tables manually and the new downstream’s auto increment ID is not the same as the upstream, we shouldn’t check **auto_increment_ID**.
2. Dump_privilege will check different privileges according to different [consistency](https://docs.pingcap.com/zh/tidb/stable/dumpling-overview#%E8%B0%83%E6%95%B4-dumpling-%E7%9A%84%E6%95%B0%E6%8D%AE%E4%B8%80%E8%87%B4%E6%80%A7%E9%80%89%E9%A1%B9) and downstream on source.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there should also be SELECT privilege on some system tables 🤔

cc @lichunzhu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For MySQL, we need SELECT privilege to INFORMATION_SCHEMA.TABLES
For TiDB, we need SELECT privilege to INFORMATION_SCHEMA.TIKV_REGION_STATUS, INFORMATION_SCHEMA.PLACEMENT_RULES, INFORMATION_SCHEMA.CLUSTER_INFO, INFORMATION_SCHEMA.TIDB_SERVERS_INFO, INFORMATION_SCHEMA.PARTITIONS.

dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
- For all/full mode (optimistic task): we check whether the shard tables schema meets the definition of [Optimistic Schema Compatibility](20191209_optimistic_ddl.md). If that meets, we can create tables by the compatible schema in the dump stage.
- For incremental mode: not check the sharding tables’ schema, because the table schema obtained from show create table is not the schema at the point of binlog.
5. Make the fail state more gentle, which is from `StateFailure` to `StateWarning`.
- checkAutoIncrementKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkAutoIncrementKey is duplicated with auto_increment_ID?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd better use one name for it.

Copy link
Member Author

@okJiang okJiang Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can think of checkAutoIncrementKey and checkPK/UK as a description with a smaller granularity than ignore_check_items's auto_increment_ID.

Because checkPK/UK is only one item in table_schema checker.

dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
okJiang and others added 3 commits December 2, 2021 20:55
Co-authored-by: lance6716 <lance6716@gmail.com>
Co-authored-by: lance6716 <lance6716@gmail.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 3, 2021
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
- table_schema
- schema_of_shard_tables
- auto_increment_ID
2. We can adjust the concurrency by table numbers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can we batch the request by multi-statement or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just run mutil-thread to execute show create table. Can't batch the request. The draft

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use INFORMATION_SCHEMA to get the table structure? @lichunzhu PTAL.

when the number of table is large and we have a high latency network

Copy link
Member Author

@okJiang okJiang Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test result from https://github.com/pingcap/tiflow/pull/3975/files#diff-d6ce4d888f567f277d6f1304165d1b3cca440f6563ecb742d7c3a80f7d411b8f


4 threads 10000 tables

test mechine added delay(simulate network latency) spend time
local 0 ms 3.9 s
125 0 ms 8.5 s
125 10 ms 31.7 s
106 0 ms 11.2 s
106 10 ms 29.6 s
106 20 ms 54.5 s
106 50 ms 128.5 s
106 100 ms 253 s

I intend to keep the delay under a minute(@sunzhaoyang 's thought).

In other words, if number of table is greater than 5000 and less than 10000, we use 16 threads. If number of table is greater than 10000 and less than 20000, we use 32 threads. The spend time is almost under a minute.

If number is greater than 20000, maybe user can afford the spend time(Will grow linear growth). Or we can use more threads? 64, 128...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we incr check thread, do we also adjust mysql_max_connections to 16/32 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beacause there is limited underlying connection count behiend sql.DB more thread count than this number(underlying connection count) can not speed up check process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we adjust mysql_max_connection to 16/32, we seem can't guarantee no other connection with this MySQL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, so my point is maybe we should just use mysql.MaxConnectionCount as concurrent count when need check table num lagre than mysql.MaxConnectionCount

dm/docs/RFCS/20211130_enhanced_pre_checker.md Outdated Show resolved Hide resolved
- For flush consistency:
- RELOAD (global)
- For flush/lock consistency:
- LOCK TABLES (only dump tables)
Copy link
Member Author

@okJiang okJiang Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After test, found that flush tables with read lock does not need this privilege. We juse used flush tables with read lock.😭

remove it.

just remove it in flush consistency. In lock consistency, used lock tables yet.

@lance6716
Copy link
Contributor

lgtm, wait other reviewers

Copy link
Contributor

@Ehco1996 Ehco1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

- binlog_enable
- binlog_format
- binlog_row_image
- online_ddl(new added)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need check Binlog_Do_DB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add it in 2ea1a63

@ti-chi-bot ti-chi-bot removed the status/LGT1 Indicates that a PR has LGTM 1. label Dec 23, 2021
Copy link
Contributor

@glorv glorv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 23, 2021
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 23, 2021
@okJiang
Copy link
Member Author

okJiang commented Dec 23, 2021

please help me merge this pr. Thanks~ @lance6716

@glorv
Copy link
Contributor

glorv commented Dec 23, 2021

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 0b6d70d

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 23, 2021
@okJiang
Copy link
Member Author

okJiang commented Dec 23, 2021

/run-verify

@ti-chi-bot ti-chi-bot merged commit 43fe550 into pingcap:master Dec 23, 2021
@okJiang okJiang deleted the add-rfc-pre-check branch December 23, 2021 10:18
zhaoxinyu pushed a commit to zhaoxinyu/ticdc that referenced this pull request Dec 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dm Issues or PRs related to DM. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propose enhanced pre-check rfc
8 participants