puller(ticdc): fix wrong update splitting behavior after table scheduling #11296

Merged
merged 2 commits into from
Jun 13, 2024

@lidezhu lidezhu commented Jun 11, 2024

What problem does this PR solve?

Issue Number: close #11219

What is changed and how it works?

The problem occurs in the following scenario:

  1. There are two cdc nodes, A and B, and B starts before A, so thresholdTSB < thresholdTSA;
  2. The sync task of table t initially runs on node A;
  3. Table t has an update event whose commitTS is smaller than thresholdTSA and larger than thresholdTSB, so node A splits the update event into a delete event and an insert event;
  4. The delete event and the insert event cannot be sent downstream atomically. If the table sync task is scheduled to node B after the delete event has been sent downstream but before the insert event is sent, node B receives the update event again;
  5. Node B does not split the update event because its commitTS is larger than thresholdTSB, so node B sends a plain update SQL statement downstream, which causes data inconsistency.
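
The scenario above can be sketched as a small Go simulation. This is only an illustration, not code from the PR: the `table` type, `shouldSplit`, `applyUpdate`, and `replayOnNodeB` are hypothetical names, the timestamps are made up, and the insert half of a split update is assumed to behave like an upsert downstream.

```go
package main

import "fmt"

// table is a hypothetical downstream table: primary key -> value.
type table map[int]string

// shouldSplit mirrors the rule described in the steps above: an update is
// split when its commitTS is smaller than the node's thresholdTS.
func shouldSplit(commitTS, thresholdTS uint64) bool {
	return commitTS < thresholdTS
}

// applyUpdate mimics "UPDATE ... WHERE pk = oldPK": it changes nothing when
// the old row no longer exists.
func (t table) applyUpdate(oldPK, newPK int, val string) {
	if _, ok := t[oldPK]; !ok {
		return
	}
	delete(t, oldPK)
	t[newPK] = val
}

// replayOnNodeB simulates step 4 and step 5: node A has already sent the
// delete half of the split update, then node B replays the same update.
func replayOnNodeB(commitTS, thresholdTSB uint64) table {
	t := table{1: "old"}
	delete(t, 1) // node A sent the delete half before the table was moved

	if shouldSplit(commitTS, thresholdTSB) {
		delete(t, 1) // delete half: a harmless no-op on replay
		t[2] = "new" // insert half, assumed here to act as an upsert
	} else {
		t.applyUpdate(1, 2, "new") // plain update: matches no row
	}
	return t
}

func main() {
	// Illustrative timestamps only: thresholdTSB < commitTS < thresholdTSA.
	const thresholdTSA, thresholdTSB, commitTS = 200, 100, 150

	fmt.Println("split on A:", shouldSplit(commitTS, thresholdTSA))
	fmt.Println("split on B:", shouldSplit(commitTS, thresholdTSB))
	fmt.Println("B replays without split:", replayOnNodeB(commitTS, thresholdTSB))
	fmt.Println("B replays with split:   ", replayOnNodeB(commitTS, thresholdTSA))
}
```

In this sketch the unsplit replay leaves the downstream table empty, because the update finds no row with the old key, while the split replay ends with the new row in place.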

Note also that after scheduling, node B sends some events downstream that node A has already sent, so node B must send these events idempotently.
Previously, this was handled in the sink module: the sink obtained a replicateTS when it started and split every update event whose commitTS was smaller than replicateTS. That mechanism was also removed in #11030, so this case needs to be handled in the puller as well.

In this PR, instead of maintaining a separate thresholdTS in sourcemanager, the puller gets the replicateTS from the sink whenever it needs to decide whether to split an update event.
Because the puller module starts working before the sink module, replicateTS defaults to MaxUint64, which means all update events are split. After the sink starts working, replicateTS is set to the correct value.
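
A minimal sketch of this default-then-set behavior, assuming a shared holder that the sink writes and the puller reads. The type and method names (`replicateTSHolder`, `onSinkStarted`, `shouldSplitUpdate`) are hypothetical and are not the identifiers used in the PR; only `math.MaxUint64` and `atomic.Uint64` (Go 1.19+) come from the standard library.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// replicateTSHolder is a hypothetical stand-in for the state shared between
// the sink (writer) and the puller (reader).
type replicateTSHolder struct {
	ts atomic.Uint64
}

// newReplicateTSHolder starts with MaxUint64, so every update event is split
// until the sink reports its real replicateTS.
func newReplicateTSHolder() *replicateTSHolder {
	h := &replicateTSHolder{}
	h.ts.Store(math.MaxUint64)
	return h
}

// onSinkStarted records the replicateTS obtained when the sink starts.
func (h *replicateTSHolder) onSinkStarted(replicateTS uint64) {
	h.ts.Store(replicateTS)
}

// shouldSplitUpdate reports whether an update with the given commitTS should
// be split into a delete event and an insert event.
func (h *replicateTSHolder) shouldSplitUpdate(commitTS uint64) bool {
	return commitTS < h.ts.Load()
}

func main() {
	h := newReplicateTSHolder()
	fmt.Println("before sink starts:", h.shouldSplitUpdate(150))

	h.onSinkStarted(100) // illustrative value only
	fmt.Println("after sink starts, commitTS=150:", h.shouldSplitUpdate(150))
	fmt.Println("after sink starts, commitTS=50: ", h.shouldSplitUpdate(50))
}
```

The atomic value is used here only because the puller and the sink are assumed to run concurrently; the actual synchronization in the PR may differ.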

Finally, when the sink restarts because of an error, it may resend events downstream that were already sent before the restart. These events must also be sent idempotently. However, they are already in the sorter, so restarting only the sink cannot achieve this. This PR therefore forbids restarting the sink and restarts the changefeed instead when an error occurs.

Check List

Tests

  • Manual test (add detailed scripts or steps below)
  1. Deploy a cluster with three cdc nodes;
  2. Kill all nodes occasionally while running a workload, and check whether the data is consistent.

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

None

@ti-chi-bot ti-chi-bot bot added the do-not-merge/needs-linked-issue, release-note, size/L, and release-note-none labels and removed the do-not-merge/needs-linked-issue and release-note labels Jun 11, 2024

codecov bot commented Jun 11, 2024

Codecov Report

Attention: Patch coverage is 35.29412% with 33 lines in your changes missing coverage. Please review.

Project coverage is 57.6414%. Comparing base (e3b0bc7) to head (b1eda2d).
Report is 4 commits behind head on master.

Additional details and impacted files
Components   Coverage Δ
cdc          61.4538% <53.8043%> (-0.0249%) ⬇️
dm           51.2091% <ø> (+0.0100%) ⬆️
engine       63.3667% <ø> (-0.0353%) ⬇️

Flag         Coverage Δ
unit         57.6414% <35.2941%> (-0.0080%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master     #11296        +/-   ##
================================================
- Coverage   57.6493%   57.6414%   -0.0080%     
================================================
  Files           850        850                
  Lines        126095     126239       +144     
================================================
+ Hits          72693      72766        +73     
- Misses        48002      48068        +66     
- Partials       5400       5405         +5     

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jun 13, 2024
Co-authored-by: CharlesCheung <61726649+CharlesCheung96@users.noreply.github.com>
@lidezhu
Collaborator Author

lidezhu commented Jun 13, 2024

/retest

Contributor

ti-chi-bot bot commented Jun 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asddongmen, CharlesCheung96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CharlesCheung96,asddongmen]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 13, 2024
Contributor

ti-chi-bot bot commented Jun 13, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-06-13 03:07:20.406105898 +0000 UTC m=+606794.459417815: ☑️ agreed by CharlesCheung96.
  • 2024-06-13 06:30:26.767306053 +0000 UTC m=+618980.820617976: ☑️ agreed by asddongmen.

@ti-chi-bot ti-chi-bot bot merged commit e3412d9 into pingcap:master Jun 13, 2024
28 checks passed
@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels Jun 13, 2024
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #11302.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 13, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #11303.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 13, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 13, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #11304.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #11305.

Labels
approved lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

data inconsistency after inject kill all cdc chaos
4 participants