syncer/: add async flush checkpoint feature #605

lichunzhu · 2020-04-14T12:17:34Z

Cherry-pick of #595.
If this PR is merged and runs stably on the master branch, we will then merge #595.

What problem does this PR solve?

Flush checkpoints asynchronously to improve efficiency.

What is changed and how it works?

worker goroutine (one per worker, so the number of goroutines equals the concurrent worker count):

The worker successfully executes a batch of jobs.
The worker updates its checkpoint to the position of its latest successful job.
(The checkpoint saved by a worker goroutine is only the global checkpoint; the table checkpoint is refreshed by DDL jobs.)
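The worker-side bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Position`, `workerPositions`, `save`, and `get` are hypothetical names, and the position is simplified to a binlog file name plus an offset.

```go
package main

import (
	"fmt"
	"sync"
)

// Position is a hypothetical, simplified binlog position.
type Position struct {
	Name string // binlog file name
	Pos  uint32 // offset inside the file
}

// workerPositions keeps the last successfully executed position of each
// worker. It is an assumption made for this sketch, not a type from the PR.
type workerPositions struct {
	mu  sync.Mutex
	pos []Position
}

func newWorkerPositions(workerCount int) *workerPositions {
	return &workerPositions{pos: make([]Position, workerCount)}
}

// save records the position of the latest job that worker executed successfully.
func (w *workerPositions) save(worker int, p Position) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.pos[worker] = p
}

// get returns the last position saved by worker.
func (w *workerPositions) get(worker int) Position {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.pos[worker]
}

func main() {
	wp := newWorkerPositions(2)
	var wg sync.WaitGroup
	for id := 0; id < 2; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Pretend a batch of jobs was executed, then save its last position.
			wp.save(id, Position{Name: "mysql-bin.000001", Pos: uint32(100 * (id + 1))})
		}(id)
	}
	wg.Wait()
	fmt.Println(wp.get(0), wp.get(1))
}
```

Each worker writes only its own slot, so the mutex is there mainly to give a reader of all slots a consistent view.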

flush goroutine, running in a loop:
1. Get the flush type from a channel and check whether the global checkpoint needs to be updated this time. If so, update the global checkpoint with the minimum worker position.
2. Flush the checkpoint.
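A rough sketch of such a flush loop is shown below, under stated assumptions: `flushType`, `flushOnly`, `flushWithGlobalUpdate`, `minPosition`, and `runFlusher` are hypothetical names chosen for illustration, and positions are compared by file name and then offset, which only works when file names share a fixed-width numeric suffix.

```go
package main

import "fmt"

// Position is a hypothetical, simplified binlog position.
type Position struct {
	Name string
	Pos  uint32
}

// less orders positions by file name, then by offset (a simplification).
func less(a, b Position) bool {
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	return a.Pos < b.Pos
}

// minPosition returns the smallest position, or the zero value for no input.
func minPosition(ps []Position) Position {
	if len(ps) == 0 {
		return Position{}
	}
	m := ps[0]
	for _, p := range ps[1:] {
		if less(p, m) {
			m = p
		}
	}
	return m
}

// flushType tells the flusher whether to refresh the global checkpoint first.
type flushType int

const (
	flushOnly flushType = iota
	flushWithGlobalUpdate
)

// runFlusher handles flush requests until reqs is closed.
func runFlusher(reqs <-chan flushType, workerPos func() []Position, flush func(Position)) {
	var global Position
	for tp := range reqs {
		if tp == flushWithGlobalUpdate {
			// Only positions every worker has passed are safe to persist.
			global = minPosition(workerPos())
		}
		flush(global)
	}
}

func main() {
	workers := []Position{{"mysql-bin.000001", 900}, {"mysql-bin.000001", 120}, {"mysql-bin.000002", 4}}
	reqs := make(chan flushType, 2)
	reqs <- flushOnly
	reqs <- flushWithGlobalUpdate
	close(reqs)
	runFlusher(reqs,
		func() []Position { return workers },
		func(p Position) { fmt.Printf("flush global checkpoint %s:%d\n", p.Name, p.Pos) })
}
```

Taking the minimum is the conservative choice: after a restart, replication resumes from a point that no worker had yet moved beyond.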

add job:
addJob no longer waits on a schedule for DML job execution to complete. Instead:
a. After a DDL job is executed and the wait job completes, the checkpoint is updated and a flush request is submitted to the flush goroutine.
b. If no checkpoint flush has happened for more than 30s, a flush request is submitted to the flush goroutine.

fake checkpoint:
We use the location of the added DML job as a fake checkpoint, so that unbalanced DMLs do not cause an extremely small checkpoint.
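One possible reading of the fake checkpoint is sketched below; the description does not spell out the mechanism, so treat every detail as an assumption. Here an idle worker (no pending jobs) contributes the location of the most recently added DML job instead of its own stale position. The names `worker`, `effective`, and `globalCheckpoint` are hypothetical, and positions are reduced to a single offset.

```go
package main

import "fmt"

// worker is a hypothetical, simplified view of one worker's progress.
type worker struct {
	pending int    // jobs queued but not yet executed
	done    uint32 // position of the last executed job
}

// effective returns the position this worker contributes to the global
// minimum. An idle worker has nothing left to replay, so it can safely
// report the last added location as a fake checkpoint.
func (w worker) effective(lastAdded uint32) uint32 {
	if w.pending == 0 && w.done < lastAdded {
		return lastAdded
	}
	return w.done
}

// globalCheckpoint is the minimum effective position across all workers.
func globalCheckpoint(ws []worker, lastAdded uint32) uint32 {
	lowest := lastAdded
	for _, w := range ws {
		if p := w.effective(lastAdded); p < lowest {
			lowest = p
		}
	}
	return lowest
}

func main() {
	ws := []worker{
		{pending: 0, done: 100},  // idle for a long time
		{pending: 3, done: 5000}, // receives most of the DMLs
	}
	fmt.Println(globalCheckpoint(ws, 5200))
}
```

Without the fake checkpoint, the idle worker in this example would hold the global checkpoint at 100 even though everything before 5000 has already been executed.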

Check List

Tests

Unit test
Integration test

Code changes

Has exported variable/fields change

codecov · 2020-04-14T13:21:41Z

Codecov Report

Merging #605 into master will decrease coverage by 0.0024%.
The diff coverage is 48.0000%.

@@               Coverage Diff                @@
##             master       #605        +/-   ##
================================================
- Coverage   57.7236%   57.7212%   -0.0024%     
================================================
  Files           203        201         -2     
  Lines         20515      20353       -162     
================================================
- Hits          11842      11748        -94     
+ Misses         7526       7474        -52     
+ Partials       1147       1131        -16

csuzhangxc · 2020-04-20T02:02:29Z

@lichunzhu please resolve conflicts

lichunzhu · 2020-04-20T02:42:00Z

/run-unit-test

csuzhangxc · 2020-04-20T03:42:33Z

syncer/syncer.go

@@ -721,7 +745,11 @@ func (s *Syncer) addJob(job *job) error {
 		}
 		s.jobWg.Wait()
 		finishedJobsTotal.WithLabelValues("flush", s.cfg.Name, adminQueueName).Inc()
-		return s.flushCheckPoints()


In the old code, the flush job waited for the checkpoint flush to complete. Do we still need to keep that behavior? Or is there some mechanism to wait until the checkpoint has been flushed (like the wait before executing DDL)?

The newly added flushHelper structure can make the flush checkpoint job wait.

csuzhangxc · 2020-04-20T04:31:42Z

syncer/syncer.go

@@ -974,6 +1015,10 @@ func (s *Syncer) sync(tctx *tcontext.Context, queueBucket string, db *DBConn, jo
 			if !ok {
 				return
 			}
+			if sqlJob.tp == xid {
+				lastAddedXidPos.save(sqlJob.location.Clone(), nil)


Would currentLocation be more accurate? (Although location and currentLocation are the same for an XID job.)

Addressed in c1bef9f.

ti-chi-bot · 2021-03-24T16:16:07Z

@lichunzhu: PR needs rebase.

lichunzhu · 2021-03-25T02:59:54Z

Won't merge

add async flush checkpoint feature

ee41576

lichunzhu added the type/enhancement (Performance improvement or refactoring) and needs-cherry-pick-release-1.0 (This PR should be cherry-picked to release-1.0. Remove this label after cherry-picked to release-1.0) labels Apr 14, 2020

lichunzhu added 2 commits April 14, 2020 20:41

fix dmctl_basic integration test

0727bd2

fix check sgk bug

3df7e11

lichunzhu requested review from csuzhangxc and WangXiangUSTC April 14, 2020 13:22

lichunzhu added the priority/normal (Minor change, requires approval from ≥1 primary reviewer) and status/PTAL (This PR is ready for review. Add this label back after committing new changes) labels Apr 14, 2020

lichunzhu changed the title ~~add async flush checkpoint feature~~ syncer/: add async flush checkpoint feature Apr 14, 2020

lichunzhu added 2 commits April 16, 2020 11:46

fix bug

db26f76

remove fake job, refine xid pos update

90ac4f3

merge master and resolve conflict

0a46045

csuzhangxc reviewed Apr 20, 2020

lichunzhu added 13 commits April 20, 2020 15:08

Serial flush checkpoint for ddl/flush, remove flush checkpoint chan

d9dbdd1

Revert "Serial flush checkpoint for ddl/flush, remove flush checkpoin…

9489452

…t chan" This reverts commit d9dbdd1.

add flusher

848a18f

fix ut

dd54e6b

Merge branch 'master' into asyncCheckpoint

c0a86ef

fix

5ab61c3

Merge branch 'asyncCheckpoint' of https://github.com/lichunzhu/dm int…

d90dd6e

…o asyncCheckpoint

merge master and resolve conflicts

f914295

address comments and fix bugs

c1bef9f

refine syncer error return

7301b66

fix check

4bcc421

fix atomic error bug

9c9b42a

fix ut

eca9c28

fix ut again

3373298

lichunzhu mentioned this pull request May 9, 2020

Proposal: Binlog replication(syncer) unit support plugin #596

Merged

csuzhangxc added the status/Stale label Jul 27, 2020

lance6716 mentioned this pull request Oct 16, 2020

syncer/checkpoint: ignore cancel for update/delete #1174

Merged

ti-srebot mentioned this pull request Oct 16, 2020

syncer/checkpoint: ignore cancel for update/delete (#1174) #1177

Merged

lance6716 mentioned this pull request Oct 22, 2020

add async flush checkpoint feature #1206

Open

ti-chi-bot added the needs-rebase label Mar 24, 2021

lichunzhu closed this Mar 25, 2021