Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

relay: use new Reader, Transformer and Writer #171

Merged
merged 31 commits into from
Jun 27, 2019

Conversation

csuzhangxc
Copy link
Member

@csuzhangxc csuzhangxc commented Jun 12, 2019

What problem does this PR solve?

this PR is a part of #91.

What is changed and how it works?

use new Reader, Transformer and Writer to:

  1. read binlog events from upstream
  2. transform some binlog events
  3. write binlog events into relay log files
  4. do recover for relay log file if needed before processing
  5. refine some code and add some test cases about this refactoring
    • more test cases for the interface methods can be added later.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has persistent data change

Side effects

  • Possible performance regression

@csuzhangxc csuzhangxc added priority/important Major change, requires approval from ≥2 primary reviewers status/WIP This PR is still work in progress type/enhancement Performance improvement or refactoring labels Jun 12, 2019
@codecov
Copy link

codecov bot commented Jun 12, 2019

Codecov Report

Merging #171 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master       #171   +/-   ##
===========================================
  Coverage   54.7612%   54.7612%           
===========================================
  Files           122        122           
  Lines         14492      14492           
===========================================
  Hits           7936       7936           
  Misses         5766       5766           
  Partials        790        790

@csuzhangxc
Copy link
Member Author

/run-all-tests

1 similar comment
@csuzhangxc
Copy link
Member Author

/run-all-tests

@csuzhangxc
Copy link
Member Author

/run-all-tests

1 similar comment
@csuzhangxc
Copy link
Member Author

/run-all-tests

@csuzhangxc csuzhangxc added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Jun 19, 2019
@csuzhangxc
Copy link
Member Author

@GregoryIan @amyangfei PTAL

_, err = t.db.Exec(query)
c.Assert(err, IsNil)
maxRetryCount := 5
for i := 0; i < maxRetryCount; i++ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can have a function in pkg/unit, like

func WaitSomething(backoff int, fn func() bool) bool {
	for i := 0; i < backoff; i++ {
		if fn() {
			return true
		}

		time.Sleep(10 * time.Millisecond)
	}

	return false
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 6c3fab5.

case replication.ErrChecksumMismatch:
relayLogDataCorruptionCounter.Inc()
case replication.ErrSyncClosed, replication.ErrNeedSyncAgain:
// do nothing
// do nothing, but the error will be returned
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also stat it into metric?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be recorded in relayExitWithErrorCounter, should we need to add another metric like relayGoMySQLError?

relay/relay.go Outdated
lastPos.Pos = tResult.LogPos
err = lastGTID.Set(tResult.GTIDSet)
if err != nil {
log.Errorf("[relay] update last GTID set to %v error %v", tResult.GTIDSet, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's different with

			err = r.meta.Save(lastPos, lastGTID)
			if err != nil {
				return errors.Trace(err)
			}

one is lenient, one is strict, can both be lenient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change both to be strict in 201f960.

@IANTHEREAL
Copy link
Collaborator

Rest LGTM

}
return true
}
utils.WaitSomething(backoff, waitTime, waitFn)
Copy link
Contributor

@amyangfei amyangfei Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to check the wait result?
besides, in which scenario will other test case write binlog even after this test case is started

Copy link
Member Author

@csuzhangxc csuzhangxc Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WaitSomething can only decrease the possibility of errors.
I observed after RESET MASTER, other test cases write binlog event, may it because multi cases run in different goroutines at the same time?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes?

Copy link
Member Author

@csuzhangxc csuzhangxc Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -27,6 +28,16 @@ import (
"github.com/pingcap/dm/pkg/log"
)

const (
// event timeout when trying to read events from upstream master server.
eventTimeout = 1 * time.Hour
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to the delay requirements of replication, let it 10 or 1minute?

Copy link
Member Author

@csuzhangxc csuzhangxc Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only takes effect for getting events from go-mysql (the events are already read from upstream server).
In fact, we can even use a Cancel rather than a Timeout, but use a Timeout we can log something to indicate still running.

the timeout between master and salve is controlled by slaveReadTimeout (default 1m) and masterHeartbeatPeriod (default 30s).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every step will have an impact on the delay, for example, if a master-slave switch happend, there may be no data in 1h, delay would be 1h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every step will have an impact on the delay

agree.

if a master-slave switch happend

An error will get with small delay

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, set a reasonable small value and let user or dm know something ASAP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to 10m in 0527d16.

@IANTHEREAL
Copy link
Collaborator

LGTM

@IANTHEREAL IANTHEREAL added status/LGT1 One reviewer already commented LGTM and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Jun 22, 2019
@csuzhangxc
Copy link
Member Author

/run-all-tests

relay/relay.go Outdated Show resolved Hide resolved

// Process implements Process interface
func (d *DummyRelay) Process(ctx context.Context, pr chan pb.ProcessResult) {
select {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we can use a simple channel send/receive instead of select with a single case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refined in f9b1663.

time.Sleep(time.Second)
_, gs, err2 := utils.GetMasterStatus(t.db, flavor)
c.Assert(err2, IsNil)
if len(gs.String()) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return gs.String() == "" is simpler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in f9b1663.

// wait success
f2 := func() bool {
count++
if count >= 5 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return count >= 5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in f9b1663.

tryReSync = false // do not support repeat try re-sync
continue
}
// TODO: try auto fix GTID, and can support auto switching between upstream server later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add some log here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some log in f9b1663.


// record current active relay log file, and keep it until newer file opened
// when current file's fd closed, we should not reset this, because it may re-open again
r.setActiveRelayLog(filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to call setActiveRelayLog somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! add a call to setActiveRelayLog in f9b1663.

@csuzhangxc
Copy link
Member Author

/run-all-tests

Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amyangfei amyangfei added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Jun 27, 2019
@csuzhangxc csuzhangxc merged commit 6fbddf3 into pingcap:master Jun 27, 2019
@csuzhangxc csuzhangxc deleted the relay-writer-3 branch June 27, 2019 02:11
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority/important Major change, requires approval from ≥2 primary reviewers status/LGT2 Two reviewers already commented LGTM, ready for merge type/enhancement Performance improvement or refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants