Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

worker: fix potential panic when restoring subtask from meta #305

Merged
merged 3 commits into from
Oct 16, 2019

Conversation

csuzhangxc
Copy link
Member

What problem does this PR solve?

[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x53b07e]

goroutine 47 [running]:
context.WithDeadline(0x0, 0x0, 0xbf5be448e73dc52d, 0x463dd73766, 0x2ad5a60, 0x0, 0x0, 0x0)
/usr/local/go/src/context/context.go:384 +0x4e
context.WithTimeout(0x0, 0x0, 0x45d964b800, 0xc001974200, 0x2, 0x2)
/usr/local/go/src/context/context.go:451 +0x6b
github.com/pingcap/dm/dm/worker.(*SubTask).unitTransWaitCondition(0xc000372e40, 0x0, 0x0)
/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:627 +0x3ca
github.com/pingcap/dm/dm/worker.(*SubTask).run(0xc000372e40)
/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:182 +0x5c
github.com/pingcap/dm/dm/worker.(*SubTask).fetchResult(0xc000372e40, 0xc0004a7560)
/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:248 +0xb28
created by github.com/pingcap/dm/dm/worker.(*SubTask).run
/home/jenkins/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:196 +0x4a8

What is changed and how it works?

before this PR, we have the following order in worker.go:

  1. InitConditionHub registers Worker instances
  2. restoreSubTask restores subtasks from meta
  3. w.ctx, w.cancel = context.WithCancel(context.Background()) inits w.ctx and w.cancel

but when restoring subtasks from meta, w.ctx may be used, then nil pointer will be accessed and panicked.

in this PR, we make sure have the following order:

  1. w.ctx, w.cancel = context.WithCancel(context.Background()) inits w.ctx and w.cancel
  2. InitConditionHub registers Worker instances
  3. restoreSubTask restores subtasks from meta

Check List

Tests

  • Unit test
  • Integration test

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

@csuzhangxc csuzhangxc added priority/normal Minor change, requires approval from ≥1 primary reviewer status/PTAL This PR is ready for review. Add this label back after committing new changes type/bug-fix Bug fix needs-cherry-pick-release-1.0 This PR should be cherry-picked to release-1.0. Remove this label after cherry-picked to release-1.0 needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated labels Oct 8, 2019
@codecov
Copy link

codecov bot commented Oct 8, 2019

Codecov Report

Merging #305 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master       #305   +/-   ##
===========================================
  Coverage   60.1755%   60.1755%           
===========================================
  Files           134        134           
  Lines         14810      14810           
===========================================
  Hits           8912       8912           
  Misses         5059       5059           
  Partials        839        839

@csuzhangxc
Copy link
Member Author

@amyangfei @WangXiangUSTC PTAL

Copy link
Contributor

@WangXiangUSTC WangXiangUSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

Comment on lines +86 to +94
// release resources, NOTE: we need to refactor New/Init/Start/Close for components later.
w2.cancel()
w2.subTaskHolder.closeAllSubTasks()
if w2.meta != nil {
w2.meta.Close()
}
if w2.db != nil {
w2.db.Close()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just use w2.Close(), and maybe need add some check in Close (close something when it is not nil)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can.
But I also think we should refactor the pattern of components' new/init/start/close later.
like new only creates fields but does not acquire (IO) resources and start no goroutines, then when new failed, no operation needs to do and let GC to remove the objects is enough.

I think we need to do some refactoring when developing the HA version of DM.

what's your opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,we can do this later

dm/worker/worker.go Outdated Show resolved Hide resolved
csuzhangxc and others added 2 commits October 14, 2019 14:24
Co-Authored-By: WangXiangUSTC <wx347249478@gmail.com>
@WangXiangUSTC
Copy link
Contributor

LGTM

@WangXiangUSTC WangXiangUSTC added status/LGT1 One reviewer already commented LGTM and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Oct 15, 2019
Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amyangfei amyangfei added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Oct 15, 2019
@csuzhangxc csuzhangxc merged commit 4c28fed into pingcap:master Oct 16, 2019
@csuzhangxc csuzhangxc deleted the fix-worker-ctx-panic branch October 16, 2019 01:53
@sre-bot
Copy link

sre-bot commented Oct 16, 2019

cherry pick to release-1.0 in PR #311

@sre-bot sre-bot added already-cherry-pick-1.0 The related PR is already cherry-picked to release-1.0. Add this label once the PR is cherry-picked and removed needs-cherry-pick-release-1.0 This PR should be cherry-picked to release-1.0. Remove this label after cherry-picked to release-1.0 labels Oct 16, 2019
@csuzhangxc csuzhangxc added already-update-release-note The release note is updated. Add this label once the release note is updated and removed needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated labels Nov 1, 2019
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
already-cherry-pick-1.0 The related PR is already cherry-picked to release-1.0. Add this label once the PR is cherry-picked already-update-release-note The release note is updated. Add this label once the release note is updated priority/normal Minor change, requires approval from ≥1 primary reviewer status/LGT2 Two reviewers already commented LGTM, ready for merge type/bug-fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants