disttask: correct the usage of context (#48343) #48369

Merged

ti-chi-bot
Member

This is an automated cherry-pick of #48343

What problem does this PR solve?

Issue Number: close #48303

Problem Summary:

After cancelling the job, I found that the ingest worker did not stop. Instead, it was blocked waiting for a response from the server side (TiKV):

11 @ 0x1c2d6ee 0x1c3e185 0x21cf23c 0x225699a 0x2256987 0x2255c1f 0x22548da 0x2255a73 0x3d9bb96 0x3d999f1 0x3d98066 0x3d8942b 0x3d88f47 0x3d8be0e 0x3ac2a96 0x1c625c1
#	0x21cf23b	google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader+0x7b			/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/transport.go:327
#	0x2256999	google.golang.org/grpc/internal/transport.(*Stream).RecvCompress+0xb9			/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/transport.go:342
#	0x2256986	google.golang.org/grpc.(*csAttempt).recvMsg+0xa6					/go/pkg/mod/google.golang.org/grpc@v1.59.0/stream.go:1070
#	0x2255c1e	google.golang.org/grpc.(*clientStream).RecvMsg.func1+0x1e				/go/pkg/mod/google.golang.org/grpc@v1.59.0/stream.go:927
#	0x22548d9	google.golang.org/grpc.(*clientStream).withRetry+0x139					/go/pkg/mod/google.golang.org/grpc@v1.59.0/stream.go:776
#	0x2255a72	google.golang.org/grpc.(*clientStream).RecvMsg+0x112					/go/pkg/mod/google.golang.org/grpc@v1.59.0/stream.go:926
#	0x3d9bb95	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doWrite.func5+0x2b5	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/region_job.go:354
#	0x3d999f0	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doWrite+0x1890	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/region_job.go:389
#	0x3d98065	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).writeToTiKV+0x25	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/region_job.go:189
#	0x3d8942a	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).executeJob+0xaa	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/local.go:1435
#	0x3d88f46	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).startWorker+0x1c6	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/local.go:1344
#	0x3d8be0d	github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doImport.func5+0x2d	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/local/local.go:1677
#	0x3ac2a95	golang.org/x/sync/errgroup.(*Group).Go.func1+0x55

Normally, waitOnHeader should return once the stream's context is canceled:

func (s *Stream) waitOnHeader() {
	if s.headerChan == nil {
		// On the server headerChan is always nil since a stream originates
		// only after having received headers.
		return
	}
	select {
	case <-s.ctx.Done():
		// Close the stream to prevent headers/trailers from changing after
		// this function returns.
		s.ct.CloseStream(s, ContextErr(s.ctx.Err()))
		// headerChan could possibly not be closed yet if closeStream raced
		// with operateHeaders; wait until it is closed explicitly here.
		<-s.headerChan
	case <-s.headerChan:
	}
}

The root cause turned out to be that the wrong context was passed to the local backend. It should be the "task context" rather than the "manager context", because the latter is canceled only when the TiDB process exits, so cancelling the job never cancels it.

The problem of the worker getting stuck should be addressed separately; see #48352.

What is changed and how it works?

  • For scheduler methods such as Init(), Run(), Pause(), etc., we pass the task context.
  • For interactions with the task table, we pass the manager context.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot added ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-release-7.5 This PR is cherry-picked to release-7.5 from a source PR. labels Nov 7, 2023
@ti-chi-bot ti-chi-bot added the cherry-pick-approved Cherry pick PR approved by release team. label Nov 8, 2023
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 8, 2023

ti-chi-bot bot commented Nov 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wjhuang2016, ywqzzy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 8, 2023

ti-chi-bot bot commented Nov 8, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-08 02:30:20.341581326 +0000 UTC m=+3611417.928691472: ☑️ agreed by ywqzzy.
  • 2023-11-08 04:30:29.597314608 +0000 UTC m=+3618627.184424753: ☑️ agreed by wjhuang2016.

@tangenta
Contributor

tangenta commented Nov 8, 2023

/retest

@ti-chi-bot ti-chi-bot bot merged commit 9786a0d into pingcap:release-7.5 Nov 8, 2023
6 of 7 checks passed
Labels
approved cherry-pick-approved Cherry pick PR approved by release team. lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-release-7.5 This PR is cherry-picked to release-7.5 from a source PR.
4 participants