Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lightning stuck in writeToTiKV after inject tikv-failure for 60 minutes #46321

Closed
lance6716 opened this issue Aug 22, 2023 · 1 comment · Fixed by #48355
Closed

lightning stuck in writeToTiKV after inject tikv-failure for 60 minutes #46321

lance6716 opened this issue Aug 22, 2023 · 1 comment · Fixed by #48355
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.1 affects-6.5 affects-7.1 affects-7.5 component/lightning This issue is related to Lightning of TiDB. severity/major type/bug The issue is confirmed as a bug.

Comments

@lance6716
Copy link
Contributor

lance6716 commented Aug 22, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

inject chaos error

cases:
  - name: tikv_failure
    faultType: failure
    selector: tikv(random)
    warmUpTime: 1m
    faultDuration: 60m
    recoveryTime: 5m

2. What did you expect to see? (Required)

3. What did you see instead (Required)

goroutine 11915 [select, 421 minutes]:
google.golang.org/grpc/internal/transport.(*http2Client).NewStream(0xc004e638c0, {0x59c7d10, 0xc03ca11f20}, 0xc03e701aa0)
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:829 +0x5a5
google.golang.org/grpc.(*csAttempt).newStream(0xc03cf4fee0)
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:486 +0xb2
google.golang.org/grpc.newClientStreamWithParams.func2(0xc03cf4fee0)
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:336 +0x3a
google.golang.org/grpc.(*clientStream).withRetry(0xc03e46bb00, 0xc03e7183e0, 0xc03bb32998)
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:760 +0x144
google.golang.org/grpc.newClientStreamWithParams({0x59c7c68, 0xc0135e2460}, 0x7bf00e0, 0xc001451180, {0x519c087, 0x1d}, {0x0, 0x0, 0x0, 0x0, ...}, ...)
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:345 +0xc76
google.golang.org/grpc.newClientStream.func2({0x59c7c68?, 0xc0135e2460?}, 0xc03baaace0?)
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:202 +0x9f
google.golang.org/grpc.newClientStream({0x59c7c68, 0xc0135e2460}, 0x7bf00e0, 0xc001451180, {0x519c087, 0x1d}, {0x0, 0x0, 0x0})
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:237 +0x674
google.golang.org/grpc.(*ClientConn).NewStream(0x18ae6c7?, {0x59c7c68?, 0xc0135e2460?}, 0xc03bb32c80?, {0x519c087?, 0x3998efd?}, {0x0?, 0x59c7c68?, 0xc0135e2460?})
	/go/pkg/mod/google.golang.org/grpc@v1.54.0/stream.go:162 +0x1ae
github.com/pingcap/kvproto/pkg/import_sstpb.(*importSSTClient).Write(0xf3457203a36cfba0?, {0x59c7c68?, 0xc0135e2460?}, {0x0?, 0x31?, 0x40?})
	/go/pkg/mod/github.com/pingcap/kvproto@v0.0.0-20230523065550-8b641fa69bf3/pkg/import_sstpb/import_sstpb.pb.go:2714 +0x6d
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).writeToTiKV(0xc0014a6180, {0x59c7c68, 0xc0135e2460}, 0xc0310ccc00)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/region_job.go:231 +0xb5c
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).executeJob(0xc0014a6180, {0x59c7c68, 0xc0135e2460}, 0xc0310ccc00)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1357 +0xb4
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).startWorker(0x4b1b6c0?, {0x59c7c68, 0xc0135e2460}, 0xc0006bc7e0, 0x0?, 0x0?)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1271 +0x125
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).doImport.func3()
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1547 +0x31
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.2.0/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.2.0/errgroup/errgroup.go:72 +0xa5

but in fact the TiKV has joined the cluster after injected 60 minutes failure.

lightning-0-goroutine.txt

4. What is your TiDB version? (Required)

@lance6716 lance6716 added the type/bug The issue is confirmed as a bug. label Aug 22, 2023
@jebter jebter added component/lightning This issue is related to Lightning of TiDB. severity/major labels Aug 23, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Aug 23, 2023
@lance6716
Copy link
Contributor Author

In the test environment, lightning reads the TiKV peer address from PD which is "tc-a-tikv-0.tc-a-tikv-peer.testbed-endless-xxx-resilience-blx9k.svc:20160".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.1 affects-6.5 affects-7.1 affects-7.5 component/lightning This issue is related to Lightning of TiDB. severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants