Rawkv can not recovery after one of the tikv-servers restart #522

Closed
pingyu opened this issue Jun 8, 2022 · 7 comments · Fixed by #665

Comments

@pingyu
Contributor

pingyu commented Jun 8, 2022

Version

master, 2807409

Environment

RawKV cluster deployed by TiUP, v6.1.0-alpha
tikv-server --version
TiKV
Release Version:   6.1.0-alpha
Edition:           Community
Git Commit Hash:   c26134bbf04235f61484d4f5ed58c47b520e8015
Git Commit Branch: heads/refs/tags/v6.1.0
UTC Build Time:    2022-06-07 11:55:22
Rust Version:      rustc 1.60.0-nightly (1e12aef3f 2022-02-13)
Enable Features:   jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure
Profile:           dist_release
Cluster setup

4 x TiKV + 3 x PD (deployed together with TiKV), 4 x 32C64GB + 500GB cloud SSD. Kingsoft cloud.

Cluster config
  tikv:
    log.level: "info"
    log.file.max-size: 1024
    log.file.max-backups: 30
    storage.api-version: 1
    storage.enable-ttl: false

Steps to reproduce

  1. Run the test code: rawkv_ha.go
  2. Kill one of the tikv-server processes (10.2.103.99) with kill -9. It is then restarted by systemctl.

What happened

10.2.103.99 is recovered in less than 30 seconds.

But no request is sent to it for minutes.

This happens even though there are external retries (rawkv_ha.go):
func workload(ctx context.Context, cli *rawkv.Client, idx int64) (err error) {
	...
	withRetry := func(f func() error) (err error) {
	Loop:
		for i := 0; i < *flagRetryCnt; i++ {
			if e := f(); e == nil {
				return nil
			} else {
				err = multierr.Append(err, errors.Annotatef(e, "retry %d", i))
			}

			select {
			case <-ctx.Done():
				break Loop
			default:
			}
		}
		log.Warn("witryRetry final error", zap.Error(err))
		return
	}

	for {
		for k := begin; k < end; k++ {
			...
			if err = withRetry(
				func() error { return cli.Put(ctxCli, key, value0) },
			); err != nil {
				return errors.Trace(err)
			}
			...
		}
	}
}
The error log of one of the threads follows (go.uber.org/multierr collects the errors from all 10 retries).
[2022/06/08 15:37:42.796 +08:00] [ERROR] [rawkv_ha.go:141] ["runWorkload error"] [error="retry 0: tikv server timeout; retry 1: tikv server timeout; retry 2: no available connections; retry 3: no available connections; retry 4: tikv server timeout; retry 5: tikv server timeout; retry 6: no available connections; retry 7: no available connections; retry 8: tikv server timeout; retry 9: no available connections"] [errorVerbose="the following errors occurred:
 -  tikv server timeout
    github.com/tikv/client-go/v2/error.init
    	/disk1/home/pingyu/workspace/client-go/error/error.go:62
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6315
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.main
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:208
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:160
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).Backoff
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:120
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:582
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 0
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  tikv server timeout
    github.com/tikv/client-go/v2/error.init
    	/disk1/home/pingyu/workspace/client-go/error/error.go:62
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6315
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.main
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:208
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:160
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).Backoff
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:120
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:582
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 1
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  no available connections
    github.com/tikv/client-go/v2/internal/client.(*batchConn).getClientAndSend
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:369
    github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:344
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/client.sendBatchRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:789
    github.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client.go:409
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).sendReqToRegion
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1166
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReqCtx
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1001
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReq
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:231
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:573
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 2
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  no available connections
    github.com/tikv/client-go/v2/internal/client.(*batchConn).getClientAndSend
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:369
    github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:344
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/client.sendBatchRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:789
    github.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client.go:409
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).sendReqToRegion
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1166
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReqCtx
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1001
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReq
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:231
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:573
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 3
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  tikv server timeout
    github.com/tikv/client-go/v2/error.init
    	/disk1/home/pingyu/workspace/client-go/error/error.go:62
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6315
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.main
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:208
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:160
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).Backoff
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:120
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:582
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 4
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  tikv server timeout
    github.com/tikv/client-go/v2/error.init
    	/disk1/home/pingyu/workspace/client-go/error/error.go:62
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6315
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.main
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:208
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:160
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).Backoff
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:120
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:582
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 5
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  no available connections
    github.com/tikv/client-go/v2/internal/client.(*batchConn).getClientAndSend
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:369
    github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:344
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/client.sendBatchRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:789
    github.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client.go:409
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).sendReqToRegion
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1166
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReqCtx
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1001
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReq
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:231
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:573
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 6
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  no available connections
    github.com/tikv/client-go/v2/internal/client.(*batchConn).getClientAndSend
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:369
    github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:344
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/client.sendBatchRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:789
    github.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client.go:409
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).sendReqToRegion
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1166
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReqCtx
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1001
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReq
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:231
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:573
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 7
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  tikv server timeout
    github.com/tikv/client-go/v2/error.init
    	/disk1/home/pingyu/workspace/client-go/error/error.go:62
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6315
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.doInit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:6292
    runtime.main
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:208
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:160
    github.com/tikv/client-go/v2/internal/retry.(*Backoffer).Backoff
    	/disk1/home/pingyu/workspace/client-go/internal/retry/backoff.go:120
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:582
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 8
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
 -  no available connections
    github.com/tikv/client-go/v2/internal/client.(*batchConn).getClientAndSend
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:369
    github.com/tikv/client-go/v2/internal/client.(*batchConn).batchSendLoop
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:344
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    github.com/tikv/client-go/v2/internal/client.sendBatchRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client_batch.go:789
    github.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest
    	/disk1/home/pingyu/workspace/client-go/internal/client/client.go:409
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).sendReqToRegion
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1166
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReqCtx
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:1001
    github.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReq
    	/disk1/home/pingyu/workspace/client-go/internal/locate/region_request.go:231
    github.com/tikv/client-go/v2/rawkv.(*Client).sendReq
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:573
    github.com/tikv/client-go/v2/rawkv.(*Client).PutWithTTL
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:251
    github.com/tikv/client-go/v2/rawkv.(*Client).Put
    	/disk1/home/pingyu/workspace/client-go/rawkv/rawkv.go:299
    main.workload.func2
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:78
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:61
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371
    retry 9
    main.workload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:64
    main.workload
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:77
    main.runWorkload.func1
    	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:120
    golang.org/x/sync/errgroup.(*Group).Go.func1
    	/disk1/home/pingyu/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
    runtime.goexit
    	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/asm_amd64.s:1371"] [stack="main.main
	/disk1/home/pingyu/workspace/client-go/examples/rawkv/rawkv_ha.go:141
runtime.main
	/disk1/home/pingyu/opt/go-1.16.10/src/runtime/proc.go:225"]
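
The summary at the start of the log entry above alternates between two failures. According to the stack traces, "tikv server timeout" is raised from the backoff path in rawkv.go, and "no available connections" comes from `batchConn.getClientAndSend`. As a minimal sketch for analyzing such logs, the helper below tallies the messages in a multierr-style summary string. `tallyRetryErrors` is a hypothetical name, and the sketch assumes the "retry N: message" entries are separated by "; " as in the log line above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// retryPrefix matches the "retry N: " annotation added by the test's retry loop.
var retryPrefix = regexp.MustCompile(`^retry \d+: `)

// tallyRetryErrors splits a summary such as "retry 0: a; retry 1: b" on "; "
// and counts how often each underlying message occurs.
func tallyRetryErrors(summary string) map[string]int {
	counts := make(map[string]int)
	for _, part := range strings.Split(summary, "; ") {
		msg := retryPrefix.ReplaceAllString(strings.TrimSpace(part), "")
		if msg != "" {
			counts[msg]++
		}
	}
	return counts
}

func main() {
	// The error summary copied from the log entry above.
	summary := "retry 0: tikv server timeout; retry 1: tikv server timeout; " +
		"retry 2: no available connections; retry 3: no available connections; " +
		"retry 4: tikv server timeout; retry 5: tikv server timeout; " +
		"retry 6: no available connections; retry 7: no available connections; " +
		"retry 8: tikv server timeout; retry 9: no available connections"

	counts := tallyRetryErrors(summary)
	fmt.Println(counts["tikv server timeout"], counts["no available connections"]) // prints "5 5"
}
```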
Cluster metrics & logs

Clinic: https://clinic.pingcap.com.cn/portal/#/orgs/117/clusters/7106749932299553217
Time: 2022-06-08T15:20:10+08:00 ~ 2022-06-08T15:40:10+08:00

Client log: rawkv_ha.log

Others

This may be a similar issue to #511.

@Smityz
Contributor

Smityz commented Dec 9, 2022

We have hit a similar issue: after network jitter, the client timed out when accessing specific key ranges. The client returned the error "loadRegion from PD failed", and its logs showed "no available connections". Restarting the client resolved the issue.

This problem is particularly frustrating because there is no server-side monitoring to track client-side issues. The client's log output is also very messy, which makes it difficult to analyze the cause.

Client-side monitoring of the relevant metrics would improve our ability to identify and troubleshoot such problems. I found that #170 and #555 mention client-side monitoring, but they received no response. I think this is worth doing for troubleshooting client anomalies. Do you plan to move it forward? We are willing to contribute.

@pingyu @disksing @tutububug

@pingyu
Contributor Author

pingyu commented Dec 9, 2022

You can work around this issue by disabling batching:

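import "github.com/tikv/client-go/v2/config"

// Setting MaxBatchSize to 0 disables request batching in the client.
// Assumption: apply this before creating the client so that new
// connections pick up the setting.
config.UpdateGlobal(func(conf *config.Config) {
	conf.TiKVClient.MaxBatchSize = 0
})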

@pingyu
Contributor Author

pingyu commented Dec 9, 2022

We have hit a similar issue: after network jitter, the client timed out when accessing specific key ranges. The client returned the error "loadRegion from PD failed", and its logs showed "no available connections". Restarting the client resolved the issue.

This problem is particularly frustrating because there is no server-side monitoring to track client-side issues. The client's log output is also very messy, which makes it difficult to analyze the cause.

Client-side monitoring of the relevant metrics would improve our ability to identify and troubleshoot such problems. I found that #170 and #555 mention client-side monitoring, but they received no response. I think this is worth doing for troubleshooting client anomalies. Do you plan to move it forward? We are willing to contribute.

@pingyu @disksing @tutububug

client-go has client-side metrics; see https://github.com/tikv/client-go/blob/v2.0.3/metrics/metrics.go.
For an example of how to use them, see how TiDB does it: https://github.com/pingcap/tidb/blob/v6.4.0/metrics/metrics.go#L211.

You are always welcome to contribute to client-side monitoring and other components.

@Smityz
Contributor

Smityz commented Dec 9, 2022

You can work around this issue by disabling batching:

import "github.com/tikv/client-go/v2/config"

config.UpdateGlobal(func(conf *config.Config) {
	conf.TiKVClient.MaxBatchSize = 0
})

Thanks for your help!
Is there an issue that explains why this bug happens? Because it is hard to reproduce, I cannot confirm that this workaround solves the problem.

@pingyu
Contributor Author

pingyu commented Dec 9, 2022

I don't know why this happens, but after inspection I believe it is a bug in the internal batch mechanism.
I have also run long-term tests with fault injection against the workaround, so I think it works.

I described how to reproduce this bug in the issue description; please check it.

P.S. As client-go is open source and maintained by the community, no one but ourselves can guarantee that this method will solve the problem. If you have doubts, just check it out.

@Smityz
Contributor

Smityz commented Dec 12, 2022

I described how to reproduce this bug in the issue description; please check it.

Perhaps the scenario that triggered this error for us was different: in our case there was only network jitter, and the tikv process was not killed.

It seems that the root cause of this problem has not been fixed. We are also using chaosmesh for fault injection testing, and we will update this issue if we make progress.

@pingyu
Contributor Author

pingyu commented Dec 12, 2022

We are also using chaosmesh for fault injection testing, and we will update this issue if we make progress.

That's great!
