Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fix the pd client could be blocked in some cases #3283

Merged
merged 3 commits into from
Dec 17, 2020

Conversation

zier-one
Copy link
Member

@zier-one zier-one commented Dec 17, 2020

Signed-off-by: leoppro zhaoyilin@pingcap.com

What problem does this PR solve?

Fix the PD client could be blocked in some cases

What is changed and how it works?

There are two levels of context in PD client: client context and tso request context.
the tsoRequest.Wait() function could be blocked when the client context is canceled because the Wait() function haven't listened the client context

Benchmark

the version of PD server: v4.0.8

before this PR:

Start benchmark #0, duration: 5s
Create 3 client(s) for benchmark
count:768290, max:9, min:0, avg:0, >1ms:34372, >2ms:2037, >5ms:457, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:800952, max:9, min:0, avg:0, >1ms:96750, >2ms:13593, >5ms:1495, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:745906, max:8, min:0, avg:0, >1ms:12877, >2ms:3652, >5ms:1330, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:759058, max:4, min:0, avg:0, >1ms:11232, >2ms:5776, >5ms:0, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0

Total:
count:3074206, max:9, min:0, avg:0, >1ms:155231, >2ms:25058, >5ms:3282, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:3074206, >1ms:5.05%, >2ms:0.82%, >5ms:0.11%, >10ms:0.00%, >30ms:0.00% >50ms:0.00% >100ms:0.00% >200ms:0.00% >400ms:0.00% >800ms:0.00% >1s:0.00%

after this PR:

Start benchmark #0, duration: 5s
Create 3 client(s) for benchmark
count:1361814, max:7, min:0, avg:1, >1ms:900605, >2ms:204456, >5ms:2896, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:1357623, max:11, min:0, avg:1, >1ms:931611, >2ms:151281, >5ms:3936, >10ms:1, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:737217, max:7, min:0, avg:0, >1ms:3830, >2ms:3444, >5ms:1028, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:1033124, max:8, min:0, avg:0, >1ms:430725, >2ms:83313, >5ms:460, >10ms:0, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0

Total:
count:4489778, max:11, min:0, avg:1, >1ms:2266771, >2ms:442494, >5ms:8320, >10ms:1, >30ms:0 >50ms:0 >100ms:0 >200ms:0 >400ms:0 >800ms:0 >1s:0
count:4489778, >1ms:50.49%, >2ms:9.86%, >5ms:0.19%, >10ms:0.00%, >30ms:0.00% >50ms:0.00% >100ms:0.00% >200ms:0.00% >400ms:0.00% >800ms:0.00% >1s:0.00%

Check List

Tests

  • Unit test

Release note

  • Fix the pd client could be blocked in some cases

Signed-off-by: leoppro <zhaoyilin@pingcap.com>
@rleungx rleungx requested review from JmPotato and removed request for lhy1024 December 17, 2020 03:47
client/client.go Outdated Show resolved Hide resolved
Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could you benchmark the change withtools/pd-tso-bench?

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 17, 2020
Signed-off-by: leoppro <zhaoyilin@pingcap.com>
@codecov
Copy link

codecov bot commented Dec 17, 2020

Codecov Report

Merging #3283 (9bdc111) into master (6a10e75) will increase coverage by 0.22%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3283      +/-   ##
==========================================
+ Coverage   73.95%   74.17%   +0.22%     
==========================================
  Files         243      243              
  Lines       23071    23074       +3     
==========================================
+ Hits        17062    17116      +54     
+ Misses       4425     4392      -33     
+ Partials     1584     1566      -18     
Flag Coverage Δ
unittests 74.17% <75.00%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
client/client.go 65.30% <75.00%> (+0.56%) ⬆️
pkg/tempurl/tempurl.go 45.00% <0.00%> (-15.00%) ⬇️
server/tso/local_allocator.go 63.63% <0.00%> (-3.04%) ⬇️
pkg/etcdutil/etcdutil.go 83.82% <0.00%> (-1.48%) ⬇️
server/schedulers/utils.go 93.05% <0.00%> (-1.39%) ⬇️
server/cluster/coordinator.go 69.52% <0.00%> (-0.48%) ⬇️
server/grpc_service.go 55.99% <0.00%> (+0.51%) ⬆️
server/schedule/operator_controller.go 81.01% <0.00%> (+0.75%) ⬆️
server/core/storage.go 68.80% <0.00%> (+0.79%) ⬆️
server/server.go 73.29% <0.00%> (+0.95%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a10e75...9bdc111. Read the comment docs.

Copy link
Member

@JmPotato JmPotato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 17, 2020
@zier-one
Copy link
Member Author

/label need-cherry-pick-4.0

@ti-srebot
Copy link
Contributor

These labels are not found need-cherry-pick-4.0.

@JmPotato
Copy link
Member

@leoppro Could you please bring this change to release-4.0 also? The PD client in that branch should have the same problem.

@JmPotato JmPotato added the needs-cherry-pick-release-4.0 The PR needs to cherry pick to release-4.0 branch. label Dec 17, 2020
@zier-one
Copy link
Member Author

/label needs-cherry-pick-4.0

@zier-one
Copy link
Member Author

@leoppro Could you please bring this change to release-4.0 also? The PD client in that branch should have the same problem.

ok

@zier-one
Copy link
Member Author

/merge

@ti-chi-bot
Copy link
Member

@leoppro: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the tidb-community-bots/ti-community-prow repository.

@ti-chi-bot
Copy link
Member

@leoppro: you cannot merge your own PR.

In response to this:

/merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the tidb-community-bots/ti-community-prow repository.

@lhy1024
Copy link
Contributor

lhy1024 commented Dec 17, 2020

/merge

@ti-chi-bot
Copy link
Member

@lhy1024: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the tidb-community-bots/ti-community-prow repository.

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 17, 2020
@ti-chi-bot
Copy link
Member

@leoppro: Your PR has out-of-dated and I have automatically updated it for you.
At the same time I will also trigger all tests for you:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the tidb-community-bots/ti-community-prow repository.

@HunDunDM
Copy link
Member

/merge cancel

@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Dec 17, 2020
@HunDunDM
Copy link
Member

HunDunDM commented Dec 17, 2020

It is may not appropriate to use multiple ctx, I think this PR needs more discussion.

@Yisaer
Copy link
Contributor

Yisaer commented Dec 17, 2020

There are two levels of context in PD client: client context and tso request context.

I think when clientCtx is closed, the caller should guarantee the requestCtx should be closed too. That is to say, it's not a bug or improvement in pdClient. @leoppro @overvenus

@zier-one
Copy link
Member Author

There are two levels of context in PD client: client context and tso request context.

I think when clientCtx is closed, the caller should guarantee the requestCtx should be closed too. That is to say, it's not a bug or improvement in pdClient. @leoppro @overvenus

I don't agree with you. Let's think about this following cases:

client:=pd.NewClient(xxx) // this line is equivalent to `client:=pd.NewClientWithContext(context.Backend(),xxx)`

// goroutine1
client.GetTSO(context.Backend(), xx)

// goroutine2 
client.Close() // this will lead to client context canceled, and goroutine1 will be blocked

this case is exist in tikv client:

here we call GetTSO using backend context: https://github.com/pingcap/tidb/blob/master/store/tikv/kv.go#L387
and here we create pd client using backend context: https://github.com/pingcap/tidb/blob/master/store/tikv/kv.go#L88
note that I can't specify a context for this as a user of tikv client.

@HunDunDM
Copy link
Member

I found a similar double ctx design in grpc, and I think it can be merged.

@HunDunDM
Copy link
Member

/merge

@ti-chi-bot
Copy link
Member

@HunDunDM: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the tidb-community-bots/ti-community-prow repository.

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 17, 2020
@ti-chi-bot
Copy link
Member

Can merge label has been added.

Git tree hash: 9bdc111

@JmPotato
Copy link
Member

/run-integration-common-test

1 similar comment
@HunDunDM
Copy link
Member

/run-integration-common-test

@ti-chi-bot ti-chi-bot merged commit 52fa509 into tikv:master Dec 17, 2020
ti-srebot pushed a commit to ti-srebot/pd that referenced this pull request Dec 17, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #3285

ti-chi-bot added a commit that referenced this pull request Dec 23, 2020
* cherry pick #3283 to release-4.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

* fmt

Signed-off-by: leoppro <zhaoyilin@pingcap.com>

Co-authored-by: leoppro <i@leop.pro>
Co-authored-by: Ti Prow Robot <71242396+ti-community-prow-bot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-4.0 The PR needs to cherry pick to release-4.0 branch. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants