Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TSO proxy based on the existing TSO Follower Batching framework #6565

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

binshi-bing
Copy link
Contributor

@binshi-bing binshi-bing commented Jun 7, 2023

What problem does this PR solve?

Issue Number: Close #6549

What is changed and how does it work?

For what this pr aims to fix and improve, please check the issue #6549.
Specifically, it makes the following changes:
1. Decouple dispatcher goroutine from grpc.ServerStream.
    So if a client cancels the grpc RPCs, it won't cause dispatcher to exit.
2. Decouple the forward error from the grpc.ServerStream related error.
    The former has global impact and can cause the entire dispatcher goroutine of a forward stream to exit,
    and the latter only has local impact on the corresponding grpc.ServerStream and client.
3. After receiving response from the tso microservice for the aggregated/batch request,
    it asynchronously sends the responses to all stream processing goroutine and
    let the corresponding grpc.ServerStreams to send their response to the clients.
4. After forward error happens and dispatcher goroutine exits, clean up unprocessed requests and
    send error responses to the client as soon as possible.
5. Other refactors.

Check List

Tests

  • Unit test

Release note

None.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jun 7, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Jun 7, 2023
@binshi-bing binshi-bing requested review from lhy1024, JmPotato and rleungx and removed request for lhy1024 and JmPotato June 7, 2023 05:45
@ti-chi-bot ti-chi-bot bot requested a review from disksing June 7, 2023 05:45
@binshi-bing binshi-bing requested review from JmPotato, lhy1024 and rleungx and removed request for rleungx, disksing and JmPotato June 7, 2023 05:46
@codecov
Copy link

codecov bot commented Jun 7, 2023

Codecov Report

Patch coverage: 64.35% and project coverage change: -0.10 ⚠️

Comparison is base (a5d76c6) 74.80% compared to head (f193fe4) 74.70%.

❗ Current head f193fe4 differs from pull request most recent head 8c070de. Consider uploading reports for the commit 8c070de to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6565      +/-   ##
==========================================
- Coverage   74.80%   74.70%   -0.10%     
==========================================
  Files         417      416       -1     
  Lines       42585    42629      +44     
==========================================
- Hits        31855    31848       -7     
- Misses       7936     7986      +50     
- Partials     2794     2795       +1     
Flag Coverage Δ
unittests 74.70% <64.35%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/mcs/tso/server/server.go 54.75% <ø> (ø)
pkg/utils/tsoutil/tso_request.go 46.73% <45.00%> (-3.27%) ⬇️
pkg/mcs/tso/server/grpc_service.go 62.72% <50.00%> (+5.15%) ⬆️
pkg/utils/tsoutil/tso_proto_factory.go 82.75% <60.00%> (-11.98%) ⬇️
server/grpc_service.go 50.26% <66.66%> (+0.44%) ⬆️
pkg/utils/tsoutil/tso_dispatcher.go 80.73% <79.72%> (+7.09%) ⬆️
server/server.go 75.59% <100.00%> (-0.34%) ⬇️

... and 42 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@binshi-bing binshi-bing force-pushed the improve-tso-proxy branch 2 times, most recently from 551830a to 3625afb Compare June 7, 2023 07:25
@@ -114,9 +114,6 @@ type Server struct {
keyspaceGroupManager *tso.KeyspaceGroupManager
// Store as map[string]*grpc.ClientConn
clientConns sync.Map
// tsoDispatcher is used to dispatch the TSO requests to
Copy link
Contributor

@lhy1024 lhy1024 Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why will tso server remove it?

Comment on lines 181 to 190
cctx, cancel := context.WithCancel(ctx)
defer cancel()
done := make(chan struct{})
dl := deadline{
timer: time.After(DefaultTSOProxyTimeout),
done: done,
cancel: cancel,
}
select {
case tsDeadlineCh <- dl:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using context.WithTimeout

log.Info("exiting from the dispatch loop. cleaning up the pending requests",
zap.String("forwarded-host", forwardedHost))
if forwardStream != nil {
forwardStream.closeSend()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that a new stream is created for each request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't, because we only start one dispatch loop (this function) for all requests with the same forwarded host until there is any forwarding related error which causes the loop to exit, and we do this check and forwardStream.closeSend() in defer func() when exiting from this loop.

@@ -398,14 +367,72 @@ func (s *GrpcServer) Tso(stream pdpb.PD_TsoServer) error {
}
}

func (s *GrpcServer) getForwardedHost(ctx, streamCtx context.Context) (forwardedHost string, err error) {
// forwardTSO forwards the incoming TSO requests to the TSO microservice.
func (s *GrpcServer) forwardTSO(stream pdpb.PD_TsoServer) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update getGlobalTSOFromTSOServer?

Copy link
Contributor Author

@binshi-bing binshi-bing Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the change in this pr is proved to work as expected in dev/staging, I plan to keep the other RPCs unchanged and unimpacted.

@binshi-bing binshi-bing force-pushed the improve-tso-proxy branch 4 times, most recently from 1a40dbe to ff66f31 Compare June 7, 2023 18:32
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 8, 2023
@binshi-bing binshi-bing changed the title Improve tso proxy Improve TSO proxy based on the existing TSO Follower Batching framework Jun 8, 2023
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
ti-chi-bot bot pushed a commit that referenced this pull request Jun 9, 2023
… gPRC stream (#6572)

close #6549, ref #6565

Simplify tso proxy implementation by using one forward stream for one grpc.ServerStream.
#6565 is a longer term solution for both follower batching and tso microservice. 
It's well implemented, but just need more time to bake, and we need a short term workable solution for now.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 9, 2023
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jun 9, 2023

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@binshi-bing binshi-bing marked this pull request as draft June 13, 2023 03:57
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 13, 2023
rleungx pushed a commit to rleungx/pd that referenced this pull request Aug 2, 2023
… gPRC stream (tikv#6572)

close tikv#6549, ref tikv#6565

Simplify tso proxy implementation by using one forward stream for one grpc.ServerStream.
tikv#6565 is a longer term solution for both follower batching and tso microservice. 
It's well implemented, but just need more time to bake, and we need a short term workable solution for now.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The api leader got stuck at tso requests forwarding
3 participants