Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientv3/balancer: propagate network-partition error to gRPC #8675

Closed
wants to merge 3 commits into from

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Oct 10, 2017

So that time-outs on network partition can trigger
connection resets to gRPC via Notify channel.
Here we are expecting time-out errors, not gRPC
transport-layer errors, so need manual unpin
to trigger connection resets.

@gyuho
Copy link
Contributor Author

gyuho commented Oct 11, 2017

Still a few more CIs to fix

INFO: 2017/10/11 14:56:23 clientv3/balancer: pin localhost:52987699167444244570
INFO: 2017/10/11 14:56:23 clientv3/health-balancer: localhost:52987699167444244570 becomes unhealthy (grpc: the client connection is closing)
INFO: 2017/10/11 14:56:23 clientv3/retry: error rpc error: code = FailedPrecondition desc = grpc: the client connection is closing on pinned endpoint
INFO: 2017/10/11 14:56:23 clientv3/auth-retry: error rpc error: code = FailedPrecondition desc = grpc: the client connection is closing on pinned endpoint
2017-10-11 14:56:26.018142 I | integration: terminating 5298769916744424457 (unix://localhost:52987699167444244570)
2017-10-11 14:56:26.039898 I | integration: terminated 5298769916744424457 (unix://localhost:52987699167444244570)
--- FAIL: TestKVNewAfterClose (3.26s)
	kv_test.go:477: expected context canceled, got grpc: the client connection is closing
	kv_test.go:483: kv.Get took too long

But still, we've been doing retry wrong here efd7800.

isStop(err) == true should just exit.

So that time-outs on network partition can trigger
connection resets to gRPC via Notify channel.
Here we are expecting time-out errors, not gRPC
transport-layer errors, so need manual unpin
to trigger connection resets.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
When the server returns non-transient errors
(e.g. rpctypes.ErrEmptyKey), balancer should not bother
to switch endpoints and just exit with error.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@@ -66,15 +66,17 @@ func (c *Client) newRetryWrapper(isStop retryStopErrFunc) retryRpcFunc {
if logger.V(4) {
logger.Infof("clientv3/retry: error %v on pinned endpoint %s", err, pinned)
}
// do not switch endpoint when server is stopped
// (should exit on non-transient error)
if isStop(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is non-transient error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiang90 For instance, rpctypes.ErrEmptyKey should not be retried.
There's a test for this https://github.com/coreos/etcd/blob/master/clientv3/integration/kv_test.go#L36-L52.

@gyuho gyuho closed this Oct 17, 2017
@gyuho gyuho deleted the client branch October 17, 2017 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants