clientv3/balancer: propagate network-partition error to gRPC #8675
A few more CI failures still need fixing.
Even so, we've been doing retry wrong here (efd7800).
Propagate time-outs on network partition so that they can trigger connection resets in gRPC via the Notify channel. Here we expect time-out errors, not gRPC transport-layer errors, so a manual unpin is needed to trigger the connection reset. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
When the server returns a non-transient error (e.g. rpctypes.ErrEmptyKey), the balancer should not switch endpoints; the call should just exit with that error. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
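To illustrate the manual unpin described in the first commit message, here is a minimal sketch, not the code from this PR. It assumes a toy balancer that publishes its address list on a notify channel; `toyBalancer`, `unpin`, `onRPCError`, and `errTimeout` are hypothetical names chosen for illustration, and the real clientv3 balancer and gRPC interfaces differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errTimeout and errOther are hypothetical stand-ins for a client-side
// time-out during a network partition and for any other kind of error.
var (
	errTimeout = context.DeadlineExceeded
	errOther   = errors.New("some other error")
)

// toyBalancer is a hypothetical, single-goroutine stand-in for a
// client-side balancer that pins one endpoint at a time.
type toyBalancer struct {
	endpoints []string
	pinned    string
	notifyCh  chan []string // holds the latest address list for the connection layer
}

func newToyBalancer(eps []string) *toyBalancer {
	return &toyBalancer{endpoints: eps, pinned: eps[0], notifyCh: make(chan []string, 1)}
}

// unpin drops the pinned endpoint and publishes the remaining endpoints,
// which is the signal the connection layer would use to reset the connection.
func (b *toyBalancer) unpin() {
	var rest []string
	for _, ep := range b.endpoints {
		if ep != b.pinned {
			rest = append(rest, ep)
		}
	}
	b.pinned = ""
	// Discard a stale notification, if any, so the send below cannot block.
	select {
	case <-b.notifyCh:
	default:
	}
	b.notifyCh <- rest
}

// onRPCError unpins only on time-outs. A time-out is not a transport-layer
// error, so nothing else would trigger the reset. It reports whether it unpinned.
func (b *toyBalancer) onRPCError(err error) bool {
	if errors.Is(err, errTimeout) {
		b.unpin()
		return true
	}
	return false
}

func main() {
	b := newToyBalancer([]string{"a:2379", "b:2379", "c:2379"})
	if b.onRPCError(errTimeout) {
		fmt.Println("notified:", <-b.notifyCh)
	}
}
```

The sketch is deliberately single-goroutine; a real balancer would need locking around the pinned endpoint and the channel.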
@@ -66,15 +66,17 @@ func (c *Client) newRetryWrapper(isStop retryStopErrFunc) retryRpcFunc {
    if logger.V(4) {
        logger.Infof("clientv3/retry: error %v on pinned endpoint %s", err, pinned)
    }
    // do not switch endpoint when server is stopped
    // (should exit on non-transient error)
    if isStop(err) {
Why is this a non-transient error?
@xiang90 For instance, rpctypes.ErrEmptyKey should not be retried.
There's a test for this https://github.com/coreos/etcd/blob/master/clientv3/integration/kv_test.go#L36-L52.
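The point about not retrying such errors can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's `newRetryWrapper`: `retry`, `isNonTransient`, `errEmptyKey`, and `errUnavailable` are hypothetical names, and `errEmptyKey` only stands in for `rpctypes.ErrEmptyKey`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins: errEmptyKey plays the role of a non-transient
// server error, errUnavailable the role of a transient endpoint failure.
var (
	errEmptyKey    = errors.New("key is not provided")
	errUnavailable = errors.New("endpoint unavailable")
)

type stopErrFunc func(error) bool
type rpcFunc func(endpoint string) error

// retry tries f against each endpoint in turn. It returns as soon as the
// call succeeds or isStop reports that the error should not be retried,
// and otherwise moves on to the next endpoint.
func retry(endpoints []string, isStop stopErrFunc, f rpcFunc) (attempts int, err error) {
	for _, ep := range endpoints {
		attempts++
		err = f(ep)
		if err == nil || isStop(err) {
			return attempts, err
		}
	}
	return attempts, err
}

// isNonTransient treats the stand-in empty-key error as a stop condition.
func isNonTransient(err error) bool {
	return errors.Is(err, errEmptyKey)
}

func main() {
	eps := []string{"a:2379", "b:2379", "c:2379"}

	// A non-transient error ends the call after a single attempt.
	n, err := retry(eps, isNonTransient, func(string) error { return errEmptyKey })
	fmt.Println(n, err)

	// A transient error switches endpoints until one succeeds.
	n, err = retry(eps, isNonTransient, func(ep string) error {
		if ep == "c:2379" {
			return nil
		}
		return errUnavailable
	})
	fmt.Println(n, err)
}
```

Switching endpoints cannot fix a request that is itself invalid, so the first call stops after one attempt while the second keeps going until an endpoint answers.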