-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
discover: Timeout stalled resolutions #401
Conversation
If a load balancer is not polled, i.e. because it is idle/leaked, its resolution stream can fill up, causing backpressure onto the destination controller client. This change implements a timeout for this scenario. When the resolution buffer is at capacity for a full idle timeout, the background resolution completes to free its resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i correct that failing a resolution task is non-fatal (e.g. that when it times out, if the resolution is still needed, we will restart it?)
if so, this lgtm!
If the balanacer were to start consuming from the queue again, it would get an error when the queue is fully consumed and fail (which is what we'd hope). |
This release adds a defense mechanism to ensure that resolutions are released when the associated balancer becomes idle and should have been dropped from the proxy. Furthermore, the proxy is now more selective as to which gRPC status codes are considered "failures" in metrics. --- * Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395) * discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
This release adds a defense mechanism to ensure that resolutions are released when the associated balancer becomes idle and should have been dropped from the proxy. Furthermore, the proxy is now more selective as to which gRPC status codes are considered "failures" in metrics. --- * Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395) * discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
If a load balancer is not polled, i.e. because it is idle/leaked, its
resolution stream can fill up, causing backpressure onto the destination
controller client.
This change implements a timeout for this scenario. When the resolution
buffer is at capacity for a full idle timeout, the background resolution
completes to free its resources.