Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discover: Timeout stalled resolutions #401

Merged
merged 3 commits into from
Dec 18, 2019
Merged

discover: Timeout stalled resolutions #401

merged 3 commits into from
Dec 18, 2019

Conversation

olix0r
Copy link
Member

@olix0r olix0r commented Dec 17, 2019

If a load balancer is not polled, i.e. because it is idle/leaked, its
resolution stream can fill up, causing backpressure onto the destination
controller client.

This change implements a timeout for this scenario. When the resolution
buffer is at capacity for a full idle timeout, the background resolution
completes to free its resources.

If a load balancer is not polled, i.e. because it is idle/leaked, its
resolution stream can fill up, causing backpressure onto the destination
controller client.

This change implements a timeout for this scenario. When the resolution
buffer is at capacity for a full idle timeout, the background resolution
completes to free its resources.
Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good!

Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i correct that failing a resolution task is non-fatal (e.g. that when it times out, if the resolution is still needed, we will restart it?)

if so, this lgtm!

linkerd/proxy/discover/src/buffer.rs Outdated Show resolved Hide resolved
@olix0r
Copy link
Member Author

olix0r commented Dec 18, 2019

If the balanacer were to start consuming from the queue again, it would get an error when the queue is fully consumed and fail (which is what we'd hope).

@olix0r olix0r merged commit 588609b into master Dec 18, 2019
@olix0r olix0r deleted the ver/discover-watchdog branch December 18, 2019 22:09
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 18, 2019
This release adds a defense mechanism to ensure that resolutions are
released when the associated balancer becomes idle and should have
been dropped from the proxy.

Furthermore, the proxy is now more selective as to which gRPC status
codes are considered "failures" in metrics.

---

* Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395)
* discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 19, 2019
This release adds a defense mechanism to ensure that resolutions are
released when the associated balancer becomes idle and should have
been dropped from the proxy.

Furthermore, the proxy is now more selective as to which gRPC status
codes are considered "failures" in metrics.

---

* Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395)
* discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants