
Introduce a backpressure-propagating buffer #451

Merged (5 commits) Mar 6, 2020
Conversation

@olix0r (Member) commented Mar 4, 2020

The tower buffer implementation uses the underlying channel for
backpressure. This admits requests before the inner service becomes
ready, and it doesn't fully propagate error conditions.

This replacement buffer implementation exposes readiness via a `Watch`.
This prevents callers from sending requests when the inner service is
not ready and allows service errors to be propagated reliably. The
buffer's background task drops the inner service as soon as it fails.

The effect of this is that the buffer's channel is only used by
requests that race to submit while the inner service is ready, so it
likely needs only enough slots to satisfy the proxy's expected
concurrency (i.e., on the order of the number of cores).

@olix0r olix0r requested a review from a team March 4, 2020 16:13
@olix0r (Member, Author) commented Mar 4, 2020

@carllerche's feedback is that the delay-based probing may be unnecessary if inner services are responsible for notifying the task when they need to change (even when they return Ready).

@olix0r olix0r force-pushed the ver/probe-buffer branch from 5e53bfb to d4d05ac Compare March 5, 2020 02:29
@olix0r olix0r changed the title Introduce a probing, backpressure-propagating buffer Introduce a backpressure-propagating buffer Mar 5, 2020
@olix0r olix0r force-pushed the ver/probe-buffer branch from 5300d49 to 5ada587 Compare March 5, 2020 03:45
@olix0r (Member, Author) commented Mar 5, 2020

I've removed the probing logic from the buffer and moved it into a dedicated middleware -- there's no reason it needs to be coupled to the buffer.

@hawkw (Contributor) left a comment:

overall, this looks good to me...i had a thought about how we drive the inner service's response future, if you're interested, although i'm not sure whether or not it has that big an impact.

linkerd/buffer/src/dispatch.rs (outdated, resolved)
linkerd/buffer/src/dispatch.rs (outdated, resolved)
linkerd/buffer/src/error.rs (outdated, resolved)
@hawkw (Contributor) left a comment:

looks great! 🙌

linkerd/buffer/src/dispatch.rs (resolved)
```rust
.as_mut()
.expect("Service must not be dropped")
.call(request);
let _ = tx.send(Ok(fut));
```
A contributor commented:

how often do we think requesters will hang up after requests are dispatched? is this something that we would want to record a TRACE/DEBUG event for?

@olix0r (Member, Author) replied:

This will happen whenever a request is canceled (i.e., the client resets the stream or drops the connection). Could go either way.
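The hang-up case under discussion is easy to reproduce with a std channel (a stand-in for the response channel `tx` in the excerpt; the log text is illustrative, not an actual proxy event):

```rust
use std::sync::mpsc;

fn main() {
    // The requester hangs up: its receiving side is dropped before the
    // dispatch task sends the response (e.g. the client reset the
    // stream or dropped the connection).
    let (rsp_tx, rsp_rx) = mpsc::channel::<u32>();
    drop(rsp_rx);

    // `send` reports the hang-up; this failed send is the spot where a
    // TRACE/DEBUG event could be recorded instead of discarding the
    // error with `let _ =`.
    if let Err(canceled) = rsp_tx.send(42) {
        eprintln!("trace: requester hung up; dropping response {}", canceled.0);
    }
}
```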

@kleimkuhler (Contributor) left a comment:

Change looks awesome! I have a question about when an inner service errors, but otherwise looks good.

linkerd/buffer/src/service.rs (outdated, resolved)
linkerd/buffer/src/dispatch.rs (outdated, resolved)
linkerd/buffer/src/dispatch.rs (resolved)
@olix0r olix0r requested a review from kleimkuhler March 6, 2020 19:24
@kleimkuhler (Contributor) left a comment:

👍

@olix0r olix0r merged commit 01ef5ce into master Mar 6, 2020
@olix0r olix0r deleted the ver/probe-buffer branch March 6, 2020 20:23
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Mar 10, 2020
This release builds on changes in the prior release to ensure that
balancers process updates eagerly.

Cache capacity limitations have been removed, and services now fail
eagerly rather than making all requests wait for the timeout to expire.

Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable
is parsed.

---

* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)