Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeout: Introduce FailFast, Idle, and Probe middlewares #452

Merged
merged 11 commits into from
Mar 6, 2020
Merged

Conversation

olix0r
Copy link
Member

@olix0r olix0r commented Mar 4, 2020

This change introduces two new timeout middlewares: idle and
failfast.

idle causes the service to start failing if not been called within some timeout. This is intended to be driven by probe-buffer.

failfast causes the service to become ready, failing requests, if the
inner service does not become ready within a timeout.

probe ensures that the inner service is polled at least once per interval.

This change introduces two new timeout middlewares: `idle` and
`failfast`.

`idle` causes the service to start failing if polled after being ready
and unused for some timeout. This is intended to be driven by
`probe-buffer`.

`failfast` causes the service to become ready, failing requests, if the
inner service does not become ready within a timeout.
@olix0r olix0r requested a review from a team March 4, 2020 16:21
@olix0r olix0r changed the title timeout: Introduce FailFast and Idle middlewares timeout: Introduce FailFast, Idle, and Probe middlewares Mar 5, 2020
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very cool, and i didn't notice any blockers. i had a few nits and minor non-blocking suggestions; hopefully they're helpful.

linkerd/timeout/src/failfast.rs Outdated Show resolved Hide resolved

// Then we wait for the idle timeout, at which point the service
// should start failing fast.
tokio_timer::Delay::new(Instant::now() + max_unavailable + Duration::from_millis(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we implement this using tokio-timer's mock timer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any examples of this? I'm open to it but these tests seem Good Enough, since we have a single thread and aren't really exposing ourselves to anything overtly racey

Comment on lines 84 to 98
// Fails the inner service after it has not been polled for the given timeout.
pub fn push_idle_timeout(self, timeout: Duration) -> Layers<Pair<L, timeout::IdleLayer>> {
self.push(timeout::IdleLayer::new(timeout))
}

// Makes the service eagerly process and fail requests after the given timeout.
pub fn push_failfast(self, timeout: Duration) -> Layers<Pair<L, timeout::FailFastLayer>> {
self.push(timeout::FailFastLayer::new(timeout))
}

// Polls the inner service at least once per interval.
pub fn push_probe(self, interval: Duration) -> Layers<Pair<L, timeout::ProbeLayer>> {
self.push(timeout::ProbeLayer::new(interval))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is sort of a meta-question but it feels like by now, we've added one of these helpers for pretty much every layer...is this really making the code all that much more readable than just using push directly? i'd kind of expect the helpers to mainly be used when we have largeish combinations of layers or complex layer configurations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current thought is that it makes imports slightly saner. I think, ideally, we figure out how to do this with some sort of TraitExt pattern (i.e. so the timeouts module can just expose all of its builders).


impl std::fmt::Display for FailFastError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Service in fail-fast")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how easy will it be to trace these errors back to which service failed fast? we might want to emit trace events when failing a request in the fail-fast state.

or, if we wanted to be fancier, we could do something like giving the error a PhantomData field that's the inner service's type, and doing something like

Suggested change
write!(f, "Service in fail-fast")
write!(f, "{} service in fail-fast", std::any::type_name::<T>())

or, we could use the new tracing-error crate to start capturing the spans where these errors occur, so we can see which service was in fail-fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rely on something outside of the individual middlewares for this. I definitely don't want type names in display output (as they're not at all user-consumable)

// === impl IdleError ===

impl std::fmt::Display for IdleError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as in fail-fast about indicating which service idled out


// Then we wait for the idle timeout, at which point the service
// should still be usable if we don't poll_ready again.
tokio_timer::Delay::new(Instant::now() + timeout + Duration::from_millis(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, might be nice if we can do this w/ mock timers

linkerd/timeout/src/probe.rs Outdated Show resolved Hide resolved
linkerd/timeout/src/probe.rs Outdated Show resolved Hide resolved
let service = ProbeLayer::new(timeout).layer(Ready);
tokio::spawn(Drive(service, Arc::downgrade(&count)));
let delay = (2 * timeout) + Duration::from_millis(3);
tokio_timer::Delay::new(Instant::now() + delay)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, it would be nice to use mock timers for this test...

@olix0r olix0r requested review from kleimkuhler and a team March 5, 2020 23:44
Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@olix0r olix0r merged commit 5206901 into master Mar 6, 2020
@olix0r olix0r deleted the ver/timeouts branch March 6, 2020 20:40
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Mar 10, 2020
This release builds on changes in the prior release to ensure that
balancers process updates eagerly.

Cache capacity limitations have been removed; and services now fail
eagerly, rather than making all requests wait for the timeout to expire.

Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable
is parsed.

---

* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Mar 10, 2020
This release builds on changes in the prior release to ensure that
balancers process updates eagerly.

Cache capacity limitations have been removed; and services now fail
eagerly, rather than making all requests wait for the timeout to expire.

Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable
is parsed.

---

* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants