timeout: Introduce FailFast, Idle, and Probe middlewares #452

olix0r · 2020-03-04T16:21:57Z

This change introduces three new timeout middlewares: idle, failfast,
and probe.

idle causes the service to start failing if it has not been called within some timeout. This is intended to be driven by probe-buffer.

failfast causes the service to become ready, failing requests, if the
inner service does not become ready within a timeout.

probe ensures that the inner service is polled at least once per interval.
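
As a rough illustration of the failfast behavior described above, here is a minimal synchronous model. It is not the middleware from this PR: `FailFastModel`, `Readiness`, and `check` are hypothetical names, the caller passes in the inner service's readiness and the current time, and it assumes that a recovered inner service clears the deadline.

```rust
use std::time::{Duration, Instant};

/// What the wrapper reports to its caller after a readiness check.
#[derive(Debug, PartialEq)]
enum Readiness {
    /// The inner service is ready; requests are forwarded.
    Ready,
    /// The inner service is not ready, but the deadline has not passed.
    Pending,
    /// The deadline passed; the wrapper reports ready and fails requests.
    FailFast,
}

/// Hypothetical stand-in for a fail-fast wrapper (not the PR's type).
struct FailFastModel {
    max_unavailable: Duration,
    /// When the inner service was first observed as not ready.
    unavailable_since: Option<Instant>,
}

impl FailFastModel {
    fn new(max_unavailable: Duration) -> Self {
        Self {
            max_unavailable,
            unavailable_since: None,
        }
    }

    /// `inner_ready` models the result of polling the inner service at `now`.
    fn check(&mut self, inner_ready: bool, now: Instant) -> Readiness {
        if inner_ready {
            // Assumption: recovery clears the deadline.
            self.unavailable_since = None;
            return Readiness::Ready;
        }
        // Remember the first moment the inner service was unavailable.
        let since = *self.unavailable_since.get_or_insert(now);
        if now.saturating_duration_since(since) >= self.max_unavailable {
            Readiness::FailFast
        } else {
            Readiness::Pending
        }
    }
}
```

The point of the model is that callers stop waiting once the deadline passes: they get an immediate error instead of queueing behind an unavailable inner service.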

This change introduces two new timeout middlewares: `idle` and `failfast`. `idle` causes the service to start failing if polled after being ready and unused for some timeout. This is intended to be driven by `probe-buffer`. `failfast` causes the service to become ready, failing requests, if the inner service does not become ready within a timeout.

hawkw

this is very cool, and i didn't notice any blockers. i had a few nits and minor non-blocking suggestions; hopefully they're helpful.

hawkw · 2020-03-05T22:26:26Z

linkerd/timeout/src/failfast.rs

+
+            // Then we wait for the idle timeout, at which point the service
+            // should start failing fast.
+            tokio_timer::Delay::new(Instant::now() + max_unavailable + Duration::from_millis(1))


nit: can we implement this using tokio-timer's mock timer?

Do you have any examples of this? I'm open to it but these tests seem Good Enough, since we have a single thread and aren't really exposing ourselves to anything overtly racy
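
The idea behind a mock timer can be sketched without any timer crate. The example below is hand-rolled and does not use tokio-timer's API: `Clock`, `MockClock`, and `IdleTracker` are hypothetical names. It only shows how a test can advance time explicitly instead of sleeping for `timeout + 1ms`.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Source of time, expressed as an offset from an arbitrary start.
trait Clock {
    fn now(&self) -> Duration;
}

/// Hypothetical test clock: time moves only when `advance` is called.
#[derive(Default)]
struct MockClock {
    elapsed: Cell<Duration>,
}

impl MockClock {
    fn advance(&self, by: Duration) {
        self.elapsed.set(self.elapsed.get() + by);
    }
}

impl Clock for MockClock {
    fn now(&self) -> Duration {
        self.elapsed.get()
    }
}

/// Hypothetical idle tracker: idle once `timeout` has passed since last use.
struct IdleTracker<'a, C: Clock> {
    clock: &'a C,
    timeout: Duration,
    last_used: Duration,
}

impl<'a, C: Clock> IdleTracker<'a, C> {
    fn new(clock: &'a C, timeout: Duration) -> Self {
        Self {
            clock,
            timeout,
            last_used: clock.now(),
        }
    }

    /// Records a use, restarting the idle window.
    fn touch(&mut self) {
        self.last_used = self.clock.now();
    }

    fn is_idle(&self) -> bool {
        self.clock.now().saturating_sub(self.last_used) >= self.timeout
    }
}
```

A test built this way calls `clock.advance(timeout + Duration::from_millis(1))` and asserts on `is_idle()` immediately, so it runs in microseconds and does not depend on scheduler timing.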

hawkw · 2020-03-05T22:29:04Z

linkerd/app/core/src/svc.rs

+    // Fails the inner service after it has not been polled for the given timeout.
+    pub fn push_idle_timeout(self, timeout: Duration) -> Layers<Pair<L, timeout::IdleLayer>> {
+        self.push(timeout::IdleLayer::new(timeout))
+    }
+
+    // Makes the service eagerly process and fail requests after the given timeout.
+    pub fn push_failfast(self, timeout: Duration) -> Layers<Pair<L, timeout::FailFastLayer>> {
+        self.push(timeout::FailFastLayer::new(timeout))
+    }
+
+    // Polls the inner service at least once per interval.
+    pub fn push_probe(self, interval: Duration) -> Layers<Pair<L, timeout::ProbeLayer>> {
+        self.push(timeout::ProbeLayer::new(interval))
+    }
+


this is sort of a meta-question but it feels like by now, we've added one of these helpers for pretty much every layer...is this really making the code all that much more readable than just using push directly? i'd kind of expect the helpers to mainly be used when we have largeish combinations of layers or complex layer configurations?

My current thought is that it makes imports slightly saner. I think, ideally, we figure out how to do this with some sort of TraitExt pattern (i.e. so the timeouts module can just expose all of its builders).
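
One possible shape for the TraitExt idea mentioned above, as a sketch only. `Layers` and `Pair` here are simplified stand-ins for the types in the diff, the layer structs are placeholders that just hold a `Duration`, and `TimeoutLayersExt` is a hypothetical name rather than anything in the codebase.

```rust
use std::time::Duration;

// Simplified stand-ins for the stack types in the diff above.
#[derive(Debug, PartialEq)]
struct Pair<A, B>(A, B);

#[derive(Debug, PartialEq)]
struct Layers<L>(L);

impl<L> Layers<L> {
    /// Wraps the existing layers with one more outer layer.
    fn push<O>(self, outer: O) -> Layers<Pair<L, O>> {
        Layers(Pair(self.0, outer))
    }
}

// Placeholder layers; the real ones would implement a layer trait.
#[derive(Debug, PartialEq)]
struct IdleLayer(Duration);

#[derive(Debug, PartialEq)]
struct FailFastLayer(Duration);

/// Hypothetical extension trait that a timeout module could export so that
/// the builder type itself does not need one helper per layer.
trait TimeoutLayersExt<L>: Sized {
    fn push_idle_timeout(self, timeout: Duration) -> Layers<Pair<L, IdleLayer>>;
    fn push_failfast(self, timeout: Duration) -> Layers<Pair<L, FailFastLayer>>;
}

impl<L> TimeoutLayersExt<L> for Layers<L> {
    fn push_idle_timeout(self, timeout: Duration) -> Layers<Pair<L, IdleLayer>> {
        self.push(IdleLayer(timeout))
    }

    fn push_failfast(self, timeout: Duration) -> Layers<Pair<L, FailFastLayer>> {
        self.push(FailFastLayer(timeout))
    }
}
```

With the trait in scope, a caller writes `Layers(()).push_failfast(a).push_idle_timeout(b)` as before, but the helpers live next to the layers they construct and only one `use` is needed to bring them all in.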

hawkw · 2020-03-05T22:34:13Z

linkerd/timeout/src/failfast.rs

+
+impl std::fmt::Display for FailFastError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "Service in fail-fast")


how easy will it be to trace these errors back to which service failed fast? we might want to emit trace events when failing a request in the fail-fast state.

or, if we wanted to be fancier, we could do something like giving the error a PhantomData field that's the inner service's type, and doing something like

Suggested change

-        write!(f, "Service in fail-fast")
+        write!(f, "{} service in fail-fast", std::any::type_name::<T>())

or, we could use the new tracing-error crate to start capturing the spans where these errors occur, so we can see which service was in fail-fast.

I think we should rely on something outside of the individual middlewares for this. I definitely don't want type names in display output (as they're not at all user-consumable)
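
For reference, the PhantomData suggestion from this thread could look roughly like the sketch below. It was not adopted in the PR; the generic parameter, the `new` constructor, and the manual `Debug` impl are assumptions made for illustration. Note that the exact string returned by `std::any::type_name` is not guaranteed to be stable across compiler versions, which supports the concern about user-facing output.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Hypothetical variant of the error, parameterized by the inner service type.
struct FailFastError<T> {
    // `fn() -> T` records the type without owning a `T`.
    _inner: PhantomData<fn() -> T>,
}

impl<T> FailFastError<T> {
    fn new() -> Self {
        Self {
            _inner: PhantomData,
        }
    }
}

impl<T> fmt::Display for FailFastError<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The type name is a compiler-generated path, not a friendly label.
        write!(f, "{} service in fail-fast", std::any::type_name::<T>())
    }
}

// Written by hand so that `T` does not need to implement `Debug`.
impl<T> fmt::Debug for FailFastError<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("FailFastError")
    }
}

impl<T> std::error::Error for FailFastError<T> {}
```

For a deeply nested middleware stack, the type name would be a long chain of generic parameters, so the message grows with the stack and exposes internal type paths.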

hawkw · 2020-03-05T22:38:48Z

linkerd/timeout/src/idle.rs

+// === impl IdleError ===
+
+impl std::fmt::Display for IdleError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {


same comment as in fail-fast about indicating which service idled out

hawkw · 2020-03-05T22:39:08Z

linkerd/timeout/src/idle.rs

+
+            // Then we wait for the idle timeout, at which point the service
+            // should still be usable if we don't poll_ready again.
+            tokio_timer::Delay::new(Instant::now() + timeout + Duration::from_millis(1))


again, might be nice if we can do this w/ mock timers

hawkw · 2020-03-05T22:41:55Z

linkerd/timeout/src/probe.rs

+            let service = ProbeLayer::new(timeout).layer(Ready);
+            tokio::spawn(Drive(service, Arc::downgrade(&count)));
+            let delay = (2 * timeout) + Duration::from_millis(3);
+            tokio_timer::Delay::new(Instant::now() + delay)


as above, it would be nice to use mock timers for this test...

kleimkuhler

Looks good!

This release builds on changes in the prior release to ensure that balancers process updates eagerly. Cache capacity limitations have been removed; and services now fail eagerly, rather than making all requests wait for the timeout to expire. Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable is parsed.

---

* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)

olix0r requested a review from a team March 4, 2020 16:21

olix0r added 6 commits March 4, 2020 21:14

Busy => Waiting

5aa9977

Use a Delay to be notified when the service should idle out

5eba878

Merge branch 'master' into ver/timeouts

35efc32

Add builders

f02b8eb

cleanup/simplify

899679e

ws

c7b662c

olix0r changed the title ~~timeout: Introduce FailFast and Idle middlewares~~ timeout: Introduce FailFast, Idle, and Probe middlewares Mar 5, 2020

olix0r mentioned this pull request Mar 5, 2020

Introduce a backpressure-propagating buffer #451

Merged

Remove unneeded return

4a4031b

hawkw approved these changes Mar 5, 2020

olix0r added 3 commits March 5, 2020 23:38

simplify probe error type

f536313

probe => probe_ready

f238d85

needless return

48142f0

olix0r requested review from kleimkuhler and a team March 5, 2020 23:44

kleimkuhler approved these changes Mar 6, 2020

olix0r merged commit 5206901 into master Mar 6, 2020

olix0r deleted the ver/timeouts branch March 6, 2020 20:40

olix0r mentioned this pull request Mar 10, 2020

proxy: v2.89.0 linkerd/linkerd2#4163

Merged
