timeout: Introduce FailFast, Idle, and Probe middlewares #452
This change introduces three new timeout middlewares: `idle`, `failfast`, and `probe`. `idle` causes the service to start failing if polled after being ready and unused for some timeout; this is intended to be driven by `probe-buffer`. `failfast` causes the service to become ready, failing requests, if the inner service does not become ready within a timeout. `probe` ensures that the inner service is polled at least once per interval.
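The fail-fast behavior described above can be modeled as a small state machine. This is a hypothetical, synchronous stdlib sketch (the `FailFast` type, its `State` enum, and the `inner_ready` flag are all invented for illustration), not the tower-based implementation in this PR:

```rust
use std::time::{Duration, Instant};

// Simplified model of fail-fast: if the inner service stays not-ready
// past `timeout`, report ready anyway and fail each request until the
// inner service recovers.
enum State {
    Open,                       // inner service is ready; pass requests through
    Waiting { since: Instant }, // inner not ready; waiting for it
    FailFast,                   // timeout elapsed; fail requests immediately
}

struct FailFast {
    timeout: Duration,
    state: State,
}

impl FailFast {
    fn new(timeout: Duration) -> Self {
        Self { timeout, state: State::Open }
    }

    // `inner_ready` stands in for polling the wrapped service; `now` is
    // passed explicitly so the logic is easy to test deterministically.
    fn poll_ready(&mut self, inner_ready: bool, now: Instant) -> bool {
        if inner_ready {
            self.state = State::Open;
            return true;
        }
        match self.state {
            State::Open => {
                self.state = State::Waiting { since: now };
                false
            }
            State::Waiting { since } if now - since >= self.timeout => {
                self.state = State::FailFast;
                true // "ready", but only so that calls can fail fast
            }
            State::Waiting { .. } => false,
            State::FailFast => true,
        }
    }

    fn call(&self) -> Result<&'static str, &'static str> {
        match self.state {
            State::FailFast => Err("Service in fail-fast"),
            _ => Ok("response"),
        }
    }
}
```

The key property is that fail-fast makes the service *ready* on timeout, so callers (e.g. a buffer draining requests) make progress and get an error immediately instead of queueing behind an unready inner service.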
this is very cool, and i didn't notice any blockers. i had a few nits and minor non-blocking suggestions; hopefully they're helpful.
```rust
// Then we wait for the idle timeout, at which point the service
// should start failing fast.
tokio_timer::Delay::new(Instant::now() + max_unavailable + Duration::from_millis(1))
```
nit: can we implement this using tokio-timer's mock timer?
Do you have any examples of this? I'm open to it, but these tests seem Good Enough, since we have a single thread and aren't really exposing ourselves to anything overtly racy.
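The mock-timer idea boils down to injecting a clock abstraction so tests control time instead of sleeping. A hedged stdlib sketch in that spirit (the `Now` trait, `MockClock`, and `idled_out` are hypothetical names, not tokio-timer's actual API):

```rust
use std::time::{Duration, Instant};

// Production code asks the clock for "now" instead of calling
// Instant::now() directly, so tests can advance time manually.
trait Now {
    fn now(&self) -> Instant;
}

struct SystemClock;
impl Now for SystemClock {
    fn now(&self) -> Instant {
        Instant::now()
    }
}

// Test-only clock whose notion of time only moves when advanced.
struct MockClock {
    now: Instant,
}
impl MockClock {
    fn new() -> Self {
        Self { now: Instant::now() }
    }
    fn advance(&mut self, d: Duration) {
        self.now += d;
    }
}
impl Now for MockClock {
    fn now(&self) -> Instant {
        self.now
    }
}

// Example consumer: reports whether an idle timeout has elapsed.
fn idled_out(clock: &dyn Now, last_used: Instant, timeout: Duration) -> bool {
    clock.now() - last_used >= timeout
}
```

With this shape, a test that previously needed a real `Delay` plus a millisecond of slack becomes a deterministic `advance` call with no wall-clock dependence.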
linkerd/app/core/src/svc.rs
```rust
// Fails the inner service after it has not been polled for the given timeout.
pub fn push_idle_timeout(self, timeout: Duration) -> Layers<Pair<L, timeout::IdleLayer>> {
    self.push(timeout::IdleLayer::new(timeout))
}

// Makes the service eagerly process and fail requests after the given timeout.
pub fn push_failfast(self, timeout: Duration) -> Layers<Pair<L, timeout::FailFastLayer>> {
    self.push(timeout::FailFastLayer::new(timeout))
}

// Polls the inner service at least once per interval.
pub fn push_probe(self, interval: Duration) -> Layers<Pair<L, timeout::ProbeLayer>> {
    self.push(timeout::ProbeLayer::new(interval))
}
```
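The probe helper's "at least once per interval" guarantee can be modeled with a small stdlib sketch (the `Probe` type and its methods here are invented for illustration, not the PR's `ProbeLayer` implementation):

```rust
use std::time::{Duration, Instant};

// Track when the inner service was last polled, and report whether a
// background driver is due to poll it again.
struct Probe {
    interval: Duration,
    last_polled: Instant,
}

impl Probe {
    fn new(interval: Duration, now: Instant) -> Self {
        Self { interval, last_polled: now }
    }

    // Record a poll of the inner service.
    fn polled(&mut self, now: Instant) {
        self.last_polled = now;
    }

    // True when the inner service must be polled now to keep the
    // "at least once per interval" guarantee.
    fn due(&self, now: Instant) -> bool {
        now - self.last_polled >= self.interval
    }
}
```

In the real middleware this presumably drives a timer that re-polls the inner service, so idle detection (which keys off poll activity) keeps working even when no requests arrive.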
This is sort of a meta-question, but it feels like by now we've added one of these helpers for pretty much every layer. Is this really making the code all that much more readable than just using `push` directly? I'd kind of expect the helpers to mainly be used when we have largeish combinations of layers or complex layer configurations.
My current thought is that it makes imports slightly saner. I think, ideally, we figure out how to do this with some sort of TraitExt pattern (i.e. so the timeouts module can just expose all of its builders).
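The TraitExt idea mentioned here can be sketched with a toy extension trait. This is a hypothetical stdlib model (the string-backed `Layers` and the `TimeoutLayersExt` trait are invented), showing how a timeout module could expose its own builder methods so callers import one trait instead of each layer type:

```rust
use std::time::Duration;

// Toy stand-in for the real `svc::Layers` stack builder.
struct Layers(Vec<String>);

impl Layers {
    fn push(mut self, layer: &str) -> Self {
        self.0.push(layer.to_string());
        self
    }
}

// The extension-trait ("TraitExt") pattern: defined alongside the
// timeout layers, implemented for the core builder type.
trait TimeoutLayersExt: Sized {
    fn push_idle_timeout(self, timeout: Duration) -> Self;
    fn push_failfast(self, timeout: Duration) -> Self;
}

impl TimeoutLayersExt for Layers {
    fn push_idle_timeout(self, timeout: Duration) -> Self {
        self.push(&format!("idle({:?})", timeout))
    }
    fn push_failfast(self, timeout: Duration) -> Self {
        self.push(&format!("failfast({:?})", timeout))
    }
}
```

The trade-off is that callers need `use ...::TimeoutLayersExt;` in scope, but the core builder no longer has to know about every middleware crate.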
```rust
impl std::fmt::Display for FailFastError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Service in fail-fast")
    }
}
```
how easy will it be to trace these errors back to which service failed fast? we might want to emit trace events when failing a request in the fail-fast state.

or, if we wanted to be fancier, we could do something like giving the error a `PhantomData` field that's the inner service's type, and doing something like

```diff
- write!(f, "Service in fail-fast")
+ write!(f, "{} service in fail-fast", std::any::type_name::<T>())
```

or, we could use the new `tracing-error` crate to start capturing the spans where these errors occur, so we can see which service was in fail-fast.
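For reference, the `PhantomData` suggestion would look roughly like this. A hedged sketch (the generic `FailFastError<S>` and `MyService` are hypothetical, not the PR's actual error type):

```rust
use std::any::type_name;
use std::fmt;
use std::marker::PhantomData;

// The error carries a zero-sized marker for the inner service's type,
// so Display can name which service entered fail-fast.
struct FailFastError<S> {
    _marker: PhantomData<S>,
}

impl<S> FailFastError<S> {
    fn new() -> Self {
        Self { _marker: PhantomData }
    }
}

impl<S> fmt::Display for FailFastError<S> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // type_name output is compiler-dependent and not guaranteed
        // stable, which is one argument against user-facing use.
        write!(f, "{} service in fail-fast", type_name::<S>())
    }
}

struct MyService;
```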
I think we should rely on something outside of the individual middlewares for this. I definitely don't want type names in display output (as they're not at all user-consumable)
```rust
// === impl IdleError ===

impl std::fmt::Display for IdleError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
```
same comment as in fail-fast about indicating which service idled out
```rust
// Then we wait for the idle timeout, at which point the service
// should still be usable if we don't poll_ready again.
tokio_timer::Delay::new(Instant::now() + timeout + Duration::from_millis(1))
```
again, might be nice if we can do this w/ mock timers
linkerd/timeout/src/probe.rs
```rust
let service = ProbeLayer::new(timeout).layer(Ready);
tokio::spawn(Drive(service, Arc::downgrade(&count)));
let delay = (2 * timeout) + Duration::from_millis(3);
tokio_timer::Delay::new(Instant::now() + delay)
```
as above, it would be nice to use mock timers for this test...
Looks good!
This release builds on changes in the prior release to ensure that balancers process updates eagerly. Cache capacity limitations have been removed, and services now fail eagerly rather than making all requests wait for the timeout to expire. Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable is parsed.

---

* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)