Overhaul buffering & caching to better-support backpressure #453
Conversation
This encompasses a number of related changes:

* The `router` crate has been split up, decoupling its `cache` from its request-routing functionality. The `router` crate now _only_ handles request routing and not caching.
* The `cache` crate now exposes a `MakeService<T, Req>` (rather than a `Service<Req>`), returning clones of each inner service (see the sketch after this list).
* `lock` is now used in place of a buffer in many places. This is done only to satisfy cloneability and will likely be changed (again) in a followup.
* Service profiles no longer use routers/buffers internally. The concrete router picks an
* Many middlewares have new `Proxy` implementations, some of which replace prior `Service` implementations (like retry).
* DNS refinement is now modeled as a `Service` so it can use the same caching infrastructure.
* Fallback is restructured to, for instance, avoid using a service profile stack when it cannot be created successfully.

This change dramatically improves compilation times, likely due to simpler type constraints in the service profile middleware.
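A rough sketch of the `MakeService` change described above (hypothetical types and names, not the actual crate APIs): the cache is keyed by target and hands out clones of a per-target service, instead of dispatching requests itself.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical sketch of the `Service<Req>` -> `MakeService<T, Req>` shift:
// the cache is keyed by target `T` and returns a clone of the per-target
// service, rather than routing each `Req` itself.
struct Cache<T, S> {
    inner: HashMap<T, S>,
}

impl<T: Eq + Hash, S: Clone> Cache<T, S> {
    /// "Makes" a service for `target`: builds it at most once via `mk`,
    /// then returns a clone of the cached instance on every call.
    fn make_service(&mut self, target: T, mk: impl FnOnce(&T) -> S) -> S {
        self.inner.entry(target).or_insert_with_key(mk).clone()
    }
}
```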
@olix0r i notice that there were a number of changes to the
```diff
 use tower::load_shed::error as shed;
 use tower_grpc::{self as grpc, Code};
-use tracing::{debug, warn};
+use tracing::debug;
```
very nitpicky: it seems a little inconsistent to import `debug` by name, but use `warn!` prefixed with `tracing::` in the same file. slight pref for doing the same for both.
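For instance, a tiny sketch of the consistent style the comment suggests (illustrative code, not the file in question):

```rust
// Either import both macros by name...
use tracing::{debug, warn};

fn check(ready: bool) {
    if ready {
        debug!("service ready");
    } else {
        warn!("service not ready"); // ...or prefix both as `tracing::warn!`.
    }
}
```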
```rust
{
    fn make_inner(addr: SocketAddr, dst: &ControlAddr, mk_svc: &mut M) -> Self {
        let target = client::Target {
            addr,
            server_name: dst.identity.clone(),
        };

info_span!("control", peer.addr = %addr, peer.identity = ?dst.identity)
```
do we still care about propagating the peer address/identity to the inner service's future? or is this added somewhere else?
```rust
/// Determines the dispatch deadline for a request.
pub trait Deadline<Req>: Clone {
```
am i correct that all the deadline stuff was removed because most buffers are now locks, and we can limit the time spent waiting due to backpressure by timing out futures? just double-checking!
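If that reading is right, a minimal sketch of the idea (assuming tokio; this is not the proxy's actual code) would bound the time spent waiting on a backpressured service by timing out its future:

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical helper: rather than a buffer-level dispatch deadline,
// cap how long a caller may wait on a locked/backpressured service by
// timing out the response future itself.
async fn call_with_deadline<F, T>(fut: F, deadline: Duration) -> Result<T, &'static str>
where
    F: std::future::Future<Output = T>,
{
    timeout(deadline, fut).await.map_err(|_| "dispatch timeout")
}
```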
```rust
pub fn push_map_response<M: Clone>(self, map: M) -> Stack<stack::MapResponse<S, M>> {
    self.push(stack::MapResponseLayer::new(map))

/// Validates that this stack serves T-typed targets.
```
"serves", or "makes new services for"?
```diff
@@ -9,6 +7,8 @@ use tracing_subscriber::{
     reload, EnvFilter, FmtSubscriber,
 };

+const ENV_LOG: &str = "LINKERD2_PROXY_LOG";
```
did rustfmt decide to move this around?
```rust
impl fmt::Display for NoCapacity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "router capacity reached ({})", self.0)
```
nit: should this now be

```diff
-write!(f, "router capacity reached ({})", self.0)
+write!(f, "cache capacity reached ({})", self.0)
```
```rust
/// A layer that that builds a routing service.
///
/// A `Rec`-typed `Recognize` instance is used to produce a target for each
/// `Req`-typed request. If the router doesn't already have a `Service` for this
/// target, it uses a `Mk`-typed `Service` stack.
```
nit: this looks like the same comment as the old router layer... maybe:

```diff
-/// A layer that that builds a routing service.
-///
-/// A `Rec`-typed `Recognize` instance is used to produce a target for each
-/// `Req`-typed request. If the router doesn't already have a `Service` for this
-/// target, it uses a `Mk`-typed `Service` stack.
+/// A layer that that builds a cache service.
```

(and perhaps add some notes on the cache's eviction policy or something?)
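Purely as illustration of what such a note might cover (this is an assumption, not the cache's documented behavior): caches like this often evict entries that have been idle longer than a configured timeout.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

// Hypothetical idle-eviction policy, for illustration only: drop any
// cached service that hasn't been used within `max_idle`.
struct IdleCache<T, S> {
    max_idle: Duration,
    entries: HashMap<T, (S, Instant)>, // service + last-used time
}

impl<T: Eq + Hash, S> IdleCache<T, S> {
    fn evict_idle(&mut self, now: Instant) {
        let max_idle = self.max_idle; // copy out to avoid borrowing `self` in the closure
        self.entries
            .retain(|_, (_, last_used)| now.duration_since(*last_used) < max_idle);
    }
}
```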
```diff
@@ -74,15 +71,15 @@ where

     fn call(&mut self, target: T) -> Self::Future {
         let path = target.to_string();
-        trace!("resolve {:?}", path);
+        debug!(dst = %path, context=%self.context_token, "Resolving");
```
nit: weird spacing

```diff
-debug!(dst = %path, context=%self.context_token, "Resolving");
+debug!(dst = %path, context = %self.context_token, "Resolving");
```
```diff
@@ -112,6 +109,7 @@ where
                 .filter_map(|addr| pb::to_addr_meta(addr, &metric_labels))
                 .collect::<Vec<_>>();
             if !addr_metas.is_empty() {
+                debug!(endpoints = %addr_metas.len(), "Add");
```
nit: if `len` is a `usize`, we shouldn't need to use `Display`. not a big deal in practice, but we could theoretically use the typed integer value for e.g. a metric of endpoint counts or something, in the future...

```diff
-debug!(endpoints = %addr_metas.len(), "Add");
+debug!(endpoints = addr_metas.len(), "Add");
```
```rust
let mut addrs = overrides.iter();
if let Some(a) = addrs.next() {
    write!(f, "{}", a)?;
}
for a in addrs {
    write!(f, ", {}", a)?;
}
```
tioli: i think this could also be accomplished with `Formatter::debug_list`?
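Roughly, that suggestion would look something like this (hypothetical wrapper type; note that `debug_list` adds brackets and renders entries with their `Debug` impls rather than `Display`):

```rust
use std::fmt;

// Hypothetical stand-in for the type whose Display impl is shown above.
struct Overrides(Vec<String>);

impl fmt::Display for Overrides {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `debug_list` handles the separators; output looks like `["a", "b"]`.
        f.debug_list().entries(self.0.iter()).finish()
    }
}
```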
I will continue looking through this change and I'm sure the follow-up on this will help clarify more, but from what I have seen this is looking good to me.
```rust
// ready service.
.push_timeout(dispatch_timeout)
// Removes the override header after it has been used to
// determine a reuquest target.
```
typo: `reuquest`
no obvious blockers --- the rationale behind the change makes sense. i commented on a few minor things.
```rust
}

mod sealed {
    pub trait PollUpdate {
```
it would be nice if there was a comment here explaining what this is for...its purpose is not very obvious.
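For reference, one common form of the pattern this appears to be (an illustrative sketch, not the proxy's actual code): a trait defined in a private module can be used by this crate's public items but can be neither named nor implemented by downstream crates.

```rust
mod sealed {
    // Public within a private module: other crates cannot name
    // `Sealed`, so they cannot satisfy the bound below.
    pub trait Sealed {}
}

/// A public trait that only this crate can implement, because the
/// `Sealed` supertrait is unreachable from outside the crate.
pub trait PollUpdate: sealed::Sealed {
    fn poll_update(&mut self);
}
```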
* proxy: v2.88.0

This release includes a significant internal change to how backpressure is handled in the proxy. These changes fix a class of bugs related to discovery staleness, and it should be rarer to encounter "dispatch timeout" errors.

---

* orig-proto: Be more flexible to stack placement (linkerd/linkerd2-proxy#444)
* Remove Clone requirement from controller clients (linkerd/linkerd2-proxy#449)
* server: Simplify HTTP server type constraints (linkerd/linkerd2-proxy#450)
* Overhaul buffering & caching to better-support backpressure (linkerd/linkerd2-proxy#453)