Overhaul buffering & caching to better-support backpressure #453

Merged
olix0r merged 3 commits into master from ver/backpressure on Mar 5, 2020

Conversation

olix0r
Member

@olix0r olix0r commented Mar 4, 2020

This encompasses a number of related changes:

  • The router crate has been split up, decoupling its cache from its
    request-routing functionality. The router crate now only handles
    request routing and not caching.
  • The cache crate now exposes a MakeService<T, Req> (rather than a
    Service<Req>, returning clones of each inner service).
  • lock is now used in place of a buffer in many places. This is done
    only to satisfy cloneability and will likely be changed (again) in a
    followup.
  • Service profiles no longer use routers/buffers internally. The
    concrete router picks an
  • Many middlewares have new Proxy implementations, some of which replace
    prior Service implementations (like retry).
  • DNS refinement is now modeled as a Service so it can use the same
    caching infrastructure.
  • Fallback is restructured to, for instance, avoid using a service
    profile stack when it cannot be created successfully.

This change dramatically improves compilation times, likely due to
simpler type constraints in the service profile middleware.
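
To make the MakeService-vs-Service distinction in the second bullet concrete, here is a minimal, self-contained sketch of that shape; the names (`MakeService`, `Cache`, `make_service`) are illustrative stand-ins, not the proxy's actual types:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Stand-in for a `MakeService<T, Req>`-style interface: given a target `T`,
/// produce a service (here just any `Clone` value) rather than routing
/// requests itself.
trait MakeService<T> {
    type Service: Clone;
    fn make_service(&mut self, target: T) -> Self::Service;
}

/// A toy cache layered over an inner make-service: the first call for a
/// target builds the inner service; later calls hand out clones of it.
struct Cache<T, M: MakeService<T>> {
    inner: M,
    services: HashMap<T, M::Service>,
}

impl<T: Eq + Hash + Clone, M: MakeService<T>> MakeService<T> for Cache<T, M> {
    type Service = M::Service;

    fn make_service(&mut self, target: T) -> Self::Service {
        let inner = &mut self.inner;
        self.services
            .entry(target.clone())
            .or_insert_with(|| inner.make_service(target))
            .clone()
    }
}
```

Under this split, the router's only job is mapping a request to a target; building and reusing the per-target service is the cache's job.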

@olix0r olix0r requested a review from a team March 4, 2020 19:43
@hawkw
Contributor

hawkw commented Mar 4, 2020

@olix0r i notice that there were a number of changes to the Addr types and to metrics labels that don't seem directly linked to any of the changes you mentioned in the PR description. It's not a blocker but I'm curious about what changed & why.

use tower::load_shed::error as shed;
use tower_grpc::{self as grpc, Code};
use tracing::{debug, warn};
use tracing::debug;
Contributor

very nitpicky: it seems a little inconsistent to import debug by name, but use warn! prefixed with tracing:: in the same file. slight pref for doing the same for both.
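
For reference, the two consistent options the nit is pointing at look roughly like this (illustrative snippets, not the file's actual log lines):

```rust
// Option A: import both macros and call them bare.
use tracing::{debug, warn};

fn option_a() {
    debug!("resolving profile");
    warn!("falling back to default route");
}

// Option B: skip the import and prefix both call sites.
fn option_b() {
    tracing::debug!("resolving profile");
    tracing::warn!("falling back to default route");
}
```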

{
fn make_inner(addr: SocketAddr, dst: &ControlAddr, mk_svc: &mut M) -> Self {
let target = client::Target {
addr,
server_name: dst.identity.clone(),
};

info_span!("control", peer.addr = %addr, peer.identity = ?dst.identity)
Contributor

do we still care about propagating the peer address/identity to the inner service's future? or is this added somewhere else?
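
For context, propagating the span onto the returned future would look roughly like the sketch below, using `tracing-futures`; whether the new stack still does something equivalent elsewhere is exactly the question (the helper name and placement are hypothetical):

```rust
use std::future::Future;
use std::net::SocketAddr;
use tracing::info_span;
use tracing_futures::Instrument;

/// Hypothetical helper: attach the "control" span to whatever future the
/// inner make-service returns, so its events keep peer.addr attached.
fn with_control_span<F: Future>(fut: F, addr: SocketAddr) -> impl Future<Output = F::Output> {
    fut.instrument(info_span!("control", peer.addr = %addr))
}
```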


/// Determines the dispatch deadline for a request.
pub trait Deadline<Req>: Clone {
Contributor

am i correct that all the deadline stuff was removed because most buffers are now locks, and we can limit the time spent waiting due to backpressure by timing out futures? just double-checking!
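
If that reading is right, the deadline's role reduces to a plain timeout around the waiting future; a minimal sketch of that idea, using tokio's `timeout` (the helper itself is hypothetical):

```rust
use std::time::Duration;
use tokio::time::timeout;

/// Hypothetical helper: instead of a per-request deadline tracked inside a
/// buffer, bound how long a caller may wait on a locked / not-yet-ready
/// service before surfacing a dispatch-timeout style error.
async fn dispatch_within<F, T>(deadline: Duration, fut: F) -> Result<T, String>
where
    F: std::future::Future<Output = T>,
{
    timeout(deadline, fut)
        .await
        .map_err(|_| format!("dispatch timed out after {:?}", deadline))
}
```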


pub fn push_map_response<M: Clone>(self, map: M) -> Stack<stack::MapResponse<S, M>> {
self.push(stack::MapResponseLayer::new(map))
/// Validates that this stack serves T-typed targets.
Contributor

"serves", or "makes new services for"?

@@ -9,6 +7,8 @@ use tracing_subscriber::{
reload, EnvFilter, FmtSubscriber,
};

const ENV_LOG: &str = "LINKERD2_PROXY_LOG";
Contributor

did rustfmt decide to move this around?


impl fmt::Display for NoCapacity {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "router capacity reached ({})", self.0)
Contributor

nit: should this now be

Suggested change
write!(f, "router capacity reached ({})", self.0)
write!(f, "cache capacity reached ({})", self.0)

Comment on lines +14 to +18
/// A layer that that builds a routing service.
///
/// A `Rec`-typed `Recognize` instance is used to produce a target for each
/// `Req`-typed request. If the router doesn't already have a `Service` for this
/// target, it uses a `Mk`-typed `Service` stack.
Contributor

nit: this looks like the same comment as the old router layer... maybe:

Suggested change
/// A layer that that builds a routing service.
///
/// A `Rec`-typed `Recognize` instance is used to produce a target for each
/// `Req`-typed request. If the router doesn't already have a `Service` for this
/// target, it uses a `Mk`-typed `Service` stack.
/// A layer that that builds a cache service.

(and perhaps add some notes on the cache's eviction policy or something?)

@@ -74,15 +71,15 @@ where

fn call(&mut self, target: T) -> Self::Future {
let path = target.to_string();
trace!("resolve {:?}", path);
debug!(dst = %path, context=%self.context_token, "Resolving");
Contributor

nit: weird spacing

Suggested change
debug!(dst = %path, context=%self.context_token, "Resolving");
debug!(dst = %path, context = %self.context_token, "Resolving");

@@ -112,6 +109,7 @@ where
.filter_map(|addr| pb::to_addr_meta(addr, &metric_labels))
.collect::<Vec<_>>();
if !addr_metas.is_empty() {
debug!(endpoints = %addr_metas.len(), "Add");
Contributor

nit: if len is a usize, we shouldn't need to use Display. not a big deal in practice, but we could theoretically use the typed integer value for e.g. a metric of endpoint counts or something, in the future...

Suggested change
debug!(endpoints = %addr_metas.len(), "Add");
debug!(endpoints = addr_metas.len(), "Add");

Comment on lines +54 to +60
let mut addrs = overrides.iter();
if let Some(a) = addrs.next() {
write!(f, "{}", a)?;
}
for a in addrs {
write!(f, ", {}", a)?;
}
Contributor

tioli: i think this could also be accomplished with Formatter::debug_list?
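
For reference, the `Formatter::debug_list` form would look roughly like this; note it produces Debug-style output (brackets, per-item `Debug`) rather than the bare comma-joined `Display` above:

```rust
use std::fmt;

struct Overrides(Vec<String>);

impl fmt::Debug for Overrides {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Renders as ["a", "b"] rather than `a, b`.
        f.debug_list().entries(self.0.iter()).finish()
    }
}
```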

Contributor

@kleimkuhler kleimkuhler left a comment

I will continue looking through this change and I'm sure the follow-up on this will help clarify more, but from what I have seen, this is looking good to me.

linkerd/app/inbound/src/lib.rs (resolved)
// ready service.
.push_timeout(dispatch_timeout)
// Removes the override header after it has been used to
// determine a reuquest target.
Contributor

typo: reuquest

Contributor

@hawkw hawkw left a comment

no obvious blockers --- the rationale behind the change makes sense. i commented on a few minor things.

linkerd/router/src/lib.rs (outdated, resolved)
}

mod sealed {
pub trait PollUpdate {
Contributor

it would be nice if there was a comment here explaining what this is for...its purpose is not very obvious.
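
As an illustration of the kind of note being asked for, a sealed trait is usually documented along these lines (the description below is a placeholder, since the trait's actual purpose isn't stated in this diff):

```rust
mod sealed {
    /// Sealed: this trait lives in a private module so it can only be
    /// implemented inside this crate, keeping it an internal detail.
    /// (Placeholder text; the real intent of `PollUpdate` goes here.)
    pub trait PollUpdate {}
}
```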

@olix0r olix0r merged commit 871e56c into master Mar 5, 2020
@olix0r olix0r deleted the ver/backpressure branch March 5, 2020 00:09
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Mar 5, 2020
This release includes a significant internal change to how backpressure
is handled in the proxy. These changes fix a class of bugs related to discovery
staleness, and it should be rarer to encounter "dispatch timeout"
errors.

---

* orig-proto: Be more flexible to stack placement (linkerd/linkerd2-proxy#444)
* Remove Clone requirement from controller clients (linkerd/linkerd2-proxy#449)
* server: Simplify HTTP server type constraints (linkerd/linkerd2-proxy#450)
* Overhaul buffering & caching to better-support backpressure (linkerd/linkerd2-proxy#453)
cpretzer pushed a commit to linkerd/linkerd2 that referenced this pull request Mar 5, 2020
This release includes a significant internal change to how backpressure
is handled in the proxy. These changes fix a class of bugs related to discovery
staleness, and it should be rarer to encounter "dispatch timeout"
errors.

---

* orig-proto: Be more flexible to stack placement (linkerd/linkerd2-proxy#444)
* Remove Clone requirement from controller clients (linkerd/linkerd2-proxy#449)
* server: Simplify HTTP server type constraints (linkerd/linkerd2-proxy#450)
* Overhaul buffering & caching to better-support backpressure (linkerd/linkerd2-proxy#453)
cpretzer added a commit to linkerd/linkerd2 that referenced this pull request Mar 5, 2020
* proxy: v2.88.0

This release includes a significant internal change to how backpressure
is handled in the proxy. These changes fix a class of bugs related to discovery
staleness, and it should be rarer to encounter "dispatch timeout"
errors.

---

* orig-proto: Be more flexible to stack placement (linkerd/linkerd2-proxy#444)
* Remove Clone requirement from controller clients (linkerd/linkerd2-proxy#449)
* server: Simplify HTTP server type constraints (linkerd/linkerd2-proxy#450)
* Overhaul buffering & caching to better-support backpressure (linkerd/linkerd2-proxy#453)