http: prefetch for upstreams #14143
Conversation
Tagging @antoniovicente for API discussion and code review (which I think is mostly ready to look at), but I don't want this PR landing until we have a solid API plan for mobile, and I need some help from the mobile folks on that (cc @mattklein123 @junr03 @goaway). For server-side prefetch, I think we generally want load-adjusted prefetch, which is what is implemented: as the rate of queries goes up, more connections are prefetched. What I'm not sure of is what that looks like for mobile today. I assume we're going to do DNS resolution, get back a few results, but only talk to one IPv4 or IPv6 address (not load balance across DNS results). If so, I think we'd want to make the per-host prefetch either ratio-based or fixed, rather than the per-cluster prefetch. Or maybe we want this server-side as well, for the moral equivalent of min_connections_picked, and it shouldn't be a oneof? Thoughts?
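To make the two policies concrete, here is a minimal standalone sketch (illustrative names and a simplified model, not the Envoy implementation): a load-adjusted policy scales the number of warm connections with current demand, while a fixed policy keeps a constant number warm regardless of traffic, which is the single-upstream mobile case described above.

```cpp
#include <cmath>
#include <cstdint>
#include <iostream>

// Load-adjusted: desired warm capacity tracks the active stream count.
uint32_t loadAdjustedTarget(uint32_t active_streams, float prefetch_ratio) {
  return static_cast<uint32_t>(std::ceil(active_streams * prefetch_ratio));
}

// Fixed: always keep n connections warm, e.g. 6 for a mobile client
// talking to a single resolved address.
uint32_t fixedTarget(uint32_t n) { return n; }

int main() {
  for (uint32_t streams : {0u, 1u, 10u, 100u}) {
    std::cout << "streams=" << streams
              << " load_adjusted=" << loadAdjustedTarget(streams, 1.5f)
              << " fixed=" << fixedTarget(6) << "\n";
  }
}
```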
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Having not thought about this a ton, my initial feeling would be to not special-case mobile at all and just have the right knobs that allow the user to select between load-based and fixed, as you mention. I can take a look at the API also if you want.
yeah, I don't think the config would be server- or client-specific, just that one of these (either per-host or per-upstream) needs to be a oneof to handle the fixed prefetching case.
I think that sounds reasonable at a high level, but I can take a look at the proposed API once @antoniovicente does a pass.
Are you talking about a prefetch limit or a limit on the number of active connections to a domain? It seems like the Chrome limits are something like 6 connections per host name and 10 connections total per page load; I don't remember what Chrome and other browsers did regarding prefetching. That said, should we expect HTTP/1.1 from mobile devices? Ideally the client would use HTTP/2 or QUIC, for which a single connection ought to be sufficient due to multiplexing. A possible exception being WebSocket connections.
Looks pretty good. I think my main worry is the change to limit prefetches across the board based on the number of healthy hosts.
ConnectionPool::Instance* prefetch_pool = pick_prefetch_pool();
if (prefetch_pool) {
-  prefetch_pool->maybePrefetch(cluster_entry->cluster_info_->peekaheadRatio());
+  if (!prefetch_pool->maybePrefetch(cluster_entry->cluster_info_->peekaheadRatio())) {
nit: `if (!prefetch_pool->maybePrefetch(peekahead_ratio)) {`
// 3 here is arbitrary. Just as in ConnPoolImplBase::tryCreateNewConnections
// we want to limit the work which can be done on any given prefetch attempt.
for (int i = 0; i < 3; ++i) {
  if ((state.pending_streams_ + 1 + state.active_streams_) * peekahead_ratio <=
Why the +1 in the expression above? I see a similar +1 in ConnPoolImplBase::shouldCreateNewConnection when doing global prefetch, but not when doing per-upstream prefetch. It seems that this is mimicking the global prefetch logic in shouldCreateNewConnection. Worth a comment that references shouldCreateNewConnection?
Added a comment - lmk if that doesn't clarify.
FWIW I've read this a few times and I'm still struggling a bit with what we are comparing. When I see <= my eye wants to not return, but we do return. Perhaps invert? Up to you.
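For readers following the thread, here is a compilable sketch of the predicate being debated (a simplified model with illustrative names, not the actual Envoy code), written in the "inverted" direction suggested above; the +1 stands in for the stream that triggered the global prefetch check but has not yet been assigned to a pool.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Prefetch while established-plus-connecting capacity lags desired capacity,
// where desired capacity is (current demand + 1 anticipated stream) scaled by
// the configured ratio. true means "go create a connection".
static bool shouldPrefetch(size_t pending_streams, size_t active_streams,
                           uint32_t connecting_capacity, float peekahead_ratio) {
  const float desired_capacity =
      (pending_streams + 1 + active_streams) * peekahead_ratio;
  return (connecting_capacity + active_streams) < desired_capacity;
}

int main() {
  // One active stream, nothing connecting, ratio 2.0:
  // desired = (0 + 1 + 1) * 2.0 = 4 > 1 current, so prefetch.
  std::cout << shouldPrefetch(0, 1, 0, 2.0f) << "\n"; // 1
  // Plenty of capacity already connecting: 4 + 1 = 5 >= 4, so stop.
  std::cout << shouldPrefetch(0, 1, 4, 2.0f) << "\n"; // 0
}
```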
@@ -774,6 +780,10 @@ void EdfLoadBalancerBase::refresh(uint32_t priority) {
}

HostConstSharedPtr EdfLoadBalancerBase::peekAnotherHost(LoadBalancerContext* context) {
  if (stashed_random_.size() + 1 > total_healthy_hosts_) {
    return nullptr;
  }
Is 1 the right max ratio of prefetched connections to healthy hosts? I imagine that when the number of endpoints is small it would be beneficial to set this ratio > 1.0, especially if host weights are not all equal.
This one isn't a ratio thing - we currently cap #prefetches to the number of healthy hosts.
Are there plans to relax that restriction? It seems to get in the way of getting to a fixed number of connections in cases where the number of healthy upstreams is less than the desired number of connections.
As needed. Right now I was aiming at the server side, where for low QPS you'd generally prefetch a few connections, and for high QPS you'd want per-upstream prefetch; but as we move towards mobile, if we only have a single upstream per endpoint we'd want more than one prefetch.
I don't want it unlimited because that's wasteful, and this acts as a useful upper bound.
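A standalone sketch of the cap as described (simplified, with illustrative member names, not the EdfLoadBalancerBase source): each successful peek stashes the random value used for the pick, and further peeks return no host once there is one prefetch in flight per healthy host.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct PeekSketch {
  std::vector<uint64_t> stashed_random_; // randoms consumed by earlier peeks
  size_t total_healthy_hosts_ = 0;

  // Mirrors the check quoted above: refuse the peek if it would exceed one
  // prefetched connection per healthy host.
  bool peekAnotherHost(uint64_t random_value) {
    if (stashed_random_.size() + 1 > total_healthy_hosts_) {
      return false; // the real code returns nullptr here
    }
    stashed_random_.push_back(random_value);
    return true;
  }
};

int main() {
  PeekSketch lb;
  lb.total_healthy_hosts_ = 2;
  std::cout << lb.peekAnotherHost(7) << "\n";  // 1: first prefetch allowed
  std::cout << lb.peekAnotherHost(11) << "\n"; // 1: second allowed
  std::cout << lb.peekAnotherHost(13) << "\n"; // 0: capped at healthy hosts
}
```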
responses.pop_front();
clients.front()->close();
clients.pop_front();
}
Are there some assertions we can make on the total number of upstream connections created based on prefetch parameters?
yeah, basically the connections with no streams shouldn't exceed the streams times the prefetch ratio, but I don't think there's a non-racy way to get at that data.
I guess it would be possible to create 10 clients and wait until the number of established upstream connections is 15 (#clients + #healthy_upstreams).
Given the loop above, I think the only assertion we can make is that the number of upstream connections is less than max(clients.size()) + num_healthy_hosts, since clients are also being removed.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@@ -589,7 +589,6 @@ message Cluster {
  google.protobuf.Duration max_interval = 2 [(validate.rules).duration = {gt {nanos: 1000000}}];
}

// [#not-implemented-hide:]
message PrefetchPolicy {
So I know for mobile we're going to want fixed prefetch numbers (prefetch 6/8 connections), so I was thinking of making this a oneof (ratio, fixed_prefetch).
But I think fixed_prefetch will end up being the moral equivalent of min_connections_picked, at which point we should leave this as-is and allow extending it. SG @antoniovicente?
And then one question for @mattklein123: I'm assuming we end up with, at minimum:
- fixed per-cluster prefetch
- ratio per-cluster prefetch
- ratio per-upstream prefetch

lmk if you think it's worth having wrapper messages for per-upstream and per-cluster.
This is a nit, but how married are we to the name "prefetch"? I feel it's somewhat overloaded with H2's (etc.) notion of push and could be confusing. What about something like "prewarm" or "proactive_connect/connections"?
hm, I think of prewarm as a different action, where you send data down a connection early to get the cwnd set up correctly. If anyone else thinks prefetch could get conflated with push, I'm up for trying to come up with a better name.
Changes look good. Would be good to add some additional e2e integration tests, future PR seems fine.
@mattklein123 or @snowp up for a second pass?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Thanks for moving out the renames. I have more confusion about some of the logic but otherwise LGTM. Thank you.
/wait
if ((state.connecting_stream_capacity_ + state.active_streams_) >
    (state.pending_streams_ + 1 + state.active_streams_) * peekahead_ratio) {
I'm feeling very dense, but I'm having a really hard time with this logic. Two things:
- We are mixing streams and connections. When you add one, do we need to be taking into account the number of streams per connection? It seems like there is math missing here, but maybe I'm not understanding it (likely). (I have additional confusion about why we are adding +1 on this side of the equation vs. the other side, but maybe that will be more clear once we talk about connections vs. streams.)
- Is it possible to lift this logic out into a small helper shared with the same logic in shouldCreateNewConnection, and add a bunch more comments? I think that would help me. (Same question about the +1 in that function.)
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
ok, utility function added. Does this (+comments) make more sense now?
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Thanks for the helper utility. It's becoming more clear to me. :) A few more questions/comments.
/wait
// If anticipate_incoming_stream is true this assumes a call to newStream is
// pending, which is true for global preconnect.
static bool shouldConnect(size_t pending_streams, size_t active_streams,
                          uint32_t connecting_capacity, float preconnect_ratio,
Pedantically, for this to make sense this should be connecting_and_connected_capacity, right? Since it's not just about active streams, it's about all excess capacity and all pending or existing connections. If that is correct, maybe update so it's more clear?
return shouldConnect(pending_streams_.size(), num_active_streams_, connecting_stream_capacity_,
                     perUpstreamPreconnectRatio());
Is the idea here basically to use either the global ratio or the per-upstream ratio to decide whether to make a pre-connect for this host only? If so, would it be more clear to call this once and take the max of the global ratio and the per-upstream ratio? I'm confused why we would pass true for anticipate above but not here.
Fundamentally there are two types of prefetching, local and global. Local prefetching is done every time there's a connection change, so the pool will be adequately prefetched; it doesn't need to anticipate, as it's called after new streams are assigned. Global prefetching does need to anticipate, or incoming traffic to upstream A could never prefetch a connection for B (given the streams in B would be zero, it'd zero out the function). So this function will only meaningfully call one of the two blocks. I'll make it more clear with an if-else and add some more comments about the +1.
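To illustrate the point about anticipation (the same simplified predicate shape as the earlier sketch, not the actual Envoy helper): without the anticipated stream, an idle upstream B computes a desired capacity of zero and would never be preconnected by the global path.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

static bool shouldConnectSketch(size_t pending_streams, size_t active_streams,
                                uint32_t connecting_capacity,
                                float preconnect_ratio,
                                bool anticipate_incoming_stream) {
  const size_t anticipated = anticipate_incoming_stream ? 1 : 0;
  return (connecting_capacity + active_streams) <
         (pending_streams + anticipated + active_streams) * preconnect_ratio;
}

int main() {
  // Upstream B is idle: the per-pool (local) check multiplies zero streams
  // by the ratio, so it can never warm B up on its own.
  std::cout << shouldConnectSketch(0, 0, 0, 3.0f, false) << "\n"; // 0
  // The global check anticipates the incoming stream: 0 < 1 * 3.0, preconnect.
  std::cout << shouldConnectSketch(0, 0, 0, 3.0f, true) << "\n";  // 1
}
```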
Thanks, LGTM!
// prefetching for the next upcoming stream, which will likely be assigned to this pool.
// We may eventually want to track preconnect_attempts to allow more preconnecting for
// heavily weighted upstreams or sticky picks.
return shouldConnect(pending_streams_.size(), num_active_streams_, connecting_stream_capacity_,
In a future change you might consider renaming connecting_stream_capacity_ as well, to make it match "connecting and connected", but no need to block this change on that.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
/lgtm api
Thanks for improving pre-connect functionality.
Commit Message: Adding predictive prefetch (useful mainly for HTTP/1.1 and TCP proxying) and uncommenting prefetch config.
Additional Description:
Risk Level: low (mostly config guarded)
Testing: unit, integration tests
Docs Changes: APIs unhidden
Release Notes: inline
Fixes #2755