
tcp: allow conn pool cancel to request connection close #4684

Merged

Conversation

@zuercher (Member) commented Oct 10, 2018

Allows cancellation of a pending connection via the TCP connection pool to
request that the pending connection be closed if it cannot be immediately used.
This allows a caller to terminate an unneeded upstream connection. In particular,
it allows the tcp_proxy to prevent an unused connection caused by a downstream
reset from lingering unused.

Risk Level: medium
Testing: unit tests
Docs Changes: n/a
Release Notes: n/a
Fixes: #4409

Signed-off-by: Stephan Zuercher stephan@turbinelabs.io
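
To make the intended caller behavior concrete, here is a minimal, self-contained sketch (not Envoy's actual classes) of the scenario described above: a tcp_proxy-like caller whose downstream resets while the upstream connection is still pending cancels the request and asks the pool to close the connection rather than leave it idle. The `close` flag mirrors the `cancel(bool close = false)` signature in the diff below; every other name here is illustrative.

```cpp
#include <iostream>

// Illustrative stand-in for the handle the conn pool returns while an upstream
// connection is still being established.
struct PendingConnectionHandle {
  void cancel(bool close) {
    if (close) {
      std::cout << "closing pending upstream connection (no downstream to use it)\n";
    } else {
      std::cout << "leaving pending upstream connection for a future request\n";
    }
  }
};

// Stand-in for a tcp_proxy-like caller: on downstream reset, terminate the
// now-unneeded upstream connection instead of letting it linger in the pool.
void onDownstreamReset(PendingConnectionHandle& upstream_handle) {
  upstream_handle.cancel(/*close=*/true);
}

int main() {
  PendingConnectionHandle handle;
  onDownstreamReset(handle);
  return 0;
}
```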

@ggreenway (Contributor) left a comment


Looks good overall

*/
virtual void cancel() PURE;
virtual void cancel(bool close = false) PURE;
@ggreenway (Contributor):

Can this be an enum instead of a bool, for readability at the callsite?

@zuercher (Member Author):

Sure. Will do.
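
For illustration, an enum-based signature along the lines ggreenway suggests might look like the sketch below. The enumerator name `CloseExcess` is a guess chosen for readability, not taken from the final diff, and Envoy's `PURE` macro (which is `= 0`) is spelled out so the sketch stands alone.

```cpp
// Sketch of a possible enum-based replacement for `cancel(bool close = false)`.
enum class CancelPolicy {
  // On cancel, keep the pending connection open so it can satisfy a future request.
  Default,
  // On cancel, close a pending connection if there are more pending connections
  // than pending connection requests.
  CloseExcess,
};

class Cancellable {
public:
  virtual ~Cancellable() = default;

  // Cancel the pending connection request; cancel_policy controls whether the
  // corresponding pending connection may also be closed.
  virtual void cancel(CancelPolicy cancel_policy) = 0;
};
```

At the callsite this reads `cancel(CancelPolicy::CloseExcess)` rather than `cancel(true)`, which is the readability improvement being asked for.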

ggreenway previously approved these changes Oct 11, 2018
@mattklein123 (Member) left a comment

In general LGTM but one question. Also, would it be possible to update the PR description to better describe the bug being fixed? I read the related issue but TBH it's pretty confusing so I want to make sure I'm reviewing the right thing. Thank you!

// available for a future connection request.
Default,
// When a connection request is canceled, closes a pending connection if there are more pending
// connections that pending connection requests.
@mattklein123 (Member):

s/that/than?

ENVOY_LOG(debug, "canceling pending request");
request.removeFromList(pending_requests_);
host_->cluster().stats().upstream_rq_cancelled_.inc();

// If the cancel requests closure of excess connections and there are more pending connections
// than requests, close the most recently created pending connection.
@mattklein123 (Member):

For my understanding, the idea here is that if there are multiple pending requests, we only want to kill the matching one that this request created? What about if another busy connection becomes available and then the request uses that? Will we still have an excess that we need to kill? I'm wondering if this might be a little more complicated...

@zuercher (Member Author) commented Oct 12, 2018

I think this feature can only completely prevent excess connections if the caller always prevents re-use of connections (which TCP proxy does). That results in every connection request creating a new connection, and this change makes it so that canceled connection requests don't leave behind a ready upstream connection with no matching downstream.

One could argue for another feature that closes connections in the ready_conns_ list if they haven't been assigned within some time period. I suppose that could cover this case reasonably well, if that timeout were set relatively low, since TCP proxy connections normally exist in ready_conns_ only transiently.
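
To make the bookkeeping in the diff above concrete, here is a minimal, self-contained sketch (simplified; not Envoy's actual conn pool) of the cancel path being discussed: after removing the canceled request, close the most recently created pending connection only if the pool now has more pending connections than pending requests.

```cpp
#include <cstddef>
#include <list>
#include <memory>

// Simplified stand-in for a pending upstream connection.
struct PendingConn {
  bool closed{false};
  void close() { closed = true; } // stand-in for closing the upstream socket
};

class ConnPoolSketch {
public:
  void newConnectionRequest() {
    ++pending_requests_;
    pending_conns_.emplace_back(std::make_unique<PendingConn>()); // newest at the back
  }

  // Cancel one pending request. If the caller asked for excess connections to be
  // closed and there are now more pending connections than pending requests,
  // close the most recently created pending connection.
  void cancelPendingRequest(bool close_excess) {
    if (pending_requests_ > 0) {
      --pending_requests_;
    }
    if (close_excess && !pending_conns_.empty() && pending_conns_.size() > pending_requests_) {
      pending_conns_.back()->close();
      pending_conns_.pop_back();
    }
  }

private:
  std::list<std::unique_ptr<PendingConn>> pending_conns_;
  std::size_t pending_requests_{0};
};

int main() {
  ConnPoolSketch pool;
  pool.newConnectionRequest();                      // e.g. triggered by a downstream that later resets
  pool.cancelPendingRequest(/*close_excess=*/true); // pending conn is closed, not left idle
  return 0;
}
```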

@zuercher (Member Author):

That said, to support some odd-ball protocol where the server transmits without first receiving a client request, I think you want to end the connection ASAP.

@mattklein123 (Member):

I'm still a little confused about the original bug report. For my understanding, can you summarize the sequence of events in the bug report?

@zuercher (Member Author):

The original bug was that a non-SSL client connection to an SSL listener causes tcp_proxy to open an upstream connection. The downstream is reset (because it's not using SSL), but the conn pool connection to the upstream is still opened and left idle until the next time the conn pool is used.

@zuercher (Member Author):

I think the idle timeout only applies to active connections assigned to the tcp proxy, not idle connections in the pool (unless I've misunderstood something).

I think there's a minor issue around connection use (not really re-use since the upstream will never have received any data), but it's somewhat hypothetical since I can't think of a protocol where the server transmits first. Even then the conn pool will close the connection if it receives data on an unassigned connection.

It's certainly different behavior from the pre-conn-pool tcp proxy which would always close connections when the downstream went away.

@mattklein123 (Member):

but it's somewhat hypothetical since I can't think of a protocol where the server transmits first.

MySQL, sadly.

@mattklein123 (Member):

Given that the connection reuse might be a real issue for MySQL, I'm wondering if the fix needs to be more targeted to basically bind a pending TCP connection to a pending request in certain cases, and then just kill the connection if canceled during connection? This is basically what tcp_proxy did before the conn pool change and I think we need to go back to that behavior? WDYT?

@zuercher (Member Author):

For TCP proxy, we get that behavior with this change. Connections will only ever be busy (assigned; tcp proxy closes the connection when it's done with it) or pending. So we'll close pending connections on cancel, because each cancellation will always see one more pending conn than request. It won't necessarily be the connection that was specifically triggered by the downstream, but I don't think there's any information exchange that would matter.

If you want to write a mysql proxy that reuses upstream connections, you do end up in the situation where a connection request kicks off a new connection and, before it completes, a busy connection becomes ready. And that's possible even without cancellations.

I think handling the mysql case means both the cancel behavior here and not moving pending connections to ready in processIdleConnections when there's no pending request.

Normally, I'd just add a config flag to Cluster to enable this behavior, but I wanted to make this automatic for the TCP proxy, and I expect we'd want that for a mysql proxy as well. Any thoughts on how the conn pool could know, or do we just make a flag and try to document it well enough that people know to set it?

@mattklein123 (Member):

Honestly I think for right now, if this change takes us back to the behavior we had previously I think it's fine, and we can revisit an L7 MySQL proxy case later? I just want to make sure that L4 MySQL proxy will work. Is it worth adding any more comments around this discussion? I feel it's nuanced/complicated and I wouldn't want to lose this for future code readers?

@zuercher merged commit 0d185c2 into envoyproxy:master Oct 16, 2018
soya3129 pushed a commit to soya3129/envoy that referenced this pull request Oct 19, 2018