
tcp: allow conn pool cancel to request connection close #4684

Merged

Conversation

@zuercher (Member) commented Oct 10, 2018

Allows cancellation of a pending connection via the TCP connection pool to
request that the pending connection be closed if it cannot be immediately used.
This allows a caller to terminate an unneeded upstream connection. In particular,
it allows the tcp_proxy to prevent an unused connection caused by a downstream
reset from lingering unused.

Risk Level: medium
Testing: unit tests
Docs Changes: n/a
Release Notes: n/a
Fixes: #4409

Signed-off-by: Stephan Zuercher stephan@turbinelabs.io
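
To make the intended caller behavior concrete, here is a minimal, self-contained sketch (not Envoy's actual classes) of the scenario described above: a tcp_proxy-like caller whose downstream resets while the upstream connection is still pending cancels the request and asks the pool to close the connection rather than leave it idle. The `close` flag mirrors the `cancel(bool close = false)` signature in the diff below; every other name here is illustrative.

```cpp
#include <iostream>

// Illustrative stand-in for the handle the conn pool returns while an upstream
// connection is still being established.
struct PendingConnectionHandle {
  void cancel(bool close) {
    if (close) {
      std::cout << "closing pending upstream connection (no downstream to use it)\n";
    } else {
      std::cout << "leaving pending upstream connection for a future request\n";
    }
  }
};

// Stand-in for a tcp_proxy-like caller: on downstream reset, terminate the
// now-unneeded upstream connection instead of letting it linger in the pool.
void onDownstreamReset(PendingConnectionHandle& upstream_handle) {
  upstream_handle.cancel(/*close=*/true);
}

int main() {
  PendingConnectionHandle handle;
  onDownstreamReset(handle);
  return 0;
}
```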

@ggreenway (Contributor) left a comment


Looks good overall

*/
virtual void cancel() PURE;
virtual void cancel(bool close = false) PURE;
@ggreenway (Contributor):

Can this be an enum instead of a bool, for readability at the callsite?

@zuercher (Member Author):

Sure. Will do.
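
For illustration, an enum-based signature along the lines ggreenway suggests might look like the sketch below. The enumerator name `CloseExcess` is a guess chosen for readability, not taken from the final diff, and Envoy's `PURE` macro (which is `= 0`) is spelled out so the sketch stands alone.

```cpp
// Sketch of a possible enum-based replacement for `cancel(bool close = false)`.
enum class CancelPolicy {
  // On cancel, keep the pending connection open so it can satisfy a future request.
  Default,
  // On cancel, close a pending connection if there are more pending connections
  // than pending connection requests.
  CloseExcess,
};

class Cancellable {
public:
  virtual ~Cancellable() = default;

  // Cancel the pending connection request; cancel_policy controls whether the
  // corresponding pending connection may also be closed.
  virtual void cancel(CancelPolicy cancel_policy) = 0;
};
```

At the callsite this reads `cancel(CancelPolicy::CloseExcess)` rather than `cancel(true)`, which is the readability improvement being asked for.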

ggreenway previously approved these changes Oct 11, 2018
@mattklein123 (Member) left a comment

In general LGTM but one question. Also, would it be possible to update the PR description to better describe the bug being fixed? I read the related issue but TBH it's pretty confusing so I want to make sure I'm reviewing the right thing. Thank you!

// available for a future connection request.
Default,
// When a connection request is canceled, closes a pending connection if there are more pending
// connections that pending connection requests.
@mattklein123 (Member):

s/that/than?

ENVOY_LOG(debug, "canceling pending request");
request.removeFromList(pending_requests_);
host_->cluster().stats().upstream_rq_cancelled_.inc();

// If the cancel requests closure of excess connections and there are more pending connections
// than requests, close the most recently created pending connection.
@mattklein123 (Member):

For my understanding, the idea here is that if there are multiple pending requests, we only want to kill the matching one that this request created? What about if another busy connection becomes available and then the request uses that? Will we still have an excess that we need to kill? I'm wondering if this might be a little more complicated...

@zuercher (Member Author) commented Oct 12, 2018

I think this feature can only completely prevent excess connections if the caller always prevents re-use of connections (which TCP proxy does). That results in every connection request creating a new connection, and this change makes it so that canceled connection requests don't leave behind a ready upstream connection with no matching downstream.

One could argue for another feature that closes connections in the ready_conns_ list if they haven't been assigned within some time period. I suppose that could cover this case reasonably well, if that timeout were set relatively low, since TCP proxy connections normally exist in ready_conns_ only transiently.
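
To make the bookkeeping in the diff above concrete, here is a minimal, self-contained sketch (simplified; not Envoy's actual conn pool) of the cancel path being discussed: after removing the canceled request, close the most recently created pending connection only if the pool now has more pending connections than pending requests.

```cpp
#include <cstddef>
#include <list>
#include <memory>

// Simplified stand-in for a pending upstream connection.
struct PendingConn {
  bool closed{false};
  void close() { closed = true; } // stand-in for closing the upstream socket
};

class ConnPoolSketch {
public:
  void newConnectionRequest() {
    ++pending_requests_;
    pending_conns_.emplace_back(std::make_unique<PendingConn>()); // newest at the back
  }

  // Cancel one pending request. If the caller asked for excess connections to be
  // closed and there are now more pending connections than pending requests,
  // close the most recently created pending connection.
  void cancelPendingRequest(bool close_excess) {
    if (pending_requests_ > 0) {
      --pending_requests_;
    }
    if (close_excess && !pending_conns_.empty() && pending_conns_.size() > pending_requests_) {
      pending_conns_.back()->close();
      pending_conns_.pop_back();
    }
  }

private:
  std::list<std::unique_ptr<PendingConn>> pending_conns_;
  std::size_t pending_requests_{0};
};

int main() {
  ConnPoolSketch pool;
  pool.newConnectionRequest();                      // e.g. triggered by a downstream that later resets
  pool.cancelPendingRequest(/*close_excess=*/true); // pending conn is closed, not left idle
  return 0;
}
```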

@zuercher (Member Author):

That said, to support some odd-ball protocol where the server transmits without first receiving a client request, I think you want to end the connection ASAP.

@mattklein123 (Member):

I'm still a little confused about the original bug report. For my understanding, can you summarize the sequence of events in the bug report?

@zuercher (Member Author):

The original bug was that a non-SSL client connection to an SSL listener causes tcp_proxy to open an upstream connection. The downstream is reset (because it's not using SSL), but the conn pool connection to the upstream is still opened and left idle until the next time the conn pool is used.

@zuercher (Member Author):

I think the idle timeout only applies to active connections assigned to the tcp proxy, not idle connections in the pool (unless I've misunderstood something).

I think there's a minor issue around connection use (not really re-use since the upstream will never have received any data), but it's somewhat hypothetical since I can't think of a protocol where the server transmits first. Even then the conn pool will close the connection if it receives data on an unassigned connection.

It's certainly different behavior from the pre-conn-pool tcp proxy which would always close connections when the downstream went away.

@mattklein123 (Member):

but it's somewhat hypothetical since I can't think of a protocol where the server transmits first.

MySQL, sadly.

@mattklein123 (Member):

Given that the connection reuse might be a real issue for MySQL, I'm wondering if the fix needs to be more targeted to basically bind a pending TCP connection to a pending request in certain cases, and then just kill the connection if canceled during connection? This is basically what tcp_proxy did before the conn pool change and I think we need to go back to that behavior? WDYT?

@zuercher (Member Author):

For TCP proxy, we get that behavior with this change. Connections will only ever be busy (assigned; tcp proxy closes the connection when it's done with it) or pending. So we'll close pending connections on cancel, because each cancellation will always see one more pending conn than request. It won't necessarily be the connection that was specifically triggered by the downstream, but I don't think there's any information exchange that would matter.

If you want to write a mysql proxy that reuses upstream connections, you do end up in the situation where a connection request kicks off a new connection and, before it completes, a busy connection becomes ready. And that's possible even without cancellations.

I think handling the mysql case means both the cancel behavior here and not moving pending connections to ready in processIdleConnections when there's no pending request.

Normally, I'd just add a config flag to Cluster to enable this behavior, but I wanted to make this automatic for the TCP proxy, and I expect we'd want that for a mysql proxy as well. Any thoughts on how the conn pool could know, or do we just make a flag and try to document it well enough that people know to set it?

@mattklein123 (Member):

Honestly I think for right now, if this change takes us back to the behavior we had previously I think it's fine, and we can revisit an L7 MySQL proxy case later? I just want to make sure that L4 MySQL proxy will work. Is it worth adding any more comments around this discussion? I feel it's nuanced/complicated and I wouldn't want to lose this for future code readers?

@zuercher merged commit 0d185c2 into envoyproxy:master Oct 16, 2018
soya3129 pushed a commit to soya3129/envoy that referenced this pull request Oct 19, 2018