tcp: allow conn pool cancel to request connection close #4684
Merged
zuercher merged 8 commits into envoyproxy:master from turbinelabs:stephan/tcp-conn-pool-fix-4409 on Oct 16, 2018
Commits (8, all by zuercher)
- ed1a99b tcp: allow conn pool cancel to request connection close
- 3c0677d kick ci
- ed2cb07 convert param to enum
- 694f7a3 Merge branch 'lyft-master' into stephan/tcp-conn-pool-fix-4409
- 6cc1999 fix typo
- a5b457b Merge branch 'lyft-master' into stephan/tcp-conn-pool-fix-4409
- 6e1765b add additional comments
- f47a345 Merge branch 'lyft-master' into stephan/tcp-conn-pool-fix-4409
Conversation
For my understanding: the idea here is that if there are multiple pending requests, we only want to kill the one connection that this request created? What about if another busy connection becomes available and the request uses that instead? Will we still have an excess connection that we need to kill? I'm wondering if this might be a little more complicated...
I think this feature can only completely prevent excess connections if the caller always prevents re-use of connections (which TCP proxy does). That way every connection request creates a new connection, and this change makes it so that canceled connection requests don't result in a ready upstream connection with no matching downstream.
One could argue for another feature that closes connections in the ready_conns_ list if they haven't been assigned within some time period. I suppose that could cover this case reasonably well if the timeout were set relatively low, since TCP proxy connections normally exist in ready_conns_ only transiently.
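Roughly, the semantics look like this. The sketch below is illustrative, not the actual Envoy interface: the CancelPolicy enum mirrors the "convert param to enum" commit in this PR, but the surrounding types are simplified stand-ins.

```cpp
// Simplified sketch of the cancel semantics this PR adds; the real interface
// lives in Envoy's TCP connection pool headers.
enum class CancelPolicy {
  // Remove the pending request only; any connection it triggered stays in
  // the pool and may be handed to a later request.
  Default,
  // Additionally close a pending upstream connection if the cancellation
  // leaves more pending connections than pending requests.
  CloseExcess,
};

// Handle returned when a caller requests a connection; canceling it applies
// the chosen policy.
class Cancellable {
public:
  virtual ~Cancellable() = default;
  virtual void cancel(CancelPolicy policy) = 0;
};
```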
That said, to support some odd-ball protocol where the server transmits without first receiving a client request, I think you want to end the connection ASAP.
I'm still a little confused about the original bug report. For my understanding, can you summarize the sequence of events in the bug report?
The original bug was that a non-SSL client connection to an SSL listener causes tcp_proxy to open an upstream connection. The downstream is reset (because it's not using SSL), but the conn pool connection to the upstream completes and is left idle until the next time the conn pool is used.
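For context, here's a hedged sketch of what the fix looks like from the TCP proxy's side, reusing the CancelPolicy enum from the sketch above. The type and member names are illustrative stand-ins, not the exact Envoy source:

```cpp
// Stand-in for the handle the pool gives the proxy while a connection is
// being established (see the Cancellable sketch above for the real shape).
struct UpstreamHandle {
  void cancel(CancelPolicy /*policy*/) { /* pool applies the policy */ }
};

enum class ConnectionEvent { Connected, RemoteClose, LocalClose };

class TcpProxyFilter {
public:
  void onDownstreamEvent(ConnectionEvent event) {
    if ((event == ConnectionEvent::RemoteClose ||
         event == ConnectionEvent::LocalClose) &&
        upstream_handle_ != nullptr) {
      // Without CloseExcess, the upstream connection finishes connecting and
      // sits idle in the pool with no downstream, which was the original bug.
      upstream_handle_->cancel(CancelPolicy::CloseExcess);
      upstream_handle_ = nullptr;
    }
  }

private:
  UpstreamHandle* upstream_handle_{nullptr};
};
```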
I think the idle timeout only applies to active connections assigned to the tcp proxy, not idle connections in the pool (unless I've misunderstood something).
I think there's a minor issue around connection use (not really re-use, since the upstream will never have received any data), but it's somewhat hypothetical since I can't think of a protocol where the server transmits first. Even then, the conn pool will close the connection if it receives data on an unassigned connection.
It's certainly different behavior from the pre-conn-pool tcp proxy which would always close connections when the downstream went away.
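That safety net might look something like this; a minimal sketch with assumed names, not the pool's actual data-handling code:

```cpp
// Sketch of the safety net described above: a pooled upstream connection that
// receives data while not assigned to any downstream is closed rather than
// kept around for a future request.
struct PooledConn {
  bool assigned{false};
  void close() { /* tear down the upstream connection */ }
};

void onUpstreamData(PooledConn& conn) {
  if (!conn.assigned) {
    // Data arrived with no downstream to deliver it to; drop the connection
    // instead of handing a "used" connection to a later request.
    conn.close();
  }
}
```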
MySQL, sadly.
Given that connection reuse might be a real issue for MySQL, I'm wondering if the fix needs to be more targeted: basically bind a pending TCP connection to a pending request in certain cases, and then just kill the connection if the request is canceled while the connection is being established? This is basically what tcp_proxy did before the conn pool change, and I think we need to go back to that behavior? WDYT?
For TCP proxy, we get that behavior with this change. Connections will only ever be busy (assigned; tcp proxy will close the connection when it's done with it) or pending. So we'll close pending connections on cancel, because each cancellation will always see one more pending conn than pending requests. It won't necessarily be the connection that was specifically triggered by the downstream, but I don't think there's any information exchange that would matter.
If you want to write a mysql proxy that reuses upstream connections, you do end up in the situation where a connection request kicks off a new connection and, before it completes, a busy connection becomes ready. And that's possible even without cancellations.
I think handling the mysql case requires both the cancel behavior here and not moving pending connections to ready in processIdleConnections when there's no pending request.
Normally I'd just add a config flag to Cluster to enable this behavior, but I wanted to make this automatic for the TCP proxy, and I expect we'd want that for a mysql proxy as well. Any thoughts on how the conn pool could know, or do we just add a flag and try to document it well enough that people know to set it?
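The accounting described above reduces to a small check on the cancel path. Here is a minimal sketch with assumed names (the real logic lives in the pool implementation):

```cpp
#include <list>
#include <memory>

// Stand-in for a half-open upstream connection the pool is establishing.
struct PendingConn {
  void close() { /* tear down the half-open upstream connection */ }
};

// On cancel with CloseExcess (see the enum sketch above), close one pending
// connection whenever the cancel leaves more connections in flight than
// requests waiting to consume them.
void onPendingRequestCancel(std::list<std::unique_ptr<PendingConn>>& pending_conns,
                            size_t remaining_pending_requests,
                            CancelPolicy policy) {
  if (policy == CancelPolicy::CloseExcess &&
      pending_conns.size() > remaining_pending_requests) {
    // Any pending connection is equivalent: none has exchanged data yet, so
    // it need not be the one this particular request triggered.
    pending_conns.front()->close();
    pending_conns.pop_front();
  }
}
```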
Honestly, for right now, if this change takes us back to the behavior we had previously, I think it's fine, and we can revisit the L7 MySQL proxy case later? I just want to make sure that an L4 MySQL proxy will work. Is it worth adding any more comments around this discussion? It feels nuanced/complicated, and I wouldn't want to lose this for future code readers.