TCP connection pool: add a mode where unassigned connections are closed (for TCP proxy) #4409
Comments
For the tcp_proxy, the connection pool should always open a new connection to the upstream, because Envoy has no knowledge of what protocol is being used and cannot safely reuse connections. This works correctly for the tcp_proxy when the downstream connection actually exchanges data with the upstream. I believe what's happening here is that your first downstream connection is closing before the upstream connection is completed, and the unused upstream connection ends up in the connection pool's list of ready connections. The second downstream connection is immediately assigned the ready upstream connection, and when the downstream connection is closed, so is the upstream. In the first case, the upstream connection should be terminated, as we've no idea whether it's safe to reuse it for another downstream connection. Fixing this is going to cause less connection reuse than there is now.
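The failure mode described above can be sketched in a few lines. This is an illustrative model only (the class and method names are hypothetical, not Envoy's actual C++ internals): when a pending upstream connection completes after its downstream has already gone away, the pool either parks the connection in its ready list (the behavior observed here) or closes it (the mode this issue asks for).

```python
# Illustrative sketch of the two pool behaviors; not Envoy code.
class Conn:
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


class Downstream:
    def __init__(self, closed):
        self.closed = closed


class ConnPool:
    def __init__(self, close_unassigned):
        self.close_unassigned = close_unassigned
        self.ready = []  # idle upstream connections available for reuse

    def on_upstream_connected(self, conn, downstream):
        if downstream.closed:
            # Downstream went away while the upstream connect was pending.
            if self.close_unassigned:
                conn.close()            # safe choice for opaque TCP
            else:
                self.ready.append(conn)  # reuse: fine for HTTP, not TCP

    def acquire(self):
        # The next downstream gets a pooled connection if one exists.
        return self.ready.pop() if self.ready else None


# With close_unassigned=True, the orphaned connection is closed, never pooled.
pool = ConnPool(close_unassigned=True)
c = Conn()
pool.on_upstream_connected(c, Downstream(closed=True))
assert pool.acquire() is None and not c.open
```

With `close_unassigned=False` the same sequence would leave `c` in the ready list, and the next downstream would be handed a connection that a stateful upstream protocol may no longer consider fresh.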
To address the question of determinism, the connection pools are actually maintained per worker thread. So if one downstream connection is handled by a different worker thread than another, you may see a new connection formed.
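The per-worker pooling can be illustrated with a thread-local sketch (the names here are hypothetical, not Envoy internals): each worker owns its own ready list, so whether a downstream connection finds a pooled upstream depends on which worker accepts it, which is why the observed behavior looks nondeterministic.

```python
import threading

# Each "worker" (thread) owns its own pool; illustrative only.
worker_state = threading.local()

def get_pool():
    if not hasattr(worker_state, "pool"):
        worker_state.pool = []          # this worker's ready connections
    return worker_state.pool

def handle_downstream():
    pool = get_pool()
    reused = bool(pool)                 # reuse only from our own worker's pool
    if not reused:
        pool.append("upstream-conn")    # pretend we pooled a fresh connection
    return reused

# Two downstream connections on the same worker: the second finds a pooled conn.
print(handle_downstream(), handle_downstream())   # False True

# A downstream handled by a different worker sees an empty pool.
result = []
t = threading.Thread(target=lambda: result.append(handle_downstream()))
t.start(); t.join()
print(result[0])                                  # False
```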
That makes sense.
That matches my understanding of the logs.
Why? Since the connection hasn't been used yet, it is safe to reuse for future connections. What makes you think otherwise?
Most protocols have the client write first, but that's not required. If the pending upstream connection sends data expecting there to be a client, Envoy's conn pool will close the connection with an error (and whatever data was sent will be lost).
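A toy server-speaks-first protocol (think of SMTP's greeting banner) shows why an idle pooled connection is unsafe. The sketch below is illustrative, not Envoy code: the server writes as soon as it accepts, and since the pool has no downstream to deliver those bytes to, its only safe reaction to a readable idle socket is to close it.

```python
import select
import socket
import threading

# Toy server for a server-speaks-first protocol; illustrative only.
def banner_server(srv):
    conn, _ = srv.accept()
    conn.sendall(b"220 ready\r\n")   # server writes first, like an SMTP banner
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=banner_server, args=(srv,), daemon=True).start()

# A "pooled" upstream connection with no downstream attached yet.
idle = socket.create_connection(srv.getsockname())

# A TCP pool parking idle connections must treat any readable event as an
# error: there is no downstream to deliver these bytes to, so they are lost.
readable, _, _ = select.select([idle], [], [], 2.0)
if readable:
    data = idle.recv(1024)
    print("unexpected data on idle conn:", data)
    idle.close()                     # the pool must close, not reuse
```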
Do I understand correctly that this bug should remain open, or that a new one should be created with a more descriptive title, to ensure that TCP connections are closed regardless of whether the downstream connection used them?
We can leave this one open. I can edit the title.
AIUI, the root of this issue is that the test app in question is a single-threaded, single-connection Python script? It seems that a response that includes closing the connection following a "Connection: close" response from this app would return the app to a listening state, and solve the problem of a keep-alive connection. I don't believe keep-alives are compatible with having an insufficient number of configured back-end connections. @jvshahid can you try terminating the connection when satisfying a single request, and see if the test cases pass?
I think this is orthogonal to the issue reported here; I honestly don't know whether the application closes the connection after sending back a response or not. The issue that we originally reported on Slack (and didn't do a good job explaining in this GitHub issue) is a result of the following steps:
We originally opened this issue because Envoy's behavior isn't deterministic. Sometimes Envoy uses a connection from the pool and sometimes it doesn't. Per our Slack conversation, that isn't expected behavior. In addition, @zuercher pointed out in #4409 (comment) that the TCP connection shouldn't be added to the pool in the first place, given that the client might have some state attached to the connection, and reusing the upstream connection for a new downstream connection could lead to confusing results.
I've been playing with this issue in this repo. When the Envoy listener is configured to terminate TLS, I see lingering ESTABLISHED connections from Envoy to upstream and the occasional slow request. However, when the listener is not doing TLS termination, everything seems fine. I didn't see this TLS bit mentioned anywhere else in this thread; just wanted to share it in case it is relevant.
@zuercher are you still planning to work on this issue? It is blocking us, and we're considering trying to tackle it... but we haven't made any contributions to Envoy before, and aren't sure quite where to start. |
I intended to tackle this at some point. I'll try to take a look at it today. |
I think I have a working implementation of this fix. I'll post a PR tomorrow. |
Allows cancellation of a pending connection via the TCP connection pool to request that the pending connection be closed if it cannot be immediately used.
*Risk Level*: medium
*Testing*: unit tests
*Docs Changes*: n/a
*Release Notes*: n/a
*Fixes*: envoyproxy#4409
Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
@rosenhouse Is it possible for you to test with #4684 and see if it fixes your issue?
I'm out for the next few days. Perhaps @tylerschultz could give it a whirl?
We're working with @jvshahid's teammates this morning to try it out. |
@zuercher We did some testing with your PR and the reproduction steps outlined in the repo from @rosenhouse. We found that upstream connections are now closed and in the TIME_WAIT state waiting to be cleaned up. We did not see any connections in the ESTABLISHED state. @jvshahid @tylerschultz @rosenhouse We also did some testing with this PR in our own bosh-lite test environment, and found that the previously failing CATs test involving the Python app is no longer failing. @zuercher Thank you for making this PR. It solves our issue, and we are looking forward to your PR getting merged.
Fixed in #4684
Envoy version used:
envoy version: d2fa5f3b4c1f09a8e6fa96ecc8f64555d67a85f6/1.8.0-dev/Clean/RELEASE
Bug Template
We have noticed that Envoy doesn't always reuse connections from the connection pool and sometimes will open new connections unnecessarily.
Description
We discussed this bug in slack where we were asked to open a github issue.
Repro steps:
Use the following config and have a dummy process listening on port 8080, then run `nc -vz localhost 61001` repeatedly. The repro steps are very similar to those in #4310.
Config:
config.yaml
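For reference, a rough Python equivalent of the `nc -vz localhost 61001` probe used in the repro steps above: open a TCP connection to the Envoy listener and close it without exchanging any data, which is exactly what leaves the pending upstream connection unused.

```python
import socket

def connect_and_close(host, port):
    """Connect and immediately close, sending zero bytes (like `nc -vz`)."""
    s = socket.create_connection((host, port), timeout=5)
    s.close()

# Example (requires the Envoy listener from the config above to be running):
#   for _ in range(5):
#       connect_and_close("localhost", 61001)
```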
Logs:
Below are the logs with the netstat output after each run of `nc -vz localhost 61001`.
Envoy trace logs interspersed with `netstat -anp | grep 8080` output