[nginx] Lost client requests when updating a deployment and using keep alive #489
@foxylion this was reported a long time ago kubernetes-retired/contrib#1123
Thanks for the issue reference. I didn't look in the old issue tracker, sorry. This makes me think nginx inside the controller does not handle the closed connections of the pods it routes to well: JMeter handles everything fine when it communicates directly with the pods (over the node port), but fails when communicating through the controller (which is never restarted).
@aledbf I, like @foxylion, want to achieve zero-downtime deployments, and right now I'm kinda lost on how to do this. Is there a workaround? Maybe we should document this to make sure people know that this is a problem they might potentially face (especially since the Kubernetes docs state that it can do zero-downtime deployments)
This is true. The real issue here is how nginx handles requests when keep-alive is enabled.
Okay, so we are seeing the same issue. My question now is: is there a workaround? If not, we should document this prominently. Because no one would ever expect that ingress (with nginx) would not support the broad claim of Kubernetes being able to do zero-downtime deployments. (I really hope there is something we can do to allow this)
@aledbf from my very short experience using k8s, if you don't implement a technique like the one described here: https://www.chrismoos.com/2016/09/28/zero-downtime-deployments-kubernetes/ you won't get zero-downtime deployments out of the box. We did some tests on our own side, and if we disable support for Keep-Alive connections on the server side, then the approach described in that article does indeed work. We're going to try and shorten the Keep-Alive timeout to make sure it is far below the pod's termination period and see where that gets us.
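The technique from that article boils down to a preStop hook that keeps the pod serving while the endpoint controller removes it from rotation. A minimal sketch of such a Deployment fragment; the name, image, and timings below are illustrative, not taken from the article:

```yaml
# Sketch of a zero-downtime rollout setup (illustrative values).
# The preStop sleep gives Kubernetes time to remove the pod from the
# Service endpoints before the container actually stops serving.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zero-downtime-test          # hypothetical name
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # must exceed the preStop sleep
      containers:
      - name: app
        image: example/app:latest         # placeholder image
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 15"]
```

The sleep duration should comfortably exceed the time the endpoint controller and the ingress need to stop routing new traffic to the pod.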
@foxylion can you try reducing the keep-alive timeout to something well below the pod's termination period? In theory, this will give clients enough time to finish the connection.
The thing is, the keep-alive timeout is defined as "seconds until an idle connection is closed". When a connection is in constant use it will never be closed. So this will (in my opinion) not always work.
I don't see a reference to idle connections here: http://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_timeout
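The directive linked above can be tuned so that idle keep-alive connections are closed well before a pod's termination grace period expires. A minimal sketch for the backend's nginx config; the timings are illustrative assumptions:

```nginx
http {
    # Close idle keep-alive connections after 5s, far below the default
    # terminationGracePeriodSeconds of 30s, so clients reconnect often.
    keepalive_timeout 5s;

    # Also cap how many requests a single connection may serve, which
    # bounds the lifetime of busy (never-idle) connections too.
    keepalive_requests 100;
}
```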
@yoanisgil If their implementation is matching the RFC draft for the Keep-Alive header, then timeout should be interpreted as:
There is also a 'max' parameter which defines the maximum number of requests that can be performed on a single connection. But I can't find anything about a 'max seconds'. Also you read everywhere things like this:
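For reference, the Keep-Alive header from the RFC draft discussed above takes the following shape; the parameter values here are only illustrative:

```
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
```

Per the draft, `timeout` is the minimum time the host will keep an idle connection open, and `max` caps the number of requests that may be sent over that connection.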
@foxylion thanks for the information. We'll keep testing and see where that gets us. In the meantime it would be nice if someone with more in-depth experience/knowledge from the k8s community could jump in and share a thought or two.
@foxylion @yoanisgil to really fix this we need to avoid reloading nginx in order to update the upstreams.
@aledbf I'm having a hard time understanding this. When I disable keep-alive on my client (JMeter) there are absolutely no issues with ingress-nginx. Only when keep-alive is enabled does ingress-nginx drop some of the requests during a rollout.
So the problem is that nginx drops requests when reloading its configuration, but only when the connections to nginx use keep-alive?
Exactly. This is a known issue in nginx. It is not related to the ingress controller or Kubernetes.
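As a side note, later nginx releases added a directive that bounds how long old workers linger after a reload. A minimal sketch, assuming nginx >= 1.11.11:

```nginx
# Goes in the main (top-level) context of nginx.conf.
# After a reload, old workers normally wait for their connections
# (including idle keep-alive ones) to close on their own; this caps
# that wait, after which remaining connections are closed.
worker_shutdown_timeout 10s;
```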
@foxylion at some point the old nginx worker is going to close the existing connection.
Also, from this page https://wiki.apache.org/jmeter/JMeterSocketClosed follow the instructions to change the default JMeter configuration:
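The wiki page above suggests, among other things, letting the HttpClient4 sampler retry and check for stale pooled connections. A sketch of the relevant property changes; the exact values are assumptions, see the page for the full instructions:

```properties
# user.properties — let the HttpClient4 implementation retry once
# instead of failing the sample on a reused, server-closed connection.
httpclient4.retrycount=1

# hc.parameters — check whether a pooled connection is stale
# (already closed by the server) before reusing it.
http.connection.stalecheck$Boolean=true
```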
@aledbf Thanks a lot! I'll try this out.
@foxylion, with @yoanisgil we found a way to ask nginx to renew the HTTP connections.
"nginx -s reload" creates new workers and all connections are thereby restarted.
@foxylion any update?
@aledbf I'm sorry, but currently there is no time to investigate further.
Closing. Please reopen if you have more information.
I started some testing on the issue, but have not yet finished. (Will reopen when finished testing)
I think we are experiencing the same problem with keep-alive HTTP connections. What makes it worse for us: we deploy multiple services to the cluster and they all have ingress resources. This means that whenever a service is updated (a new version is deployed or the service is scaled up/down) the ingress controller runs an nginx reload.
Hi all, sorry for commenting on an old issue. Is there any new information or workarounds on this? I've been using a preStop hook that sleeps for longer than my longest connection timeout, to place the instance in a terminating state and remove it from the LB (so no new connections are received) before letting it go into the nginx graceful shutdown process. My graceful shutdown time is my longest connection timeout * 2.
I use the following setup to reproduce this issue: JMeter sends requests to http://node:30086 or http://zero-downtime.domain.example while the zero-downtime-test deployment is being updated. I think this may have something to do with nginx not correctly closing the connection to the backend when using keep-alive.
This problem won't be visible when disabling "keep alive" connections in JMeter.