Ingress Controller forwarding traffic to a POD(IP) even after termination #7330
Comments
/remove-kind bug
To reproduce this problem, please add information here, such as:
Pretty sure this is related to the refresh cycle in nginx, which happens every second. Please try to add a preStop hook on your affected deployment with a "sleep 10" command, and try to terminate a pod after the change is applied. I'm pretty sure this will point you to the real issue, and it isn't really nginx controller related.
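A minimal sketch of such a hook, assuming a hypothetical Deployment and container both named `my-app` and a container image that ships a `sleep` binary:

```bash
# Hypothetical Deployment/container name "my-app"; adjust to the affected workload.
# The preStop sleep keeps the old pod serving for a few seconds after it is
# marked terminating, giving the ingress controller time to drop the pod's IP
# from its upstream list before the container receives SIGTERM.
kubectl patch deployment my-app --type strategic -p '
spec:
  template:
    spec:
      containers:
      - name: my-app
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]
'
```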
This is a general question about graceful shutdown.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@toredash @tao12345666333 could you elaborate on why you think this is related to an issue on the application's graceful-shutdown side? From the Kubernetes docs on Pod Lifecycle:
I can confirm that by "watching" the Endpoints from the application: the POD's IP is removed from the list right at the terminate call. ingress-nginx's own docs say the controller uses Endpoints objects to build/re-build the upstreams. Based on these two facts, it's not clear how a slow shutdown of the application should cause the described issue. I expected the controller to react to the change in Endpoints right after the POD's termination request happened, and to remove this POD's IP from the upstream.
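One quick way to observe this from outside the application (the Service name and namespace below are hypothetical):

```bash
# Watch the Endpoints object of the backing Service while deleting a pod in
# another terminal; the terminating pod's IP should disappear from the
# address list as soon as the deletion is issued.
kubectl get endpoints my-service -n my-namespace -o wide --watch
```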
@narqo nginx will poll the k8s API at a 1s interval for an updated Endpoints list.
For large Endpoints lists, it can take time to compute the new list and reload the nginx process. The result is that it can take up to a second or more for nginx to detect a deleted backend/POD and stop forwarding traffic to it.
This is also true, but I think you are assuming/expecting that changes are reflected instantaneously in your environment, which the code does not try to do at all.
That is true, and I don't see that anyone has stated otherwise in this issue.
It does this, but I believe your expectations are not aligned with how the code works at the moment. Please look at the Lua code mentioned above regarding backend sync. The code does not attempt to detect backend changes in real time.
Had the same problem after restarting the pod; how can I fix it now?
Solved: the reason is that the ingress controller pod has no free disk space, and there is an error log: I1217 08:00:10.109340 8 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"aaaa-nginx-ingress-controller-xxxxx", UID:"xxxxxxxxxxxx", APIVersion:"v1", ResourceVersion:"xxxxxx", FieldPath:""}): type: 'Warning' reason: 'RELOAD' Error reloading NGINX: write /etc/nginx/opentracing.json: no space left on device
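For anyone hitting that variant of the problem, a quick check could look like this (the pod name and namespace below are placeholders taken from the log above):

```bash
# Check free space on the filesystem backing /etc/nginx inside the controller
# pod; the controller rewrites its config there on every reload.
kubectl exec -n kube-system aaaa-nginx-ingress-controller-xxxxx -- df -h /etc/nginx

# Look for the reload error reported above in the controller logs.
kubectl logs -n kube-system aaaa-nginx-ingress-controller-xxxxx | grep -i "no space left on device"
```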
NGINX Ingress controller version: 0.29.0
Kubernetes version: 1.19
Environment: Production
Cloud provider or hardware configuration: Amazon EKS
OS: Linux
What happened:
As soon as we delete a pod, we see 502 errors from the NGINX Ingress controller. There is a small blip, and we see a good amount of errors in production.
Log message:
Note that the upstream IP in the NGINX log above, 10.53.24.125, is the IP of the POD that was just deleted.
What you expected to happen:
When a POD is deleted, the NGINX Ingress controller should not forward requests to the deleted POD IP, but it appears to be caching the POD IPs, which should not be the case.
To isolate the problem, we accessed the K8s Service using port-forward and saw no issues with it. It was only the NGINX controller reporting 502 errors.
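The same isolation test can be scripted roughly like this (the Service name, port, and path are hypothetical):

```bash
# Bypass the ingress controller entirely by port-forwarding to the Service,
# then poll it while a pod is being deleted to compare error rates.
kubectl port-forward svc/my-service 8080:80 &
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/
  sleep 0.1
done
```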
How to reproduce it:
With a good amount of load on the application, delete a pod in the deployment and you should instantly see the errors mentioned above. (We had about 200 TPS when this happened.)
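A sketch of that reproduction, assuming a hypothetical hostname, a `my-app` label, and a load generator such as `hey` (any HTTP load tool works):

```bash
# Generate steady traffic through the ingress for 60s in the background...
hey -z 60s -c 50 https://my-app.example.com/ &

# ...then delete one backend pod while the load is running.
POD=$(kubectl get pod -l app=my-app -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD" --wait=false

# 502s referencing the deleted pod's IP should appear in the controller's
# access log within the next second or so.
```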
Anything else we need to know:
We are running NGINX Ingress Controller v0.29.0 on an EKS 1.19 cluster. We also tried upgrading to v0.33 and v0.45, but the issue still exists.
Tried updating the ConfigMap with the below, but no luck:
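(The exact snippet that was tried is not preserved above. Purely for illustration, these are documented ingress-nginx ConfigMap keys commonly adjusted for this symptom; the ConfigMap name and namespace depend on how the controller was installed, and the values are examples only.)

```bash
# Illustrative values; worker-shutdown-timeout, proxy-next-upstream and
# proxy-next-upstream-tries are documented ingress-nginx ConfigMap keys.
kubectl patch configmap ingress-nginx-controller -n ingress-nginx --type merge -p '
data:
  worker-shutdown-timeout: "240s"
  proxy-next-upstream: "error timeout http_502"
  proxy-next-upstream-tries: "3"
'
```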
/kind bug