delete upstream healthcheck annotation #3207

ElvinEfendi · 2018-10-09T01:31:13Z

What this PR does / why we need it:
These annotation has little value in ingress-nginx. ingress-nginx relies on k8s for healthchecking.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

aledbf · 2018-10-09T01:37:01Z

@ElvinEfendi please also remove the docs https://github.com/kubernetes/ingress-nginx/blob/b46523a1f47ddf34caec3d87574b17dd9c6ea764/docs/user-guide/nginx-configuration/annotations.md

ElvinEfendi · 2018-10-09T15:10:35Z

/assign @aledbf

aledbf · 2018-10-09T15:18:21Z

/lgtm

k8s-ci-robot · 2018-10-09T15:18:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ElvinEfendi,aledbf]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

AndrewFarley · 2019-07-11T22:03:40Z

So I'm running into issues where we need to set max_fails and because of this "little value" removal we can not now, is there a workaround to still set max_fails?

See: socketio/socket.io#1739

antoineco · 2019-07-11T22:10:07Z

@AndrewFarley max_fails translates to the failureThreshold field of Kubernetes' readiness probes. The recommendation is to use that on your target application container.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes

AndrewFarley · 2019-07-11T23:12:55Z

This situation per the link is one in which I am not using this as so much for a health check mechanism but as a retry and anti-flapping mechanism between upstreams. The only solution I can find is to use max_fails, and since you've removed this, I am left with the option of reverse diff-ing this patch and maintaining our own fork for this purpose. Thoughts @antoineco ?

AndrewFarley · 2019-07-11T23:17:15Z

I'm sure I don't have all the context or reasoning here in this PR, but I generally don't understand why you would remove a feature even one of "little value" that didn't have any effect if it wasn't going to be used by anyone in favor of using k8s health checks.

aledbf · 2019-07-11T23:27:05Z

The only solution I can find is to use max_fails, and since you've removed this, I am left with the option of reverse diff-ing this patch and maintaining our own fork for this purpose.

This was removed due to the Lua load balancer change in the ingress controller to avoid reloads when a pod is created/destroyed. Right now there is no easy way to implement an equivalent to the max_fails feature.
That said, the advice from @antoineco to use the probe feature from Kubernetes is the way to go. Additionally, you can use the https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#proxy-next-upstream setting to configure in which cases you want to try a different pod from the service being exposed behind the ingress

AndrewFarley · 2019-07-11T23:29:29Z

@aledbf Proxy next upstream might work for this situation, I'll try to fiddle with it, thanks for the info and history on the thought behind this change. Thanks

panpan0000 · 2019-09-17T01:54:17Z

Hi, @AndrewFarley @aledbf
In short,
(a) the problem without timely failover in ingress is kind of big problem in some cases, other than little value.
(b) the proxy-next-upstream is not working, unfortunately.

details below:

(a) We cannot always rely on the liveness-probe. There's a situation which is the problem: a node crash (we can simulate it by reboot a node). Liveness-probe was taken care by kubelet, which is not working when node down. And kubernetes will not mark that node from ready to NotReady until 40s timeout. Until then, the pods on that node will be changed from Running to Unknown state.
That say, there will be 40+ seconds which ingress-nginx thought pods are alive but actually not, the requests which send to those dead pods will suffer from 504 Gateway Timeout.

This kind of 40+s duration and problem is unacceptable for many critical business.

(b) the proxy_next_upstream is not working , neither for ingress-nginx version 0.21.0 nor 0.25.1.
So without it, we don't have a timely failover mechanism in ingress-nginx !

(1)setup two pods for a deployment, running on two nodes.
(2) confirm the setting in ingress-nginx's nginx.conf has been properly setup via configmap.

			proxy_next_upstream                      error timeout http_504 http_503 http_500 http_502 http_404 invalid_header;
			proxy_next_upstream_timeout             2;

(3) looping curl -H "Host:xxx" ingressIp:port (yes, it's GET , not the same issue like #4174)
(4) reboot one node
(5) there will be 40+s , 50% request failed with 504 Gateway timeout

panpan0000 · 2019-09-17T02:20:28Z

I found a post mentioning proxy_next_upstream is not effective for upstreams by balancer_by_lua_block inside openrestry , the founder of openrestry @agentzh proposed to use set_more_tries in lua instead.
But I found ingress code does have this code ngx_balancer.set_more_tries(1)
https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/lua/balancer.lua#L244

I don't have idea why it's not working...

the original post:
openresty/lua-resty-core#51

agentzh · 2019-09-17T03:48:18Z

@panpan0000 I'm afraid you misinterpret my comment there. The proxy_next_upstream is definitely effective at the same time, see the official documentation for set_more_triesmentioned in that comment:

https://github.com/openresty/lua-resty-core/blob/master/lib/ngx/balancer.md#set_more_tries

panpan0000 · 2019-09-17T08:25:56Z

@agentzh thank you for your clarification !
So in theory, ingress-nginx code should work.
I filed a new issue to track this problem #4567

annProg · 2019-12-05T07:54:48Z

when use ingress for custom endpoint which not belong to any Pod, upstream healthcheck is necessary.

k8s-ci-robot requested a review from antoineco October 9, 2018 01:31

ElvinEfendi mentioned this pull request Oct 9, 2018

Only support dynamic configuration #3198

Merged

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 9, 2018

delete upstream healthcheck annotation

78f12c2

ElvinEfendi changed the title ~~[WIP] delete upstream healthcheck annotation~~ delete upstream healthcheck annotation Oct 9, 2018

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2018

k8s-ci-robot assigned aledbf Oct 9, 2018

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2018

k8s-ci-robot merged commit 808c2be into kubernetes:master Oct 9, 2018

diegows mentioned this pull request Dec 19, 2018

Support nginx.ingress.kubernetes.io/upstream-max-fails at versuon 0.21.0 ? #3550

Closed

This was referenced Sep 17, 2019

proxy-next-upstream isn't working #4174

Closed

proxy-next-upstream is not working ..(GET operation) #4567

Closed

ElvinEfendi deleted the delete-upstream-healthcheck branch September 17, 2019 21:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

delete upstream healthcheck annotation #3207

delete upstream healthcheck annotation #3207

ElvinEfendi commented Oct 9, 2018

aledbf commented Oct 9, 2018

ElvinEfendi commented Oct 9, 2018

aledbf commented Oct 9, 2018

k8s-ci-robot commented Oct 9, 2018

AndrewFarley commented Jul 11, 2019 •

edited

Loading

antoineco commented Jul 11, 2019

AndrewFarley commented Jul 11, 2019

AndrewFarley commented Jul 11, 2019

aledbf commented Jul 11, 2019

AndrewFarley commented Jul 11, 2019

panpan0000 commented Sep 17, 2019 •

edited

Loading

panpan0000 commented Sep 17, 2019 •

edited

Loading

agentzh commented Sep 17, 2019

panpan0000 commented Sep 17, 2019

annProg commented Dec 5, 2019 •

edited

Loading

delete upstream healthcheck annotation #3207

delete upstream healthcheck annotation #3207

Conversation

ElvinEfendi commented Oct 9, 2018

aledbf commented Oct 9, 2018

ElvinEfendi commented Oct 9, 2018

aledbf commented Oct 9, 2018

k8s-ci-robot commented Oct 9, 2018

AndrewFarley commented Jul 11, 2019 • edited Loading

antoineco commented Jul 11, 2019

AndrewFarley commented Jul 11, 2019

AndrewFarley commented Jul 11, 2019

aledbf commented Jul 11, 2019

AndrewFarley commented Jul 11, 2019

panpan0000 commented Sep 17, 2019 • edited Loading

panpan0000 commented Sep 17, 2019 • edited Loading

agentzh commented Sep 17, 2019

panpan0000 commented Sep 17, 2019

annProg commented Dec 5, 2019 • edited Loading

AndrewFarley commented Jul 11, 2019 •

edited

Loading

panpan0000 commented Sep 17, 2019 •

edited

Loading

panpan0000 commented Sep 17, 2019 •

edited

Loading

annProg commented Dec 5, 2019 •

edited

Loading