Status.LoadBalancer.Ingress gets updated too often and sometimes set to a wrong value #3269

Closed
ukinau opened this issue Oct 19, 2018 · 0 comments · Fixed by #3270
ukinau commented Oct 19, 2018

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):
No

What keywords did you search in NGINX Ingress controller issues before filing this one?
(If you have found any duplicates, you should instead reply there.): "Ingress.Status wrong" "Ingress.Status got often updated"


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

NGINX Ingress controller version:
nginx-0.16.2, but the bug is present in master as well

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: OpenStack

  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel (e.g. uname -a): 3.10.0-862.9.1.el7.x86_64

  • Install tools: RKE (https://github.com/rancher/rke)

What happened:
When we run the ingress controller on 4 nodes and create around 4-6 Ingress resources,
one of the Ingress resources (always the same one) has its Status.LoadBalancer.Ingress field updated very often (3 times in 2 minutes), even though all ingress-controller pods are active and nothing has changed.
Sometimes the ingress controller even updates the status with a wrong value, such as a duplicated hostname.

What you expected to happen:
As long as all ingress-controller pods are active and nothing changes on the nodes, the Ingress resource's
.Status.LoadBalancer.Ingress field should not be updated.

How to reproduce it (as minimally and precisely as possible):

  • run the ingress controller on a bunch of nodes (e.g. 15 nodes)
    • a physical machine with many cores makes it easier to reproduce
  • create many Ingress resources (e.g. 20)
  • periodically check kubectl get event; you will see frequent Update events on one (or two) of the Ingress resources.

Anything else we need to know:
I investigated this problem, found a very suspicious spot in the code, and fixed it in our environment. Although I have created a PR, let me explain what is happening.
Basically, this happens because the ingress controller updates Ingress.Status.LoadBalancer.Ingress with wrong information over and over.
Here is the code that updates the status:
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L149-L153

In this logic, the first step is to get the IP/hostname of every node running the ingress controller (this seems fine),
and the second step is to update the status of each Ingress resource (https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L326).

The status of each Ingress resource is updated in parallel, and this is where the problem lies: the slice holding the original status information is passed by reference to each updating function, so multiple goroutines try to sort the same slice concurrently. This corrupts the slice and sometimes causes the status to be updated with a wrong value.
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L335
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L349

So basically we should not manipulate the slice through the original reference in a function that runs in parallel, or, if we really do need to manipulate it, we should deep-copy it before passing it.
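
To make the race concrete, here is a minimal, self-contained Go sketch (illustration only; the LBIngress type and the values are hypothetical stand-ins, not the controller's actual types) in which many goroutines sort one shared slice, the same pattern as the code linked above:

    package main

    import (
        "fmt"
        "sort"
        "sync"
    )

    // LBIngress stands in for one load-balancer status entry (an IP or
    // hostname); it is a simplified stand-in, not the ingress-nginx type.
    type LBIngress struct {
        IP string
    }

    func main() {
        // One shared slice, analogous to the status computed once from
        // the list of nodes running the ingress controller.
        shared := []LBIngress{{IP: "10.0.0.3"}, {IP: "10.0.0.1"}, {IP: "10.0.0.2"}}

        var wg sync.WaitGroup
        for i := 0; i < 20; i++ { // one goroutine per Ingress resource
            wg.Add(1)
            go func() {
                defer wg.Done()
                // Racy: every goroutine sorts the same backing array.
                // Concurrent swaps can duplicate or drop elements.
                sort.SliceStable(shared, func(a, b int) bool {
                    return shared[a].IP < shared[b].IP
                })
            }()
        }
        wg.Wait()
        fmt.Println(shared) // can print duplicated entries
    }

Running this under the race detector (go run -race) reports a data race on the slice, which matches the duplicated-hostname symptom described above.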

ukinau added a commit to ukinau/ingress-nginx that referenced this issue Oct 19, 2018
Currently the ingress controller tries to update the status of each
ingress resource in parallel using goroutines, and inside each goroutine
we try to sort the same IngressStatus reference, which is shared between
all goroutines; this corrupts the original reference if several
goroutines try to sort it at the same time.
So the sorting should be done before the reference is passed to each
goroutine, to prevent the original reference from being corrupted.

fixes: kubernetes#3269
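
Following the commit message, the fix in the sketch above is to sort once before the fan-out, so the goroutines only ever read the shared slice; a goroutine that still needs to mutate the data should work on a private deep copy. This is illustrative only (it replaces the body of main in the earlier sketch), not the exact PR diff:

    // Sort once, before starting the goroutines; after this point the
    // shared slice is read-only.
    sort.SliceStable(shared, func(a, b int) bool {
        return shared[a].IP < shared[b].IP
    })

    var wg sync.WaitGroup
    for i := 0; i < 20; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // If a goroutine must still manipulate the data, give it a
            // private deep copy instead of the shared reference.
            own := make([]LBIngress, len(shared))
            copy(own, shared)
            _ = own // update this Ingress resource using the private copy
        }()
    }
    wg.Wait()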