Status.LoadBalancer.Ingress gets updated too often and sometimes set to a wrong value #3269

Closed
ukinau opened this issue Oct 19, 2018 · 0 comments · Fixed by #3270
ukinau commented Oct 19, 2018

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):
No

What keywords did you search in NGINX Ingress controller issues before filing this one?
(If you have found any duplicates, you should instead reply there.): "Ingress.Status wrong" "Ingress.Status got often updated"


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

NGINX Ingress controller version:
nginx-0.16.2, but the bug is present in master as well

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: OpenStack

  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel (e.g. uname -a): 3.10.0-862.9.1.el7.x86_64

  • Install tools: RKE (https://github.com/rancher/rke)

What happened:
When we run the ingress controller on 4 nodes and create around 4-6 Ingress resources,
one of the Ingress resources (always the same one) has its Status.LoadBalancer.Ingress field updated very often (3 times in 2 minutes), even though all ingress-controller pods are active and nothing has changed.
Sometimes the ingress controller even updates the status with a wrong value, such as a duplicated hostname.

What you expected to happen:
As long as all ingress-controller pods are active and nothing changes on the nodes, the Ingress resource's
.Status.LoadBalancer.Ingress field should not be updated.

How to reproduce it (as minimally and precisely as possible):

  • run the ingress controller on a bunch of nodes (e.g. 15 nodes)
    • a physical machine with many cores makes it easier to reproduce
  • create many Ingress resources (e.g. 20)
  • periodically check kubectl get event; you will see frequent Update events on one (or two) of the Ingress resources.

Anything else we need to know:
I investigated this problem, found a very suspicious spot in the code, and fixed it in our environment. Although I have created a PR, let me explain what is happening.
Basically, this happens because the ingress controller updates Ingress.Status.LoadBalancer.Ingress with wrong information over and over.
Here is the code that updates the status:
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L149-L153

In this logic, the first step is to get the IP/hostname of every node running the ingress controller (this seems fine),
and the second step is to update the status of each Ingress resource (https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L326).

The status of each Ingress resource is updated in parallel, and this is where the problem lies: the slice holding the original status information is passed by reference to each updating function, so multiple goroutines try to sort the same slice concurrently. This corrupts the slice and sometimes causes the status to be updated with a wrong value.
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L335
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/status/status.go#L349

So basically we should not manipulate the slice through the original reference in a function that runs in parallel, or, if we really do need to manipulate it, we should deep-copy it before passing it.
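
To make the race concrete, here is a minimal, self-contained Go sketch (illustration only; the LBIngress type and the values are hypothetical stand-ins, not the controller's actual types) in which many goroutines sort one shared slice, the same pattern as the code linked above:

    package main

    import (
        "fmt"
        "sort"
        "sync"
    )

    // LBIngress stands in for one load-balancer status entry (an IP or
    // hostname); it is a simplified stand-in, not the ingress-nginx type.
    type LBIngress struct {
        IP string
    }

    func main() {
        // One shared slice, analogous to the status computed once from
        // the list of nodes running the ingress controller.
        shared := []LBIngress{{IP: "10.0.0.3"}, {IP: "10.0.0.1"}, {IP: "10.0.0.2"}}

        var wg sync.WaitGroup
        for i := 0; i < 20; i++ { // one goroutine per Ingress resource
            wg.Add(1)
            go func() {
                defer wg.Done()
                // Racy: every goroutine sorts the same backing array.
                // Concurrent swaps can duplicate or drop elements.
                sort.SliceStable(shared, func(a, b int) bool {
                    return shared[a].IP < shared[b].IP
                })
            }()
        }
        wg.Wait()
        fmt.Println(shared) // can print duplicated entries
    }

Running this under the race detector (go run -race) reports a data race on the slice, which matches the duplicated-hostname symptom described above.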

ukinau added a commit to ukinau/ingress-nginx that referenced this issue Oct 19, 2018
Currently the ingress controller tries to update the status of each
ingress resource in parallel using goroutines, and inside each goroutine
we try to sort the same IngressStatus reference, which is shared between
all goroutines; this corrupts the original reference if several
goroutines try to sort it at the same time.
So the sorting should be done before the reference is passed to each
goroutine, to prevent the original reference from being corrupted.

fixes: kubernetes#3269
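
Following the commit message, the fix in the sketch above is to sort once before the fan-out, so the goroutines only ever read the shared slice; a goroutine that still needs to mutate the data should work on a private deep copy. This is illustrative only (it replaces the body of main in the earlier sketch), not the exact PR diff:

    // Sort once, before starting the goroutines; after this point the
    // shared slice is read-only.
    sort.SliceStable(shared, func(a, b int) bool {
        return shared[a].IP < shared[b].IP
    })

    var wg sync.WaitGroup
    for i := 0; i < 20; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // If a goroutine must still manipulate the data, give it a
            // private deep copy instead of the shared reference.
            own := make([]LBIngress, len(shared))
            copy(own, shared)
            _ = own // update this Ingress resource using the private copy
        }()
    }
    wg.Wait()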