LoadBalancerNegNotReady #931

Closed
RafiGreenberg opened this issue Nov 6, 2019 · 8 comments
Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)


RafiGreenberg commented Nov 6, 2019

I upgraded our GKE cluster to 1.13.11-gke.11

Since then, one newly created Service using NEG is failing to become healthy, even though the pods report that the container is healthy.

kubectl describe pod

<snip>

    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 06 Nov 2019 09:08:10 -0800
    Ready:          True
    Restart Count:  0

<snip>

Readiness Gates:
  Type                                       Status
  cloud.google.com/load-balancer-neg-ready
Conditions:
  Type                                       Status
  cloud.google.com/load-balancer-neg-ready
  Initialized                                True
  Ready                                      False
  ContainersReady                            True
  PodScheduled                               True

<snip>

Events:
  Type     Reason                   Age    From                                                 Message
  ----     ------                   ----   ----                                                 -------
  Normal   LoadBalancerNegNotReady  3m55s  neg-readiness-reflector                              Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-74b928a2-default-ww-api-8080-a3c1e454]
  Normal   Scheduled                3m55s  default-scheduler                                    Successfully assigned default/ww-api-575475649c-q7dz4 to gke-dev-cluster-1-dev-pool-3-9b3e5c8e-7ms5
  Normal   Pulled                   3m54s  kubelet, gke-dev-cluster-1-dev-pool-3-9b3e5c8e-7ms5  Container image "gcr.io/ranker-infra/ww-api:0c9bae40cd44b8da075d79b7005e5ed0119f95d2" already present on machine
  Normal   Created                  3m54s  kubelet, gke-dev-cluster-1-dev-pool-3-9b3e5c8e-7ms5  Created container
  Normal   Started                  3m54s  kubelet, gke-dev-cluster-1-dev-pool-3-9b3e5c8e-7ms5  Started container
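
A minimal sketch for watching this state across pods, assuming they carry an app=ww-api label (that selector is only a guess based on the names above); kubectl's wide output includes a READINESS GATES column:

# Show pods together with their readiness-gate status; the NEG gate stays 0/1 while it is unsatisfied.
kubectl get pods -l app=ww-api -o wide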
@tornado67

Same issue on 1.13.11-gke.14.

tobiasbrodersen commented Jan 14, 2020

We're experiencing this issue on v1.13.11-gke.14 as well; it seems to be correlated with NEGs and the new readiness gates introduced in 1.13+.
Removing our annotations on the service:

beta.cloud.google.com/backend-config: '{"default": "istio"}'
cloud.google.com/app-protocols: '{"https":"HTTP2"}'
cloud.google.com/neg: '{"ingress": true}'

And reapplying them makes all health checks pass and traffic is forwarded again.
I'm trying to dig further into the documentation and will report back if I get any findings.
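
For anyone who wants to script the workaround described above, a rough sketch (the Service name ww-api is only a guess taken from the pod names earlier in the thread):

# Remove the NEG-related annotations from the Service (a trailing '-' deletes an annotation)...
kubectl annotate service ww-api \
  beta.cloud.google.com/backend-config- \
  cloud.google.com/app-protocols- \
  cloud.google.com/neg-

# ...and re-apply them, which re-triggers NEG attachment and health checking.
kubectl annotate service ww-api \
  beta.cloud.google.com/backend-config='{"default": "istio"}' \
  cloud.google.com/app-protocols='{"https":"HTTP2"}' \
  cloud.google.com/neg='{"ingress": true}'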

freehan commented Jan 14, 2020

Some background:
- the OSS Kubernetes pod readiness gate feature
- the usage of pod readiness gates for container-native load balancing

Based on the troubleshooting guide:

  1. Look for the neg-status annotation on the service that carries the neg annotation.
     It should contain the NEG name and the locations. More info here.

  2. Look for the backend service. More info here.

  3. Check whether the corresponding endpoints showed up in the backend service and whether they are healthy.

If not, then check a few things (a command sketch follows below):

  0. Validate that the cluster satisfies the requirements.
  1. Validate that the health check configuration on the backend service is correct and that it is health checking the backends as expected.
  2. Validate that the firewall is open so health check requests can pass and arrive at the destination.
  3. Validate that the backends are receiving health check requests and responding correctly.
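
A rough command sketch of the checks above (the Service name and the backend-service name are placeholders; the actual backend service created for a NEG has a generated k8s1-... name):

# 1. The Service should carry a cloud.google.com/neg-status annotation listing the NEG name and its zones.
kubectl get service ww-api -o yaml

# 2. The NEG and its backend service should exist and have endpoints attached.
gcloud compute network-endpoint-groups list
gcloud compute backend-services list

# 3. The endpoints in the backend service should report HEALTHY.
gcloud compute backend-services get-health BACKEND_SERVICE_NAME --global

# 4. A firewall rule must allow health-check traffic from 130.211.0.0/22 and
#    35.191.0.0/16 to reach the serving port (8080 in this case).
gcloud compute firewall-rules list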

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Apr 13, 2020.
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on May 13, 2020.
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ajaysourcedigital

If you define a livenessProbe and a readinessProbe inside the YAML definition file, it should go away.
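
A minimal sketch of what that might look like, assuming the Deployment and container are both named ww-api and the app serves a /healthz endpoint on port 8080 (all of these names are assumptions, adjust them to your workload):

# A strategic merge patch adding the probes to the container (kubectl patch accepts YAML or JSON).
kubectl patch deployment ww-api --type=strategic --patch '
spec:
  template:
    spec:
      containers:
      - name: ww-api             # assumed container name
        readinessProbe:
          httpGet:
            path: /healthz       # assumed health endpoint
            port: 8080
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
'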
