api request hang and stuck #1740

geotransformer · 2022-03-11T01:21:21Z

What happened (please include outputs or screenshots):
ubuntu@xxxx-control-plane-1:~ kubectl logs -n foo xxx-controller-bc4db5d46-h594h
2022-03-10 23:43:49.065 INFO main: Check Apiserver connection
2022-03-10 23:43:51.946 WARNING urllib3.connectionpool: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f4d5a8d3630>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
ubuntu@xxxx-control-plane-1:~

What you expected to happen:
ubuntu@xxxx-control-plane: kubectl logs xxx-controller-5696b686fb-c5tb5 --previous
2022-02-27 19:37:30.000 INFO main: Check Apiserver connection
2022-02-27 19:37:32.251 WARNING urllib3.connectionpool: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b6d8>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
2022-02-27 19:37:35.323 WARNING urllib3.connectionpool: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b7f0>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
2022-02-27 19:37:38.395 WARNING urllib3.connectionpool: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b898>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
2022-02-27 19:37:41.467 ERROR main: Exception when calling ApiextensionsV1Api->get_api_resources: HTTPSConnectionPool(host='10.96.0.1', port=443): Max retries exceeded with url: /apis/apiextensions.k8s.io/v1/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b978>: Failed to establish a new connection: [Errno 113] No route to host',))

How to reproduce it (as minimally and precisely as possible):
This happens intermittently and cannot be reproduced manually. It only happens during automated testing in a multi-node k8s cluster.

Anything else we need to know?:

#!/usr/bin/env python3

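try:
    from kubernetes import client, config, watch
    from kubernetes.client.configuration import Configuration
    from kubernetes.config import kube_config
    import logging
    import os
    import time
    import sys
except ImportError as e:
    raise ImportError(str(e))

# Assumed definition: the posted snippet uses `logger` without defining it.
logger = logging.getLogger("main")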

class xxx_controller:
    def __init__(self):
        self.api_client = client.ApiClient()
        self.v1 = client.ApiextensionsV1Api(self.api_client)
        self.api_instance = client.CoreV1Api(self.api_client)
        self.crds = client.CustomObjectsApi(self.api_client)

    def check_apiserver_conn(self):
        try:
            logger.info("Check Apiserver connection")
            api_response = self.v1.get_api_resources()
        except Exception as e:
            logger.error("Exception when calling ApiextensionsV1Api->get_api_resources: %s\n" % e)

def main():
    config.load_incluster_config()
    cObj = xxx_controller()
    cObj.check_apiserver_conn()

if __name__ == "__main__":
    main()

Environment:

Kubernetes version (kubectl version): 1.21
OS (e.g., MacOS 10.13.6): Ubuntu 18.04
Python version (python --version) 3.6
Python client version (pip list | grep kubernetes)
root@xxx-bc4db5d46-h594h:/opt/run/server# pip list | grep kube
kubernetes 21.7.0
WARNING: You are using pip version 21.0.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.

roycaihw · 2022-03-28T16:47:24Z

It's hard to tell what went wrong from the error message alone. It's a network connectivity issue. If you could manually reproduce the issue or provide more details, we may be able to help more.
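While gathering those details, one way to keep the connection check from blocking indefinitely is to pass an explicit timeout. The sketch below is an illustration only, not a confirmed fix for the reported hang: it assumes `api` is a `kubernetes.client.ApiextensionsV1Api` instance and relies on the `_request_timeout` keyword that the generated API methods accept. The helper name and the default timeout values are hypothetical.

```python
import logging

# Logger name taken from the "main" prefix in the posted log output.
logger = logging.getLogger("main")


def check_apiserver_conn(api, timeout=(3.0, 10.0)):
    """Return True if the API server answers within the timeout, else False.

    `api` is assumed to be a kubernetes.client.ApiextensionsV1Api instance.
    `timeout` is a (connect, read) pair in seconds, passed as _request_timeout.
    """
    try:
        logger.info("Check Apiserver connection")
        api.get_api_resources(_request_timeout=timeout)
        return True
    except Exception as exc:  # broad on purpose: any failure means "not reachable"
        logger.error(
            "Exception when calling ApiextensionsV1Api->get_api_resources: %s", exc
        )
        return False
```

Returning a boolean lets the caller decide whether to retry, back off, or exit, instead of relying only on the log line.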

aviresonai · 2022-03-30T11:14:42Z

We experience a similar (and reproducible) problem with read_namespaced_stateful_set.
This is a regression in release 23.3.0 (22.6.0 works fine).
Our code runs inside a pod on GKE version 1.20.15-gke.1000 and looks like this:


import kubernetes as k8s  # assumed alias; the import was not shown in the original post

k8s.config.load_incluster_config()
api_client = k8s.client.ApiClient()
self.k8sappsclient = k8s.client.AppsV1Api(api_client)
self.k8sappsclient.read_namespaced_stateful_set("mysetname", "my_namespace")

The call to read_namespaced_stateful_set never returns.

roycaihw · 2022-03-31T01:07:31Z

Does the server respond if you call a different API, or use kubectl inside the pod? It would be useful if we could capture the HTTP requests (using debug logging for this client, and -v=9 for kubectl).
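For the client side, a minimal sketch of turning on verbose HTTP logging is shown below. It uses only the Python standard library rather than the client's own configuration object, on the assumption that the client sends its requests through urllib3 (as the log lines earlier in this thread suggest). The function name is hypothetical.

```python
import http.client
import logging


def enable_http_debug_logging():
    """Turn on verbose logging for urllib3 and the stdlib HTTP layer."""
    # Send log records to stderr; does nothing if logging is already configured.
    logging.basicConfig(level=logging.DEBUG)
    # urllib3 logs connection setup, retries, and request lines on this logger.
    logging.getLogger("urllib3").setLevel(logging.DEBUG)
    # Print request and response headers for every HTTP(S) connection.
    http.client.HTTPConnection.debuglevel = 1
```

Calling `enable_http_debug_logging()` before the first API call should show each request line and any retries, which can be compared against the `kubectl -v=9` output.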

aviresonai · 2022-03-31T05:28:06Z

I tried the debug option, and it helped me understand two things:

The hang was related to our logging, not to the Kubernetes client.
The actual issue is that 23.3.0 throws an exception on this API (which we haven't seen on previous versions). The exception stack trace is below.

2022-03-31 05:20:40,915 - yowza.yapi.k8s.k8s_cluster_facade(140490894980928) - k8s_cluster_facade.py:544 - INFO - call read_namespaced_stateful_set
Traceback (most recent call last):
  File "irocket/pepper/pepper.py", line 320, in <module>
    main()
  File "irocket/pepper/pepper.py", line 314, in main
    pepper = Pepper()
  File "irocket/pepper/pepper.py", line 71, in __init__
    self.k8s_cluster_facade.query_stateful_set_replicas(self.mesh_stateful_set_name)
  File "/usr/src/app/yapi/k8s/k8s_cluster_facade.py", line 545, in query_stateful_set_replicas
    api_response = self.k8sappsclient.read_namespaced_stateful_set(stateful_set, self.namespace)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api/apps_v1_api.py", line 7223, in read_namespaced_stateful_set
    return self.read_namespaced_stateful_set_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api/apps_v1_api.py", line 7324, in read_namespaced_stateful_set_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 192, in __call_api
    return_data = self.deserialize(response_data, response_type)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 264, in deserialize
    return self.__deserialize(data, response_type)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 303, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 641, in __deserialize_model
    instance = klass(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/models/v1_stateful_set_status.py", line 79, in __init__
    self.available_replicas = available_replicas
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/models/v1_stateful_set_status.py", line 119, in available_replicas
    raise ValueError("Invalid value for available_replicas, must not be None")  # noqa: E501
ValueError: Invalid value for available_replicas, must not be None

roycaihw · 2022-04-01T20:41:59Z

Thanks @aviresonai! Interesting, not sure if @geotransformer hit the same issue.

Invalid value for available_replicas, must not be None

That is almost certainly a similar issue to kubernetes-client/gen#52. Reading the k8s API, it looks like the field used to be optional, but in kubernetes/kubernetes#104045 it was changed to be a required field. However, the API server may still return a StatefulSet with this field missing, which fails the openapi-generated client-side validation.

Typically we fix this kind of issue by marking the field optional in the k8s API. Would you mind opening an issue/PR in k8s?
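To illustrate the failure mode, here is a simplified stand-in for a generated model. It is not the library's code: `StatusSketch` and `deserialize_status` are hypothetical names, and only the error message is taken from the traceback above. The point is that a setter for a field marked required rejects `None`, so a response from an older server that omits the field fails during deserialization.

```python
class StatusSketch:
    """Simplified, hypothetical stand-in for a generated status model."""

    def __init__(self, available_replicas=None, replicas=None):
        self._available_replicas = None
        self.replicas = replicas
        # Assignment goes through the validating setter below.
        self.available_replicas = available_replicas

    @property
    def available_replicas(self):
        return self._available_replicas

    @available_replicas.setter
    def available_replicas(self, value):
        # A field marked "required" in the spec is rejected when it is missing.
        if value is None:
            raise ValueError(
                "Invalid value for available_replicas, must not be None"
            )
        self._available_replicas = value


def deserialize_status(payload):
    """Build the model from a JSON-like dict; absent keys become None."""
    return StatusSketch(
        available_replicas=payload.get("availableReplicas"),
        replicas=payload.get("replicas"),
    )
```

With this sketch, `deserialize_status({"replicas": 3, "availableReplicas": 2})` succeeds, while `deserialize_status({"replicas": 3})` raises the `ValueError` seen in the traceback.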

aviresonai · 2022-04-03T10:03:55Z

Thanks @roycaihw. I am not sure how to describe this as an API error: the API version we use is older than the client, and at that time this field was not mandatory. Still, working with older versions of the API seems like a requirement for the client library, not for the API.

roycaihw · 2022-04-04T16:43:51Z

The fix will be in the upstream openapi spec, and eventually land in this python client. You're correct that we won't change the server's behavior. Here is an example: kubernetes/kubernetes#64996
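Until such a fix lands, one possible client-side workaround (an assumption, not something proposed by the participants in this thread) is to skip model deserialization and read the raw JSON. The sketch below relies on the `_preload_content=False` keyword accepted by the generated API methods, which returns the underlying HTTP response instead of a model object. The helper names are hypothetical, and `apps_api` is assumed to be a `kubernetes.client.AppsV1Api` instance.

```python
import json


def read_stateful_set_raw(apps_api, name, namespace):
    """Fetch a StatefulSet as a plain dict, bypassing model validation.

    `apps_api` is assumed to be a kubernetes.client.AppsV1Api instance.
    """
    response = apps_api.read_namespaced_stateful_set(
        name, namespace, _preload_content=False
    )
    # With _preload_content=False the raw response body is exposed as bytes.
    return json.loads(response.data)


def available_replicas(stateful_set):
    """Return status.availableReplicas, or None if the server omitted it."""
    return stateful_set.get("status", {}).get("availableReplicas")
```

The trade-off is that the caller works with dictionaries and camelCase keys rather than typed model attributes.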

k8s-triage-robot · 2022-07-03T16:54:12Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-08-02T17:50:05Z

/lifecycle rotten

k8s-triage-robot · 2022-09-01T17:50:57Z

/close

k8s-ci-robot · 2022-09-01T17:51:00Z

@k8s-triage-robot: Closing this issue.

geotransformer added the kind/bug label Mar 11, 2022

k8s-ci-robot added the lifecycle/stale label Jul 3, 2022

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 2, 2022

k8s-ci-robot closed this as completed Sep 1, 2022
