
Hard-coded addresses of scheduler and controller manager cause unhealthy ComponentStatus #96848

Closed
borgerli opened this issue Nov 25, 2020 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@borgerli
Contributor

We deployed the k8s control plane (kube-apiserver, kube-scheduler and kube-controller-manager) in separate containers, so the components cannot reach each other through the 127.0.0.1 loopback interface.

But as https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/rest/storage_core.go#L345 shows, the validation address is hard-coded to 127.0.0.1, which causes the check to fail.
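
For illustration, the ComponentStatus check is effectively equivalent to running the following from the kube-apiserver's own network namespace (a sketch; 10251 and 10252 are the historical insecure default ports, as the error messages below confirm):

    curl http://127.0.0.1:10251/healthz   # kube-scheduler
    curl http://127.0.0.1:10252/healthz   # kube-controller-manager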

What happened:
The scheduler and controller-manager ComponentStatus show Unhealthy (output of kubectl get componentstatuses):

NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused   
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused   

What you expected to happen:

The validation addresses should be detected automatically, and the component status should show Healthy.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.18
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@borgerli borgerli added the kind/bug Categorizes issue or PR as related to a bug. label Nov 25, 2020
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2020
@borgerli
Contributor Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 25, 2020
@pacoxu
Member

pacoxu commented Nov 25, 2020

As far as I know, this is caused by the deprecation of the insecure ports of the kube-scheduler and kube-controller-manager.

https://github.com/kubernetes/kubernetes/blob/ac62c47889bcb29cd488a4a7149f90ab9da836e8/pkg/scheduler/apis/config/types.go#L42-L49 (DefaultInsecureSchedulerPort)

The kubeadm static pod manifest for the scheduler now probes the secure port instead:

    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 15
      timeoutSeconds: 15

The readiness/liveness probe now uses the HTTPS port 10259 to check the scheduler's status.
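
You can also probe the secure endpoints yourself (a sketch: -k skips certificate verification, and 10257 is the controller-manager's secure counterpart; /healthz is served without credentials by default via the components' authorization-always-allow-paths setting):

    curl -k https://127.0.0.1:10259/healthz   # kube-scheduler, secure port
    curl -k https://127.0.0.1:10257/healthz   # kube-controller-manager, secure port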

Running kubectl get componentstatuses in v1.19+ prints: Warning: v1 ComponentStatus is deprecated in v1.19+

ComponentStatus still uses the insecure port to check the scheduler's status, so this may not be fixed, as the API itself is deprecated.

@pacoxu
Member

pacoxu commented Nov 25, 2020

A workaround is to re-open the insecure port (which is, by definition, not secure):

Remove --port=0 from the manifest files; see the sketch after the list below.

With kubeadm, the files are:

  • /etc/kubernetes/manifests/kube-scheduler.yaml
  • /etc/kubernetes/manifests/kube-controller-manager.yaml
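
A minimal sketch of the edit, assuming kubeadm's default static pod manifest paths (the kubelet watches these files and restarts the pods automatically when they change):

    # Delete the --port=0 flag so the insecure ports come back up
    sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
    sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml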

@borgerli
Copy link
Contributor Author

Thanks for the reply.

The problem is that the addresses of the scheduler and controller-manager are hard-coded to 127.0.0.1. In our setup, however, the scheduler and controller-manager run in containers separate from the kube-apiserver, so the kube-apiserver cannot reach them via 127.0.0.1.

Given the deprecation of the v1 ComponentStatus API, should we detect the status of the scheduler and controller-manager by other means instead of checking ComponentStatus? Thanks.

@pacoxu
Member

pacoxu commented Nov 25, 2020

You are hitting the same issue as #19570 (comment).

Per #93570:
kube-apiserver: the componentstatus API is deprecated. This API provided status of etcd, kube-scheduler, and kube-controller-manager components, but only worked when those components were local to the API server, and when kube-scheduler and kube-controller-manager exposed unsecured health endpoints. Instead of this API, etcd health is included in the kube-apiserver health check and kube-scheduler/kube-controller-manager health checks can be made directly against those components' health endpoints.
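
In practice, the replacement checks look roughly like this (a sketch; the <...> hosts are placeholders for wherever your components run):

    # etcd health is folded into the kube-apiserver's own health check:
    kubectl get --raw='/readyz?verbose'

    # Check the scheduler and controller-manager directly on their secure ports:
    curl -k https://<scheduler-host>:10259/healthz
    curl -k https://<controller-manager-host>:10257/healthz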

@borgerli
Contributor Author

Got it. Many thanks.

Closing this issue.

@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 3, 2020