
Hard-coded addresses of scheduler and controller manager cause unhealthy ComponentStatus #96848

Closed
borgerli opened this issue Nov 25, 2020 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@borgerli
Contributor

We deployed the k8s control plane (kube-apiserver, kube-scheduler and kube-controller-manager) in separate containers, so the components cannot reach each other through the 127.0.0.1 loopback interface.

But as https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/rest/storage_core.go#L345 shows, the validation address is hard-coded to 127.0.0.1, which causes the check to fail.
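
For illustration, the ComponentStatus check is effectively equivalent to running the following from the kube-apiserver's own network namespace (a sketch; 10251 and 10252 are the historical insecure default ports, as the error messages below confirm):

    curl http://127.0.0.1:10251/healthz   # kube-scheduler
    curl http://127.0.0.1:10252/healthz   # kube-controller-manager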

What happened:
The scheduler and controller-manager ComponentStatus show Unhealthy (output of kubectl get componentstatuses):

NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused   
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused   

What you expected to happen:

The validation addresses should be detected automatically, and the component status should show Healthy.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.18
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@borgerli borgerli added the kind/bug Categorizes issue or PR as related to a bug. label Nov 25, 2020
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2020
@borgerli
Contributor Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 25, 2020
@pacoxu
Member

pacoxu commented Nov 25, 2020

As far as I know, this is caused by the deprecation of the insecure ports of the kube-scheduler and kube-controller-manager.

https://github.com/kubernetes/kubernetes/blob/ac62c47889bcb29cd488a4a7149f90ab9da836e8/pkg/scheduler/apis/config/types.go#L42-L49 (DefaultInsecureSchedulerPort)

The kubeadm static pod manifest for the scheduler now probes the secure port instead:

    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 15
      timeoutSeconds: 15

The readiness/liveness probe now uses the HTTPS port 10259 to check the scheduler's status.
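
You can also probe the secure endpoints yourself (a sketch: -k skips certificate verification, and 10257 is the controller-manager's secure counterpart; /healthz is served without credentials by default via the components' authorization-always-allow-paths setting):

    curl -k https://127.0.0.1:10259/healthz   # kube-scheduler, secure port
    curl -k https://127.0.0.1:10257/healthz   # kube-controller-manager, secure port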

Running kubectl get componentstatuses in v1.19+ prints: Warning: v1 ComponentStatus is deprecated in v1.19+

ComponentStatus still uses the insecure port to check the scheduler's status, so this may not be fixed, as the API itself is deprecated.

@pacoxu
Member

pacoxu commented Nov 25, 2020

A workaround is to re-open the insecure port (which is, by definition, not secure):

Remove --port=0 from the manifest files; see the sketch after the list below.

With kubeadm, the files are:

  • /etc/kubernetes/manifests/kube-scheduler.yaml
  • /etc/kubernetes/manifests/kube-controller-manager.yaml
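
A minimal sketch of the edit, assuming kubeadm's default static pod manifest paths (the kubelet watches these files and restarts the pods automatically when they change):

    # Delete the --port=0 flag so the insecure ports come back up
    sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
    sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml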

@borgerli
Copy link
Contributor Author

Thanks for the reply.

The problem is that the addresses of the scheduler and controller-manager are hard-coded to 127.0.0.1. In our setup, however, the scheduler and controller-manager run in containers separate from the kube-apiserver, so the kube-apiserver cannot reach them via 127.0.0.1.

Given the deprecation of the v1 ComponentStatus API, should we detect the status of the scheduler and controller-manager by other means instead of checking ComponentStatus? Thanks.

@pacoxu
Member

pacoxu commented Nov 25, 2020

You are hitting the same issue as #19570 (comment).

Per #93570:
kube-apiserver: the componentstatus API is deprecated. This API provided status of etcd, kube-scheduler, and kube-controller-manager components, but only worked when those components were local to the API server, and when kube-scheduler and kube-controller-manager exposed unsecured health endpoints. Instead of this API, etcd health is included in the kube-apiserver health check and kube-scheduler/kube-controller-manager health checks can be made directly against those components' health endpoints.
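
In practice, the replacement checks look roughly like this (a sketch; the <...> hosts are placeholders for wherever your components run):

    # etcd health is folded into the kube-apiserver's own health check:
    kubectl get --raw='/readyz?verbose'

    # Check the scheduler and controller-manager directly on their secure ports:
    curl -k https://<scheduler-host>:10259/healthz
    curl -k https://<controller-manager-host>:10257/healthz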

@borgerli
Contributor Author

Got it. Many thanks.

Closing this issue.

@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 3, 2020