
none driver on openEuler #8420

Closed
gaozhekang opened this issue Jun 9, 2020 · 19 comments
Labels: co/none-driver, kind/support, l/zh-CN, triage/needs-information


@gaozhekang

gaozhekang commented Jun 9, 2020

When I run minikube start --vm-driver=none in an arm64 VM, it reports an error. I can confirm that docker-ce is installed and can run containers.

Running docker run hello-world prints:
Hello from Docker!
This message shows that your installation appears to be working correctly.

$ rpm -qa | grep docker
docker-ce-cli-19.03.11-3.el7.aarch64
docker-ce-19.03.11-3.el7.aarch64

$ rpm -qa | grep kubectl
kubectl-1.18.3-0.aarch64
kubeadm-1.18.3-0.aarch64
kubelet-1.18.3-0.aarch64

**Command required to reproduce the problem**: minikube start --vm-driver=none

**Full output of the failed command**: <details>
* minikube v1.11.0 on Openeuler 20.03 (arm64)
  - KUBECONFIG=/etc/kubernetes/admin.conf:config-demo:config-demo-2
* Using the none driver based on existing profile
* Starting control plane none minikube in cluster minikube
* Restarting existing none bare metal machine for "minikube" ...
* OS release is openEuler 20.03 (LTS)
* Preparing Kubernetes v1.18.3 on Docker 19.03.11 ...
! Unable to restart cluster, will reset it: getting k8s client: client config:  client config: context "minikube" does not exist
! initialization failed, will try again: run: /bin/bash -c "sudo env PATH=/var/lib/minikube/binaries/v1.18.3:$PATH kubeadm init --config /var/tmp/minikube/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap": exit status 1
stdout:
[init] Using Kubernetes version: v1.18.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
......
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
......
</details>


**Output of the `minikube logs` command**: <details>


</details>

This looks like it is caused by the kubelet running abnormally; systemctl status kubelet shows the kubelet service in the activating (auto-restart) state with exit code 203.

$ journalctl -xeu kubelet
Jun 09 15:26:07 localhost.localdomain systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- An ExecStart= process belonging to unit kubelet.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 203.

**Operating system version used**: openEuler 20.03 (LTS)
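
For reference, systemd's exit status 203/EXEC means the ExecStart binary could not be executed at all (missing, not executable, or built for the wrong architecture). A minimal check, assuming the kubelet binary is at /usr/bin/kubelet (adjust to whatever ExecStart actually points to):

$ systemctl cat kubelet | grep ExecStart    # find the binary systemd tries to run
$ file /usr/bin/kubelet                     # should report an aarch64 ELF executable on this machine
$ ls -l /usr/bin/kubelet                    # confirm the execute bit is set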
@gaozhekang gaozhekang added the l/zh-CN Issues in or relating to Chinese label Jun 9, 2020
@medyagh
Member

medyagh commented Jun 9, 2020

@gaozhekang while I don't know about openEuler, I'd like to know: is there a reason you chose the none driver over docker?

I wonder if you have tried out the newest Docker driver with the latest version of minikube?
You could try these with a normal user (not sudo):
minikube delete
minikube start --driver=docker

For more information on the docker driver, check out:
https://minikube.sigs.k8s.io/docs/drivers/docker/

@medyagh medyagh added triage/needs-information Indicates an issue needs more information in order to work on it. co/none-driver kind/support Categorizes issue or PR as a support question. labels Jun 9, 2020
@medyagh medyagh changed the title Cannot execute “minikube start --vm-driver=none” on openEuler none driver on openEuler Jun 9, 2020
@afbjorklund
Collaborator

@medyagh : afaik, we don't support arm64 with docker yet.

@gaozhekang
Author

As @afbjorklund said, when I tried to use --driver=docker, it showed that arm64 is not supported.

@medyagh
Member

medyagh commented Jun 10, 2020

As @afbjorklund said, when I tried to use --driver=docker, it showed that arm64 is not supported.

Ah... sorry about that. You are right, that suggestion wouldn't work.

@gaozhekang
Author

I reinstalled my environment and ran "minikube start --vm-driver=none --image-mirror-country=cn" again; it still reported the same error as before, but I found more info like this:

$ minikube start --vm-driver=none --image-mirror-country=cn
* minikube v1.11.0 on Openeuler 20.03 (arm64)
  - KUBECONFIG=/etc/kubernetes/admin.conf:config-demo:config-demo-2
* Using the none driver based on existing profile
* Starting control plane none minikube in cluster minikube
* Restarting existing none bare metal machine for "minikube" ...
* OS release is openEuler 20.03 (LTS)
* Preparing Kubernetes v1.18.3 on Docker 19.03.11 ...
! Unable to restart cluster, will reset it: getting k8s client: client config:  client config: context "minikube" does not exist
! initialization failed, will try again: run: /bin/bash -c "sudo env PATH=/var/lib/minikube/binaries/v1.18.3:$PATH kubeadm init --config /var/tmp/minikube/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap": exit status 1
......
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'


stderr:
W0610 11:12:19.072776 1110706 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
W0610 11:12:20.975400 1110706 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
W0610 11:12:20.976615 1110706 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

* Suggestion: Check output of 'journalctl -xeu kubelet', try passing --extra-config=kubelet.cgroup-driver=systemd to minikube start
* Related issue: https://github.com/kubernetes/minikube/issues/4172

According to the suggestion, I tried "minikube start --vm-driver=none --image-mirror-country=cn --extra-config=kubelet.cgroup-driver=systemd" and nothing changed.
According to the output of "docker info", the cgroup driver is "cgroupfs", not "systemd".
Besides, I checked the status of the kubelet service: it is activating (auto-restart) and the exit code is 203.
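
A quick way to confirm which cgroup driver Docker is actually using (the --format flag is part of the standard docker CLI; the kubelet's cgroup driver must match this value):

$ docker info --format '{{.CgroupDriver}}'
cgroupfs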

@medyagh
Member

medyagh commented Jun 10, 2020

@gaozhekang

two things I noticed
1-

The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get 
http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

are you using a VPN or firewall? It seems like you cannot hit localhost:10248

2- the cgroup driver seems to not be the one that kubeadm wants

@gaozhekang
Author

Thanks. I have disabled firewalld and flushed iptables, so the firewall should not be the problem. I guess it may be because google is not accessible from China, so I replaced minikube, kubelet, kubectl, and kubeadm from the aliyun repo (see the repo sketch after the log excerpt below). The kubelet service now starts running, and "journalctl -xeu kubelet" outputs:

Jun 11 10:22:28 localhost.localdomain kubelet[281340]: E0611 10:22:28.324222 281340 kubelet.go:2267] node "localhost.localdomain" not found
Jun 11 10:22:28 localhost.localdomain kubelet[281340]: E0611 10:22:28.368961 281340 event.go:269] Unable to write event: 'Post https://control-plane.minikube.internal:8443/api/v1/namespaces/default/events: dial tcp 192.168.122.123:8443: connect: connection refused' (may retry after sleeping)
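
For reference, a minimal sketch of a yum repo file that pulls kubelet/kubeadm/kubectl from the aliyun mirror; the file name, baseurl, and gpgcheck=0 here are illustrative assumptions, not the exact configuration used in this environment:

# /etc/yum.repos.d/kubernetes.repo (hypothetical example)
[kubernetes]
name=Kubernetes (aliyun mirror)
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-aarch64/
enabled=1
gpgcheck=0

$ sudo yum install -y kubelet-1.18.3 kubeadm-1.18.3 kubectl-1.18.3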

/etc/hosts is:

127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain
::1               localhost localhost.localdomain localhost6 localhost6.localdomain
192.168.122.123   server.example.com node1
192.168.122.121   client.example.com master
127.0.0.1    host.minikube.internal
192.168.122.123   control-plane.minikube.internal

"exec-opts": ["native.cgroupdriver=systemd"] is added to /etc/docker/daemon.json, and the cgroup warning disappeared.

While the port 10248 connection and the cgroup warning are solved, it still reports an error when running "minikube start --vm-driver=none --registry-mirror=https://registry.docker-cn.com --v=10":

[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'


stderr:
W0610 10:32:08.535462 283702 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
W0610 10:32:10.916124 283702 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
W0610 11:12:20.916124 283702 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

@gaozhekang
Author

I tried the same steps in a CentOS 7.5 + x86_64 env, and it reports the same error. So maybe this is a known bug?

@medyagh
Member

medyagh commented Jun 15, 2020

I haven't personally tried minikube with the arm arch yet, but I would like to have an integration test for this.

@gaozhekang have you tried the KVM driver? Maybe there will be luck with that one?

@afbjorklund
Collaborator

afbjorklund commented Jun 15, 2020

I haven't personally tried minikube with the arm arch yet,

Since we only support the "none" driver, the experience is pretty much the same as kubeadm.

Since nobody has mentioned SELinux yet, and this is CentOS, I suspect that it is #7905.

Also it is arm64 not arm, but that's another story.

KVM doesn't work, since we don't have an ARM ISO.
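
A quick way to test the SELinux theory (getenforce/setenforce are standard commands; setenforce 0 only switches to permissive mode until the next reboot):

$ getenforce
Enforcing
$ sudo setenforce 0
$ minikube start --vm-driver=none    # retry with SELinux permissive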

@afbjorklund
Collaborator

afbjorklund commented Jun 15, 2020

but I would like to have an integration test for this.

We would need some hardware for this, see #6280

But it would be nice to have some CentOS tests: #3552

@gaozhekang
Author

Thanks. I have tried another time: CentOS + x86_64 is OK, but the same steps still fail on openEuler + arm64. By the way, my CentOS + x86_64 env has a different network environment from the arm64 one, so I guess maybe there is some problem with the network?
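
One way to narrow down a network problem is to probe the control plane endpoint directly from the node; the hostname below comes from the /etc/hosts entries posted earlier:

$ curl -k https://control-plane.minikube.internal:8443/version    # returns apiserver version JSON once the control plane is up
$ ss -tlnp | grep -E '8443|10248|10250'                           # confirm the apiserver and kubelet ports are listening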

@medyagh
Member

medyagh commented Jul 7, 2020

What is the error you get with the one with the different network? @gaozhekang

@gaozhekang
Author

The error is the 40s timeout and:

stderr:
W0610 10:32:08.535462 283702 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
W0610 10:32:10.916124 283702 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
W0610 11:12:20.916124 283702 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

@medyagh
Member

medyagh commented Jul 10, 2020

The error is 40s timeout and

stderr:
W0610 10:32:08.535462 283702 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
W0610 10:32:10.916124 283702 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
W0610 11:12:20.916124 283702 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

Those are the W lines (warnings); do you mind pasting the whole logs?

@medyagh
Member

medyagh commented Jul 29, 2020

I haven't heard back from you; I wonder if you still have this issue?
Regrettably, there isn't enough information in this issue to make it actionable, and a long enough duration has passed, so this issue is likely difficult to replicate.

I will close this issue for now but please feel free to reopen whenever you feel ready to provide more information.

@medyagh medyagh closed this as completed Jul 29, 2020
@kevinzs2048

/assign

@kevinzs2048

This is Kevin from Linaro; I will cooperate with the Huawei guys to continue working on this.
We also have Arm64 machines which we can offer upstream as the Arm64 CI.
Also we have Arm64 machines which can offer to upstream as the Arm64 CI

@kevinzs2048

@gaozhekang Hi, could you tell me how to install kubeadm/kubectl/kubelet on openEuler?
