Apply microk8s cgroups2 QOS patch #550

Merged 2 commits on Jul 17, 2024
55 changes: 54 additions & 1 deletion docs/src/snap/reference/troubleshooting.md
@@ -3,7 +3,8 @@
This page provides techniques for troubleshooting common Canonical Kubernetes
issues.

## Kubectl error: "dial tcp 127.0.0.1:6443: connect: connection refused"

## Kubectl error: `dial tcp 127.0.0.1:6443: connect: connection refused`

### Problem

@@ -33,6 +34,58 @@ Use `k8s config` instead of `k8s kubectl config` to generate a kubeconfig file
that is valid for use on external machines.
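
For example, a minimal workflow, assuming `k8s config` prints an admin
kubeconfig to stdout and that `cluster.kubeconfig` and the remote host name
are only illustrative:

```bash
# On a control plane node, export a kubeconfig that external machines can use:
sudo k8s config > cluster.kubeconfig
scp cluster.kubeconfig user@workstation:~/

# Then, on the external machine:
kubectl --kubeconfig ~/cluster.kubeconfig get nodes
```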


## Kubelet error: `failed to initialize top level QOS containers`

### Problem

The kubelet fails to start because the cpuset cgroup controller is never
enabled for the `kubepods` cgroup. The kubelet requires this controller, but
the host may not be configured to provide it:

```
E0125 00:20:56.003890 2172 kubelet.go:1466] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
```
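
On an affected node you can check whether the cpuset controller is actually
available and delegated via the cgroup v2 filesystem. This is a quick sketch;
the paths are the standard cgroup v2 mount points, but your hierarchy may
differ:

```bash
# Controllers available at the cgroup v2 root:
cat /sys/fs/cgroup/cgroup.controllers

# Controllers delegated to children of the root (look for "cpuset"):
cat /sys/fs/cgroup/cgroup.subtree_control
```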

### Explanation

An excellent deep dive into the issue is available at
[kubernetes/kubernetes #122955][kubernetes-122955].

Commenter [@haircommander][] [states][kubernetes-122955-2020403422]:
> basically: we've figured out that this issue happens because libcontainer
> doesn't initialize the cpuset cgroup for the kubepods slice when the kubelet
> initially calls into it to do so. This happens because there isn't a cpuset
> defined on the top level of the cgroup. however, we fail to validate all of
> the cgroup controllers we need are present. It's possible this is a
> limitation in the dbus API: how do you ask systemd to create a cgroup that
> is effectively empty?

> if we delegate: we are telling systemd to leave our cgroups alone, and not
> remove the "unneeded" cpuset cgroup.
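
In systemd terms, this is about unit delegation: with `Delegate=yes`, systemd
hands the unit's cgroup subtree over to the service and stops removing
controllers it considers unneeded. You can inspect the current setting for the
kubelet unit as follows (a sketch, assuming the snap service name used
elsewhere on this page):

```bash
# Reports "Delegate=yes" once the drop-in from the solution below is in place:
systemctl show snap.k8s.kubelet.service -p Delegate
```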


### Solution

This is in the process of being fixed upstream via
[kubernetes/kubernetes #125923][kubernetes-125923].

In the meantime, the best workaround is to create a systemd drop-in that sets
`Delegate=yes` for the kubelet service, then reboot.

```bash
# Create a drop-in directory for the kubelet service and set Delegate=yes,
# then reboot so the cgroup hierarchy is recreated with delegation enabled.
mkdir -p /etc/systemd/system/snap.k8s.kubelet.service.d
cat > /etc/systemd/system/snap.k8s.kubelet.service.d/delegate.conf <<EOF
[Service]
Delegate=yes
EOF
reboot
```
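
After the reboot, the kubelet should start without the error and the node
should eventually report `Ready`. A minimal check, assuming the snap service
name used above:

```bash
# The kubelet log should no longer contain the QOS container error:
journalctl -u snap.k8s.kubelet.service -b | grep -i "top level QOS" || echo "no QOS errors"

# The node should register and become Ready:
sudo k8s kubectl get nodes
```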

<!-- LINKS -->

[kubeconfig file]: https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
[kubernetes-122955]: https://github.com/kubernetes/kubernetes/issues/122955
[kubernetes-125923]: https://github.com/kubernetes/kubernetes/pull/125923
[kubernetes-122955-2020403422]: https://github.com/kubernetes/kubernetes/issues/122955#issuecomment-2020403422
[@haircommander]: https://github.com/haircommander