
Single-node cluster crashes when the node reboots (container IP change makes kubeconfig out of date) #3071

Closed
panpan0000 opened this issue Jan 17, 2023 · 3 comments
Labels
area/provider/podman Issues or PRs related to podman kind/feature Categorizes issue or PR as related to a new feature. triage/duplicate Indicates an issue is a duplicate of other open issue.

Comments

panpan0000 commented Jan 17, 2023

Impact

For a single-node cluster, when the host OS reboots, the cluster fails to work. For example, the kube-controller-manager complains with logs:

leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: 
Get "https://10.89.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": 
dial tcp 10.89.0.1:6443: connect: no route to host
....(repeating)

NOTE: remember the IP 10.89.0.1 above.

Root Cause

(1) When the host OS reboots, the IP address of eth0 inside the kind container tends to differ from last time:

  • podman runtime: every time the OS reboots or podman restarts, the podman network assigns a "+1" IP to the container. For example, if the IP is 10.89.0.1 the first time, the next time it will be 10.89.0.2, and so on.
  • docker runtime: there is still some chance the kind container is assigned its previous IP; it seems to depend on luck.

(2) When the kind cluster has only one node, the two kubeconfig files below use the node IP instead of localhost:

  • /etc/kubernetes/controller-manager.conf
  • /etc/kubernetes/scheduler.conf

In the above conf files, the content looks like:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: *
    server: https://10.89.0.1:6443           # <-------------- it uses the "container IP" here
  name: **

Workaround

I have to use "sed" to replace the container IP with 127.0.0.1 right after the kind cluster becomes ready, so that the Kubernetes cluster can survive container restarts and OS reboots.
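The sed-based rewrite can be sketched as a small self-contained Go program (the file paths and the 127.0.0.1 target come from this thread; the function name here is illustrative, and on a live cluster you would apply the equivalent sed inside the node container):

```go
package main

import (
	"fmt"
	"regexp"
)

// serverLine matches the kubeconfig "server:" field pointing at any host on port 6443.
var serverLine = regexp.MustCompile(`server: https://\S+:6443`)

// rewriteToLocalhost replaces the container-IP API server address with 127.0.0.1,
// mirroring the sed workaround described in this issue.
func rewriteToLocalhost(conf []byte) []byte {
	return serverLine.ReplaceAll(conf, []byte("server: https://127.0.0.1:6443"))
}

func main() {
	// Demonstration on a sample kubeconfig fragment; on a real single-node
	// cluster the targets would be /etc/kubernetes/controller-manager.conf
	// and /etc/kubernetes/scheduler.conf inside the node container.
	sample := []byte("apiVersion: v1\nclusters:\n- cluster:\n    server: https://10.89.0.1:6443\n  name: kind\n")
	fmt.Print(string(rewriteToLocalhost(sample)))
}
```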

I haven't tested the multi-node situation; in theory the result will be worse and the solution may be more complicated.

Fixes (?)

I propose that when kind detects only one node in the cluster, the code below could use localhost for AdvertiseAddress instead.

c.AdvertiseAddress = strings.Split(c.NodeAddress, ",")[0]
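A hedged sketch of that idea (the function and parameters are illustrative, not kind's actual API): fall back to localhost when the cluster has exactly one node, and keep the current first-address behavior otherwise.

```go
package main

import (
	"fmt"
	"strings"
)

// advertiseAddress is a hypothetical helper: for a single-node cluster it
// advertises localhost so the address survives container-IP changes across
// reboots; otherwise it keeps kind's current behavior of taking the first
// entry of the comma-separated node address list.
func advertiseAddress(nodeAddress string, nodeCount int) string {
	if nodeCount == 1 {
		return "127.0.0.1"
	}
	return strings.Split(nodeAddress, ",")[0]
}

func main() {
	fmt.Println(advertiseAddress("10.89.0.1,fc00:f853:ccd:e793::2", 1)) // prints "127.0.0.1"
	fmt.Println(advertiseAddress("10.89.0.2,10.89.0.3", 2))             // prints "10.89.0.2"
}
```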

But again, the multi-node situation might be harder.

My two cents, just to set the ball rolling...

@panpan0000 panpan0000 added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 17, 2023
panpan0000 commented Jan 17, 2023

Somebody asked whether the kind config below would help:

kind: Cluster 
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "127.0.0.1"

Actually, it won't.

This field only changes the kubeadm certSANs: https://github.com/kubernetes-sigs/kind/blob/main/pkg/cluster/internal/kubeadm/config.go#L193

I tried it; the issue remains.

BenTheElder (Member) commented:

KIND + Podman doesn't support restart. KIND + Docker does.

See: #2272, the existing tracking issue.

The problem with just switching to new IPs is that we have no way to automatically discover the new peer IPs until podman supports embedded DNS equivalent to docker's, or someone proposes an alternate approach.

@BenTheElder BenTheElder added triage/duplicate Indicates an issue is a duplicate of other open issue. area/provider/podman Issues or PRs related to podman labels Jan 17, 2023
panpan0000 (Author) commented:

For a single-node kind cluster, you can resolve this problem with:

podman exec -it "${your_container_name}" bash -c "sed -i 's/server: .*:6443/server: https:\/\/127.0.0.1:6443/g' /etc/kubernetes/controller-manager.conf"
podman exec -it "${your_container_name}" bash -c "sed -i 's/server: .*:6443/server: https:\/\/127.0.0.1:6443/g' /etc/kubernetes/scheduler.conf"
