
Nodes joined with kubeadm-style bootstrap tokens cannot automatically rejoin when the node object is deleted. #7797

Closed
camaeel opened this issue Jun 18, 2023 · 1 comment


camaeel commented Jun 18, 2023

Environmental Info:
K3s Version:

k3s version v1.26.5+k3s1 (7cefebea)
go version go1.19.9

Node(s) CPU architecture, OS, and Version:
Linux dev-master-0 5.15.0-75-generic #82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 servers, 3 nodes, all on qemu VMs

Describe the bug:
The documentation at https://docs.k3s.io/architecture#how-agent-node-registration-works states that it is enough to delete the node and remove /etc/rancher/node/password from that node. I tried this, but got the following log entries on the worker node:

Jun 17 14:36:54 dev-worker-1 k3s[3828]: time="2023-06-17T14:36:54+02:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 401 Unauthorized"

Logs on the master (server) nodes:

"unable to verify node identity: nodes \"dev-worker-1\" not found"

Steps To Reproduce:

  1. Install the k3s server.
  2. Generate a bootstrap token: k3s token create --ttl 24h
  3. Install the agent on another node and join it to the cluster using this token.
  4. Stop the agent systemd service: systemctl stop k3s-agent
  5. Delete the agent node from the cluster: kubectl delete node xxx
  6. On host xxx, delete the node password directory: rm -rf /etc/rancher/node/
  7. Start the agent service: systemctl start k3s-agent
  8. Observe the logs.

If, before starting the agent in step 7, I also removed the kubelet client certificate and key (rm -r /etc/rancher/node/ /var/lib/rancher/k3s/agent/client-kubelet.crt /var/lib/rancher/k3s/agent/client-kubelet.key), the node was correctly recreated and rejoined the cluster.
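The agent-side part of this workaround can be sketched as a small shell function. This is a minimal sketch, not an official k3s tool: it assumes the default paths quoted in this report, and the function name and its optional root-prefix argument are hypothetical additions so the cleanup can be rehearsed in a scratch directory first.

```shell
#!/bin/sh
# Sketch of the agent-side workaround from this report (not an official k3s tool).
# Assumes the default paths quoted in the report. The optional first argument is
# a hypothetical root prefix for rehearsing in a scratch directory; with no
# argument it operates on the real filesystem, so run it as root with the
# k3s-agent service stopped.

reset_agent_identity() {
    root="${1:-}"

    # Node password directory; the documentation only mentions removing this.
    rm -rf "${root}/etc/rancher/node/"

    # Cached kubelet client certificate and key. In this report, leaving these
    # in place made the server reject the agent with 401 Unauthorized
    # ("unable to verify node identity").
    rm -f "${root}/var/lib/rancher/k3s/agent/client-kubelet.crt" \
          "${root}/var/lib/rancher/k3s/agent/client-kubelet.key"
}
```

Used in the order of steps 4 to 7: stop the agent (systemctl stop k3s-agent), delete the Node object from a server (kubectl delete node xxx), call reset_agent_identity on the agent, then start the agent again (systemctl start k3s-agent).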

Expected behavior:
Either the documentation should be updated, or the agent should refresh client-kubelet.crt and client-kubelet.key when rejoining the cluster.

Actual behavior:
The node cannot rejoin the cluster after being deleted as described in the official documentation.

Additional context / logs:
Slack discussion: https://rancher-users.slack.com/archives/CGGQEHPPW/p1687006438335789

@ShylajaDevadiga
Contributor

Validated on the master branch using commit 3461739.

Docs: https://docs.k3s.io/cli/token#bootstrap

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04

Cluster Configuration:
Two nodes: one server, one agent

Reproduction results using k3s v1.27.3+k3s1:

ubuntu@ip-172-31-2-173:~$ k3s  token create --ttl 24h
<TOKEN>

Used the generated token to join the agent node:

ubuntu@ip-172-31-2-173:~$ kubectl get nodes
NAME               STATUS   ROLES                  AGE     VERSION
ip-172-31-2-173    Ready    control-plane,master   2m13s   v1.27.3+k3s1
ip-172-31-12-211   Ready    <none>                 0s      v1.27.3+k3s1

Deleted the agent node after stopping the agent service:

ubuntu@ip-172-31-2-173:~$ kubectl delete node ip-172-31-12-211
node "ip-172-31-12-211" deleted
ubuntu@ip-172-31-2-173:~$ kubectl get nodes -w
NAME              STATUS   ROLES                  AGE     VERSION
ip-172-31-2-173   Ready    control-plane,master   4m13s   v1.27.3+k3s1

Deleted the password file on the agent node:

sudo rm -rf /etc/rancher/node/

After the agent service was started again, the agent failed to join the cluster with the following logs.
On the server:

Jul 12 15:20:41 ip-172-31-2-173 k3s[1673]: time="2023-07-12T15:20:41Z" level=error msg="Sending HTTP 401 response to 172.31.12.211:37108: unable to verify node identity: nodes \"ip-172-31-12-211\" not found"

On the agent:

Jul 12 15:26:41 ip-172-31-12-211 k3s[2263]: time="2023-07-12T15:26:41Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 401 Unauthorized"

With the workaround described above (removing client-kubelet.crt, client-kubelet.key, and the password file), the agent was able to join the cluster:

ubuntu@ip-172-31-2-173:~$ kubectl get nodes -w
NAME               STATUS   ROLES                  AGE   VERSION
ip-172-31-2-173    Ready    control-plane,master   12m   v1.27.3+k3s1
ip-172-31-12-211   Ready    <none>                 15s   v1.27.3+k3s1

Validation results using commit 3461739 from the master branch:

Following the same steps, the agent successfully rejoined the cluster after the service was restarted, without the workaround:

ubuntu@ip-172-31-7-90:~$ kubectl get nodes
NAME               STATUS   ROLES                  AGE    VERSION
ip-172-31-7-90     Ready    control-plane,master   110s   v1.27.3+k3s-34617390
ip-172-31-12-135   Ready    <none>                 29s    v1.27.3+k3s-34617390
ubuntu@ip-172-31-7-90:~$ kubectl delete node ip-172-31-12-135
node "ip-172-31-12-135" deleted
ubuntu@ip-172-31-7-90:~$ kubectl get nodes
NAME               STATUS   ROLES                  AGE     VERSION
ip-172-31-7-90     Ready    control-plane,master   2m27s   v1.27.3+k3s-34617390
ip-172-31-12-135   Ready    <none>                 6s      v1.27.3+k3s-34617390
ubuntu@ip-172-31-7-90:~$
