Documentation for Adding a new non-Talos Node to Cluster #3990
Comments
Bumping this! I just went through this experience and followed this more or less. Some of the things I ran into:
You could 100% automate the certificate provisioning to make it less painful, but meh.
I was trying to follow the above from @grepler & @hatf0. Huge props, and I personally really appreciate the effort/work put into filing this issue. I'm frankly too stupid and/or impatient for the steps listed, though. On top of that, I really felt off editing the systemd drop-ins, creating my own CSR, approving it in the cluster, etc. Luckily I found this gist by @kvaps: https://gist.github.com/kvaps/b9b6a8cc07b889a1f60bffc1ceba514d
I was able to tweak it for myself on a Debian/WSL2 machine. I did it this way because it let me NOT have to adjust any of the out-of-the-box systemd drop-ins that were installed alongside kubeadm. Before running the script I stopped the kubelet on the target node.
#!/bin/bash -e
# NOTE $VIP and $TARGET
# $VIP is the IP of a control node with talos installed
# $TARGET is the IP/hostname of the machine that you want to install these files to
# I personally set these via direnv, e.g.:
#
# # in .envrc
# source_up
# export VIP=<control plane IP>
# export TARGET=<target machine IP/hostname>
talosctl -n "$VIP" cat /etc/kubernetes/kubeconfig-kubelet > ./kubelet.conf
talosctl -n "$VIP" cat /etc/kubernetes/bootstrap-kubeconfig > ./bootstrap-kubelet.conf
talosctl -n "$VIP" cat /etc/kubernetes/pki/ca.crt > ./ca.crt
sed -i "/server:/ s|:.*|: https://${VIP}:6443|g" \
./kubelet.conf \
./bootstrap-kubelet.conf
clusterDomain=$(talosctl -n "$VIP" get kubeletconfig -o jsonpath="{.spec.clusterDomain}")
clusterDNS=$(talosctl -n "$VIP" get kubeletconfig -o jsonpath="{.spec.clusterDNS}")
# super stupid way for updating the container runtime socket. please update per your usecase
socketPath="/var/run/containerd/containerd.sock"
# default to containerd; fall back to the CRI-O socket if containerd's is not present on the target
if ! ssh "root@$TARGET" "ls $socketPath" &> /dev/null; then
  socketPath=/var/run/crio/crio.sock
fi
echo "Using socket path: $socketPath"
cat > var-lib-kubelet-config.yaml <<EOT
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
clusterDomain: "$clusterDomain"
clusterDNS: $clusterDNS
runtimeRequestTimeout: "0s"
cgroupDriver: systemd # uhhhh might want to update this for anything else
containerRuntimeEndpoint: unix://$socketPath
EOT
scp bootstrap-kubelet.conf root@$TARGET:/etc/kubernetes/bootstrap-kubelet.conf
scp kubelet.conf root@$TARGET:/etc/kubernetes/kubelet.conf
ssh root@$TARGET "mkdir -p /etc/kubernetes/pki"
scp ca.crt root@$TARGET:/etc/kubernetes/pki/ca.crt
scp var-lib-kubelet-config.yaml root@$TARGET:/var/lib/kubelet/config.yaml
After running the script and validating that the files on the target match what was generated, start the kubelet on the new node (see the sketch below). Yes, there are a bunch of SSH/SCP commands to the target node, but it keeps the stock systemd drop-ins untouched. @andrewrynhard I reached out to you on reddit; this is more-or-less what I really needed. Apologies if this is the wrong place for it.
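For completeness, a rough sketch of that validation plus the kubelet restart on the target (paths match the script above; this assumes the kubelet on the target is managed by systemd):

# compare checksums of the locally generated files vs. what landed on the target
md5sum kubelet.conf bootstrap-kubelet.conf ca.crt var-lib-kubelet-config.yaml
ssh "root@$TARGET" "md5sum /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/pki/ca.crt /var/lib/kubelet/config.yaml"
# if everything matches, (re)start the kubelet and watch it register with the cluster
ssh "root@$TARGET" "systemctl restart kubelet && systemctl --no-pager status kubelet"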
Special thanks to @chr0n1x for providing the script! I tested it in Talos 1.7 with k8s 1.30. But I would like to make a few points:
$ talosctl -n 10.40.0.200 upgrade-k8s --to 1.30.1 --dry-run --pre-pull-images=false
automatically detected the lowest Kubernetes version 1.30.1
discovered controlplane nodes ["10.40.0.200" "10.40.0.201" "10.40.0.202"]
discovered worker nodes ["10.40.0.203"]
updating "kube-apiserver" to version "1.30.1"
> "10.40.0.200": starting update
> "10.40.0.201": starting update
> "10.40.0.202": starting update
updating "kube-controller-manager" to version "1.30.1"
> "10.40.0.200": starting update
> "10.40.0.201": starting update
> "10.40.0.202": starting update
updating "kube-scheduler" to version "1.30.1"
> "10.40.0.200": starting update
> "10.40.0.201": starting update
> "10.40.0.202": starting update
updating kube-proxy to version "1.30.1"
> "10.40.0.200": starting update
> skipped in dry-run
> "10.40.0.201": starting update
> skipped in dry-run
> "10.40.0.202": starting update
> skipped in dry-run
updating kubelet to version "1.30.1"
> "10.40.0.200": starting update
> "10.40.0.201": starting update
> "10.40.0.202": starting update
> "10.40.0.203": starting update
failed upgrading kubelet: error updating node "10.40.0.203": error watching service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.40.0.203:50000: connect: connection refused"
kube-flannel fails because nothing is listening on the KubePrism address 127.0.0.1:7445:

E0809 12:48:40.398492 1 main.go:227] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-nrqph': Get "https://127.0.0.1:7445/api/v1/namespaces/kube-system/pods/kube-flannel-nrqph": dial tcp 127.0.0.1:7445: connect: connection refused

After applying the machine config, you need to update the Kubernetes manifests:

$ talosctl -n 10.40.0.200 get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
$ kubectl diff -f manifests.yaml
diff -u -N /tmp/LIVE-1090927806/v1.ConfigMap.kube-system.kubeconfig-in-cluster /tmp/MERGED-2252738197/v1.ConfigMap.kube-system.kubeconfig-in-cluster
--- /tmp/LIVE-1090927806/v1.ConfigMap.kube-system.kubeconfig-in-cluster 2024-08-09 17:43:02.905627200 +0500
+++ /tmp/MERGED-2252738197/v1.ConfigMap.kube-system.kubeconfig-in-cluster 2024-08-09 17:43:02.906627204 +0500
@@ -5,7 +5,7 @@
clusters:
- name: local
cluster:
- server: https://127.0.0.1:7445
+ server: https://1.2.3.4:6443
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
users:
- name: service-account
$ kubectl apply --server-side -f manifests.yaml

Unfortunately, the kube-flannel DaemonSet still had to be edited by hand:

$ kubectl -n kube-system edit ds kube-flannel

After disabling KubePrism and correcting the Kubernetes API server IP and port, the non-Talos node started working successfully.

$ kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
test1 Ready control-plane 24d v1.30.1 10.40.0.200 <none> Talos (v1.7.4) 6.6.32-talos containerd://1.7.16
test2 Ready control-plane 24d v1.30.1 10.40.0.201 <none> Talos (v1.7.4) 6.6.32-talos containerd://1.7.16
test3 Ready control-plane 24d v1.30.1 10.40.0.202 <none> Talos (v1.7.4) 6.6.32-talos containerd://1.7.16
test4 Ready <none> 119m v1.30.3 10.40.0.203 <none> Rocky Linux 9.4 (Blue Onyx) 5.14.0-284.25.1.el9_2.x86_64 containerd://1.7.19

UPD: You can easily deploy HAProxy on non-Talos nodes, e.g.:

$ cat /etc/haproxy/conf.d/kubeprism.conf
frontend kubeprism
    mode tcp
    bind localhost:7445
    default_backend k8s_api

backend k8s_api
    mode tcp
    server lb 1.2.3.4:6443
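As a follow-up to the HAProxy idea, a minimal sketch of wiring it in on the non-Talos node; this assumes HAProxy is installed from distro packages and that your haproxy service is configured to read /etc/haproxy/conf.d/ (that include is not enabled by default everywhere):

# validate the combined configuration, then start HAProxy
haproxy -c -f /etc/haproxy/haproxy.cfg -f /etc/haproxy/conf.d/kubeprism.conf
systemctl enable --now haproxy
# pods and the kubelet on this node can now keep using the KubePrism-style
# address https://127.0.0.1:7445; any HTTP response here proves the forward works
curl -k https://127.0.0.1:7445/version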
I was following the steps here to add some k0s nodes to my Talos cluster, but then I ran into an issue: k0s by default wants to read the bootstrap config from a specially-named ConfigMap, and in Talos 1.8.1 the API server, by default, starts with settings that do not let the joining k0s worker read that ConfigMap.
A simple solution is to create a ClusterRole that allows reading ConfigMaps (or, even better, a RoleBinding limited to this specific ConfigMap), and a ClusterRoleBinding to bind all non-Talos nodes to it; see the sketch below.
After that, I create the ConfigMap in the Talos cluster with the content k0s expects. It happily joins the cluster and shows up in kubectl get nodes.
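For illustration, a minimal sketch of that RBAC. The object names and the system:bootstrappers group are my assumptions (bind whatever identity your k0s workers actually authenticate as), and the ConfigMap k0s looks for is left as a placeholder:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k0s-bootstrap-configmap-reader   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]       # narrow with a namespaced Role/resourceNames if you prefer
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k0s-bootstrap-configmap-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k0s-bootstrap-configmap-reader
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:bootstrappers            # assumption: the group the joining k0s workers authenticate as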
Thanks for your write-up, it was very helpful. I created an Ansible collection to connect my non-Talos nodes (Raspberry Pi CM3+ nodes) to my Turing Pi 2 - RK1 (community supported) nodes. But I'll soon abandon my use of Talos, unfortunately, as I need hardware acceleration support (using the RockChip rknn and rkmpp), and I don't have the spare time to figure out how to add kernel modules and get them signed. I couldn't even get the DRBD module extension working. However, I'd like to share my work in case someone finds it useful: https://gitlab.com/agravgaard/ansible-collection-k8s/-/tree/Talos?ref_type=tags Best of luck :)
Adding a Non-Talos Node to a Cluster
This is my personal walkthrough steps for adding a new non-Talos node to a Talos cluster. YMMV.
Description
I love the Talos setup process, but unfortunately I needed a node that could run an alternate runtime (sysbox), so I added a new Ubuntu 20.04 LTS node to the cluster created by Talos.
Since the cluster was not created with kubeadm, I could not use the standard kubeadm join command. As such, these were the steps taken to set up a new node on the cluster:
Installation Steps
Install Requirements
First, install kubeadm, kubectl, and kubelet following the Kubernetes documentation for your distribution.
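For reference, on Ubuntu this usually boils down to the following once the upstream Kubernetes apt repository is configured (repository setup is omitted here since the recommended repo has changed over time; follow the current upstream instructions):

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl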
Install CRI-O
I wanted to use CRI-O, but getting the right version of CRI-O is pretty complicated at the moment; I've found that all the guides are slightly outdated.
However, it is regularly packaged by folks over at openSUSE, so go to the link below to get the appropriate version for your Kubernetes cluster. Note that you can edit the version in the URL to go directly to the appropriate package.
https://build.opensuse.org/package/show/devel:kubic:libcontainers:stable:cri-o:1.21/cri-o
Click the 'Download Package' button and then 'Add repository and install manually' to get an accurate, up-to-date list of commands to add the correct repository for the OS and OS version you are using.
You can then run the following to check that the correct package is being referenced.
sudo apt search cri-o
Take a snapshot of the machine and then install the runtime.
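Once the repository from that page is configured, the install itself is typically just the following (package names per the kubic/OBS packaging of CRI-O; double-check against what the page lists for your version):

sudo apt update
sudo apt install -y cri-o cri-o-runc
sudo systemctl enable --now crio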
Get Bootstrap Token
Run the following:
kubeadm token create --print-join-command
Copy the token string. This will be used to update the bootstrap file on the new node in a later step. Note that these are only valid for 24 hours.
Run Admin Join Command
Observe that the kubeadm join command fails:
kubeadm join 10.1.7.1:6443 --token cyNNNd.5ig...zxmq4 --discovery-token-ca-cert-hash sha256:ae6..ea65
This will likely fail; Talos OS support indicates that you should manually configure the kubelet instead. Proceed to the next steps.
We will be following this document: https://medium.com/@toddrosner/kubernetes-tls-bootstrapping-cf203776abc7
Retrieve Talos Kubernetes configuration
You can copy out the running Kubernetes configuration directory using:
talosctl -n 10.1.7.5 copy /etc/kubernetes - > talos_kubernetes.tar.gz
This will save an archive of the directory from the .5 worker to the current working directory.
We want three files / folders from this archive:
Place the files on the new node
Copy these three files to the new node at /etc/kubernetes (keeping ca.crt under the pki/ subdirectory):
bootstrap-kubeconfig
kubelet.yaml
/pki/ca.crt
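A rough sketch of getting these into place over SSH (the extraction layout and the node address are assumptions; adjust the source paths to wherever the files actually land after unpacking):

mkdir extracted && tar xzf talos_kubernetes.tar.gz -C extracted
ssh root@<new-node> "mkdir -p /etc/kubernetes/pki"
scp extracted/bootstrap-kubeconfig root@<new-node>:/etc/kubernetes/bootstrap-kubeconfig
scp extracted/kubelet.yaml root@<new-node>:/etc/kubernetes/kubelet.yaml
scp extracted/pki/ca.crt root@<new-node>:/etc/kubernetes/pki/ca.crt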
Instruct the kubelet to bootstrap TLS
You should add a new line to kubelet.yaml:
serverTLSBootstrap: true
This is in line with: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#renew-certificates-with-the-kubernetes-certificates-api
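In context, a minimal sketch of where that goes in kubelet.yaml (only serverTLSBootstrap is new; the other fields stay whatever Talos generated):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# ... existing Talos-generated settings unchanged ...
serverTLSBootstrap: true

Note that with serverTLSBootstrap enabled, the kubelet's serving-certificate CSRs are not auto-approved; approve them with kubectl get csr and kubectl certificate approve, as described in the linked page.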
Update the Bootstrap Token
Update the bootstrap-kubeconfig file on the new node with the token that was just generated in the steps above.
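For orientation, the part of bootstrap-kubeconfig you are changing looks roughly like this (the user name and surrounding structure are illustrative, based on the usual bootstrap kubeconfig shape; only the token value is what you replace):

users:
- name: kubelet-bootstrap
  user:
    token: cyNNNd.5ig...zxmq4   # the token printed by 'kubeadm token create --print-join-command'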
Update the Service Configuration
Since the service runs using systemd, we need to adjust the settings used by the service.
This was the thing that worked for me (note: I am using the sysbox runtime).
Installing kubeadm and kubectl adds a drop-in configuration file which we can adjust so that it aligns with the standard Talos file locations.
Edit the --bootstrap-kubeconfig, --kubeconfig, and --config flag paths to the settings below. Also add the following flags to KUBELET_KUBECONFIG_ARGS to enable CRI-O:
Once complete, it should look like this:
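The original drop-in contents did not survive in this issue, so the following is a reconstruction rather than the author's exact file: a sketch of the kubeadm-installed drop-in (typically /etc/systemd/system/kubelet.service.d/10-kubeadm.conf) with the paths pointed at the Talos file locations and the CRI-O flags added (flag names as they existed for kubelet 1.21-era releases; --container-runtime has since been removed):

[Service]
# point the kubeadm defaults at the files copied from Talos, and at the CRI-O socket
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubeconfig --kubeconfig=/etc/kubernetes/kubelet.conf --container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
Environment="KUBELET_CONFIG_ARGS=--config=/etc/kubernetes/kubelet.yaml"
# the remaining EnvironmentFile= and ExecStart= lines from the packaged drop-in stay as installed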
Reload Services
Reload the systemd services to apply the changes:
systemctl daemon-reload
Then confirm that the changes were applied:
Get Service Definition from Systemd
You can confirm the service configuration settings with the following:
systemctl cat kubelet
Inside the cat results, you will look for the referenced file following the --bootstrap-kubeconfig flag.
Check Systemd Drop-Ins
Finally, check that the drop-in systemd extensions are being applied. You can confirm this using:
systemd-delta --type=extended
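The exact output varies by distro, but you would expect the kubelet drop-in to show up as an extension, along the lines of:

[EXTENDED]   /lib/systemd/system/kubelet.service → /etc/systemd/system/kubelet.service.d/10-kubeadm.conf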
Setup Sysbox Runtime
Now that we have the new node joined to the cluster, we can follow through with the installation of the Sysbox OCI runtime environment, to facilitate container virtual machines.
Following these instructions: https://github.com/nestybox/sysbox/blob/master/docs/user-guide/install-k8s.md
System Requirements for storage, etc.
Make sure your system has all the required dependencies for your applications. In my case, I needed to install the nfs client.
Install NFS Common Client Software
Ensure that the node is able to run the NFS client:
apt install nfs-common