Adding a k3s server node from a previous cluster causes 'x509: certificate signed by unknown authority' #2034

Closed
dkeightley opened this issue Jul 16, 2020 · 8 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@dkeightley commented Jul 16, 2020

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least number of steps possible):

  1. Create a single-server cluster (a) backed by an external datastore
  2. Create a single-server cluster (b) with only the default local SQLite datastore
  3. Stop k3s on node (b), leaving k3s installed as-is
  4. Install k3s again on node (b), this time using the external datastore of (a), i.e. forming a two-server cluster sharing the same external datastore (see the command sketch after this list)
  5. The logs below can be observed
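
For reference, a minimal command sketch of steps 1-4, assuming the standard install script, a MySQL external datastore, and a systemd-based host; the datastore URL and token below are placeholders:

# node (a): single server backed by an external datastore
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="mysql://user:pass@tcp(192.168.1.10:3306)/k3s"

# node (b): single server with the default local SQLite datastore
curl -sfL https://get.k3s.io | sh -

# node (b): stop k3s but leave the installation (and its certs/data dir) in place
sudo systemctl stop k3s

# node (b): reinstall against (a)'s external datastore to form a two-server cluster
curl -sfL https://get.k3s.io | K3S_TOKEN=<token-from-node-a> sh -s - server \
  --datastore-endpoint="mysql://user:pass@tcp(192.168.1.10:3306)/k3s"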

Result:

node (a):

Jul 16 04:11:04 osboxes k3s[16905]: time="2020-07-16T04:11:04.726999721Z" level=info msg="Tunnel endpoint watch event: [192.168.1.121:6443 192.168.1.176:6443]"
Jul 16 04:11:04 osboxes k3s[16905]: time="2020-07-16T04:11:04.728139835Z" level=info msg="Tunnel endpoint watch event: [192.168.1.176:6443]"
Jul 16 04:11:04 osboxes k3s[16905]: time="2020-07-16T04:11:04.728303781Z" level=info msg="Stopped tunnel to 192.168.1.121:6443"
Jul 16 04:11:04 osboxes k3s[16905]: time="2020-07-16T04:11:04.728463728Z" level=info msg="Connecting to proxy" url="wss://192.168.1.121:6443/v1-k3s/connect"
Jul 16 04:11:04 osboxes k3s[16905]: time="2020-07-16T04:11:04.777540692Z" level=error msg="Failed to connect to proxy" error="x509: certificate signed by unknown authority"
Jul 16 04:11:04 osboxes k3s[16905]: time="2020-07-16T04:11:04.777884602Z" level=error msg="Remotedialer proxy error" error="x509: certificate signed by unknown authority"
Jul 16 04:11:05 osboxes k3s[16905]: time="2020-07-16T04:11:05.490119463Z" level=info msg="Tunnel endpoint watch event: [192.168.1.121:6443 192.168.1.176:6443]"
Jul 16 04:11:05 osboxes k3s[16905]: time="2020-07-16T04:11:05.490549928Z" level=info msg="Connecting to proxy" url="wss://192.168.1.121:6443/v1-k3s/connect"
Jul 16 04:11:05 osboxes k3s[16905]: time="2020-07-16T04:11:05.496783853Z" level=error msg="Failed to connect to proxy" error="x509: certificate signed by unknown authority"
Jul 16 04:11:05 osboxes k3s[16905]: time="2020-07-16T04:11:05.497033531Z" level=error msg="Remotedialer proxy error" error="x509: certificate signed by unknown authority"
Jul 16 04:11:06 osboxes k3s[16905]: time="2020-07-16T04:11:06.902346885Z" level=info msg="Active TLS secret k3s-serving (ver=1772) (count 8): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-192.168.1.121:192.168.1.121 listener.cattle.io/cn-192.168.1.176:192.168.1.176 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/hash:afe1d070f4360758c4709136f4c1002a31990ae67bc8258fcf852b0b10c779f9]"
Jul 16 04:11:07 osboxes k3s[16905]: time="2020-07-16T04:11:07.470714773Z" level=error msg="failed to authenticate request: x509: certificate signed by unknown authority"

node (b):

Jul 16 04:13:02 minibox k3s[23949]: E0716 04:13:02.776513   23949 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Token has been invalidated]
Jul 16 04:13:02 minibox k3s[23949]: E0716 04:13:02.776665   23949 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Token has been invalidated]
Jul 16 04:13:02 minibox k3s[23949]: time="2020-07-16T04:13:02.835369211Z" level=info msg="Connecting to proxy" url="wss://192.168.1.176:6443/v1-k3s/connect"
Jul 16 04:13:02 minibox k3s[23949]: time="2020-07-16T04:13:02.850108791Z" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"
Jul 16 04:13:02 minibox k3s[23949]: time="2020-07-16T04:13:02.850153505Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
Jul 16 04:13:02 minibox k3s[23949]: E0716 04:13:02.913762   23949 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Token has been invalidated]
  6. Node (b) will fail to join, although the k3s-serving secret will be updated and signed by the CA on node (b).

Other details that may be helpful:

  7. To recover the cluster, the following steps were used (may need revising).

7a. On each node, delete the node object and the k3s-serving secret, then uninstall k3s

kubectl --insecure-skip-tls-verify=true delete node $(hostname -s)
kubectl --insecure-skip-tls-verify=true -n kube-system delete secret k3s-serving
/usr/local/bin/k3s-uninstall.sh

7b. Reinstall k3s on at least two nodes (for me the issue didn't recover until I added two). Deleting the k3s-serving secret and restarting k3s may be needed.
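
A minimal sketch of the secret deletion and restart mentioned in 7b, assuming a systemd-managed server:

kubectl -n kube-system delete secret k3s-serving
sudo systemctl restart k3s
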
7c. To recover from the invalidated tokens, I had to clear all service account (SA) token secrets in all namespaces and delete all pods (note: many pods were stuck in Terminating, so I used forceful commands):

kubectl get secret -A | awk '{ if ($3 == "kubernetes.io/service-account-token") system("kubectl -n " $1 " delete secret " $2) }'
kubectl delete pods -A --all --force --grace-period=0 
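
To verify that 7c took effect, read-only checks like the following can confirm that the token controller re-issued the SA token secrets and that pods were recreated:

kubectl get secret -A | grep kubernetes.io/service-account-token
kubectl get pods -A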

Cluster information

  • Kubernetes version (use kubectl version): v1.18.4+k3s1 (97b7a0e)

gz#11262

@brandond (Member)

I believe this should probably be moved to k3s, but I will let @cjellick decide.

@cjellick cjellick transferred this issue from rancher/rancher Jul 16, 2020
@brandond (Member)

This is probably an odd corner case - k3s nodes don't expect to be hot-swapped into different clusters without the previous installation's state being cleaned out first. However, a node joining the cluster should fail to do so if its local certs don't match those on the other nodes.
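
Consistent with that, cleaning out node (b) before pointing it at the shared datastore avoids the stale-cert state; a rough sketch, assuming the bundled uninstall script from the standard install (datastore URL and token are placeholders as above):

sudo /usr/local/bin/k3s-uninstall.sh   # removes the k3s service and its data under /var/lib/rancher/k3s, including the previous cluster's certs
curl -sfL https://get.k3s.io | K3S_TOKEN=<token-from-node-a> sh -s - server \
  --datastore-endpoint="mysql://user:pass@tcp(192.168.1.10:3306)/k3s"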

@brandond brandond self-assigned this Jul 18, 2020
@davidnuzik davidnuzik added this to the v1.20 - Backlog milestone Sep 15, 2020
@brandond brandond added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 23, 2020
@dverbeek84

I have the exact same issue, only I want to recover the whole cluster after cluster failures.

@fapatel1

Tracking this in #3040.

@briandowns (Contributor)

PR #3398 should take care of this issue, as it introduces behavior that updates the certs on disk if they don't match and are older than the certificates in the datastore.
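
Until that lands, a rough way to check for the mismatch described here (illustrative only, not the PR's logic) is to compare the CA cert on disk with the CA bundle served by an existing server, assuming default k3s paths (192.168.1.176 is the existing server in the logs above):

sudo openssl x509 -noout -subject -enddate -fingerprint -in /var/lib/rancher/k3s/server/tls/server-ca.crt
curl -sk https://192.168.1.176:6443/cacerts | openssl x509 -noout -subject -enddate -fingerprint

If the fingerprints differ, the joining node is still using certs from its previous standalone cluster.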

@dkeightley (Author)

Is it possible to confirm which release the fix is included in? Thanks!

@zhoub commented Jul 16, 2022

Bump; would like to know which version has shipped the fix.

@brandond (Member)

This issue was closed like a year ago. Every currently supported version has the fix.
