[epic] Bootstrap data synchronization #3040
Comments
Fixed in etcd so far but not sqlite. Bump to next milestone as this is not critical.
As far as I can see, the issue is still not fixed in etcd:
restart k3s server with --cluster-init
Yeah, as noted in our call earlier, the cluster-reset-restore path will overwrite the certs on disk with the data from the bootstrap data, but the logic about what we sync and when needs to be completely rethought.
This depends on #3015; @briandowns to link the other issue that Hussein is working on.
The PR attached to #3015 should cover this epic; let's move it to "To test" once that has landed.
Validated on master-branch commit
Environmental Info:
K3s Version:
All versions
Node(s) CPU architecture, OS, and Version:
N/A
Cluster Configuration:
N/A
Describe the bug:
The cluster bootstrap data (CA certs, etc.) is only written to the datastore once, by the first node, after the initial startup generates the keying material. If any of the certificates expire and are renewed, or are otherwise altered by the end-user, the bootstrap data in the datastore will be stale. This will cause problems when the bootstrap data is used by new nodes joining the cluster, or when the cluster datastore is restored from backup.
Additionally, the bootstrap data is NEVER written to the datastore when using managed etcd. This means that the complete cluster state cannot be restored from an etcd snapshot.
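To make the staleness problem concrete, here is a minimal sketch of how a node could detect that the datastore copy of the bootstrap data no longer matches what is on disk. It is not how K3s implements this; the function names and the file names in the example are hypothetical, and both copies are modeled as plain dictionaries of file name to contents.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return a hex SHA-256 digest used to compare two copies of a file."""
    return hashlib.sha256(data).hexdigest()


def stale_entries(on_disk: Dict[str, bytes], in_datastore: Dict[str, bytes]) -> List[str]:
    """List file names whose datastore copy is missing or differs from disk.

    Both arguments map a (hypothetical) bootstrap file name to its raw contents.
    """
    stale = []
    for name, local in sorted(on_disk.items()):
        stored = in_datastore.get(name)
        if stored is None or digest(stored) != digest(local):
            stale.append(name)
    return stale


# Example: one cert was renewed on disk, but the datastore still holds the original.
disk = {"server-ca.crt": b"renewed", "client-ca.crt": b"same"}
store = {"server-ca.crt": b"original", "client-ca.crt": b"same"}
print(stale_entries(disk, store))  # ['server-ca.crt']
```

A check like this only reports a mismatch. Deciding which side should win (disk or datastore) is the part of the sync logic this issue asks to be defined.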
Steps To Reproduce:
Cert expiry with external DB
Current behavior: Node B comes up with the original, expired certs from the datastore that need to be renewed locally.
Desired behavior: Node B comes up with the renewed certificates as updated by Node A.
Cert expiry with user-provided certs
Current behavior: Certificates are renewed to extend expiry, but are now self-signed. This may break things in other interesting ways?
Desired behavior: K3s fails to start with error indicating that user-provided certificates cannot be renewed.
Cert restoration with external DB
Current behavior: Certificates are restored from the database and the cluster starts up normally
Desired behavior: Certificates are restored from the database and the cluster starts up normally
Cert replacement with external DB
Current behavior: CA certificates and other keying material from the original database are used instead of the certificates from the new cluster's database, breaking things in strange and interesting ways
Desired behavior: CA certificates from new cluster's database are written to disk, and any other downstream keying materials (encryption configuration, ipsec keys, token signing certs, kubeconfig client certs, etc) are regenerated as well.
Cert restoration with managed etcd
Current behavior: New certificates are generated and the cluster fails to start properly (cert errors from kubectl, pods crash, etc)
Desired behavior: Certificates are restored from the datastore via HTTP bootstrap from another node in the cluster - if possible. May need a --cluster-reset --cluster-reset-restore in order to properly extract the correct certs from the etcd datastore.
Kubeconfig restoration with any datastore
/var/lib/rancher/k3s/server/cred/*.kubeconfig
Current behavior: Components fail to start due to missing kubeconfigs, which are only generated if the certs+keys for the relevant kubeconfig are missing.
Desired behavior: Kubeconfigs are regenerated
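The desired behavior amounts to keying the regeneration decision on the kubeconfig files themselves, not on whether the certs and keys are missing. A rough sketch of that check follows, assuming a caller-supplied list of expected file names; the function name and the names in the example are hypothetical and not taken from the K3s source.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def kubeconfigs_to_regenerate(cred_dir: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected kubeconfig file names that are absent from cred_dir.

    The check looks only at the kubeconfig files; it does not consider
    whether the related certs and keys are present.
    """
    return [name for name in expected if not (cred_dir / name).is_file()]


# Example: one kubeconfig survives, the other was deleted and should be rebuilt.
with tempfile.TemporaryDirectory() as tmp:
    cred = Path(tmp)
    (cred / "admin.kubeconfig").write_text("placeholder")
    print(kubeconfigs_to_regenerate(cred, ["admin.kubeconfig", "controller.kubeconfig"]))
    # ['controller.kubeconfig']
```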
Token encryption rotation with any datastore type
--token=oldpass
--token=newpass
/var/lib/rancher/k3s
Current behavior: Restoration (or connection to current database) fails as the bootstrap data does not match, and new CA certificates and keying material are generated.
Desired behavior: Bootstrap data is encrypted with new key when the token is changed, and subsequent restores or reconnections to the external datastore properly load the previous CA certs and keying material.
Additional context / logs:
Is preventing #2902 from actually working
Related to behavior described in #3015
Related to secrets encryption rotation needed for #3407