K3s does not ensure that certificates on disk match values from cluster bootstrap data #3015

Closed
brandond opened this issue Mar 3, 2021 · 4 comments
Assignees
Labels
kind/bug Something isn't working kind/internal priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

@brandond
Member

brandond commented Mar 3, 2021

Environmental Info:
K3s Version:
K3s v1.20 (affects all versions)

Node(s) CPU architecture, OS, and Version:
N/A

Cluster Configuration:
Embedded sqlite or external SQL database

Describe the bug:
K3s does not properly restore cluster certificates and other encryption configuration when starting up.

  • When using sqlite, bootstrap data is never extracted from the datastore. If files are missing from disk, new ones are generated during the startup process.
  • When using an external SQL datastore, bootstrap data is only extracted if the files do not exist on disk. If a server has files on disk that differ from those in the cluster bootstrap data, it DOES NOT overwrite the local files. Instead, it joins the cluster with incorrect keying material and starts writing data and signing certs that other nodes cannot decrypt or trust.
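
To make the external SQL case concrete, here is a minimal Python model of the "extract only if missing" behavior described above. It is not K3s code (K3s is written in Go); the function name `extract_if_missing`, the dict-of-bytes representation of bootstrap data, and the file names are all assumptions made for illustration.

```python
from pathlib import Path


def extract_if_missing(bootstrap: dict[str, bytes], data_dir: Path) -> list[str]:
    """Model of the reported behavior: write only files that are absent.

    `bootstrap` maps a relative path to the content stored in the datastore
    (a hypothetical representation). Returns the relative paths whose on-disk
    content differs from the datastore copy but was left untouched.
    """
    stale = []
    for rel_path, content in bootstrap.items():
        target = data_dir / rel_path
        if not target.exists():
            # Missing file: extracted from the datastore as expected.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
        elif target.read_bytes() != content:
            # The reported problem: the local file silently wins.
            stale.append(rel_path)
    return sorted(stale)
```

Any path returned by this model is a file the server would keep using even though the rest of the cluster holds a different copy.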

Steps To Reproduce:
sqlite:

  • Install K3s using sqlite
  • stop k3s; rm -rf /var/lib/rancher/k3s/server/tls;
  • start k3s; note errors accessing cluster with kubectl; pods crashing

external sql:

  • install K3s with datastore endpoint pointing at SQL datastore; bring up single-node cluster
  • stop K3s; drop table on SQL datastore and restore from a different cluster
  • start k3s; note errors accessing cluster with kubectl; pods crashing

Expected behavior:
K3s ensures that the on-disk certs and other encryption materials match the bootstrap data retrieved from the datastore when starting up, so that it always uses the correct data.
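
As a rough sketch of what such a startup check could look like, the Python below compares SHA-256 digests of on-disk files against the bootstrap data and labels each entry. The function name `classify_files` and the dict-of-bytes input are hypothetical; the actual K3s implementation is in Go and may work differently.

```python
import hashlib
from pathlib import Path


def classify_files(bootstrap: dict[str, bytes], data_dir: Path) -> dict[str, str]:
    """Label each bootstrap entry as 'match', 'missing', or 'mismatch'.

    `bootstrap` maps a relative path to the content held in the datastore
    (an assumed representation, for illustration only).
    """
    result = {}
    for rel_path, content in bootstrap.items():
        target = data_dir / rel_path
        if not target.is_file():
            result[rel_path] = "missing"
            continue
        want = hashlib.sha256(content).hexdigest()
        have = hashlib.sha256(target.read_bytes()).hexdigest()
        result[rel_path] = "match" if want == have else "mismatch"
    return result
```

Only a result where every entry is `match` would mean the server is using the same material as the rest of the cluster.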

Actual behavior:
K3s starts, but pods crash and clients experience errors, since the certs on disk that the apiserver is using do not match those used to sign service account secrets and other things embedded in the datastore.

Additional context / logs:
@briandowns recently fixed this for etcd restore, but we need to also handle it for other backends.

@brandond brandond changed the title from "K3s does" to "K3s does not ensure that certificates on disk match values from cluster bootstrap data" Mar 3, 2021
@brandond brandond added this to the 1.21 initial release milestone Mar 3, 2021
@brandond brandond added the kind/bug Something isn't working label Mar 3, 2021
@davidnuzik davidnuzik modified the milestones: v1.21.0+k3s1, v1.21.1+k3s1 Apr 14, 2021
@davidnuzik
Contributor

davidnuzik commented May 27, 2021

This will need to get fixed to support rancher/rke2-docs#59 which will likely be a documentation effort.

@davidnuzik
Contributor

Will need a pull-thru into RKE2

@davidnuzik
Contributor

Needed for #3226 in July timeframe.

@davidnuzik davidnuzik added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 23, 2021
@davidnuzik davidnuzik modified the milestones: v1.21.3+k3s1, v1.21.4+k3s1 Jul 7, 2021
@rancher-max
Contributor

Bootstrap functionality has been validated according to our test plan on the master branch at commit 8271d98a766b060463bc73ef66c5085b5797b4cc.

Validated all of the below on single node (sqlite) and multinode (external SQL), as well as a few cases with embedded etcd:

  • Remove certs dir
  • Remove a single file from certs dir
  • Modify a file in certs dir
  • Join all servers at the same time
  • Change datastore: one server is initialized against datastore1 and a second against datastore2; when the second is changed to use datastore1, it uses the certs from datastore1.

The TLDR design of this is: trust the certs from the DB. If certs on disk do not match, have the user remove the mismatched files so that k3s can pull the proper certs from the DB.
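
The design summarized above can be sketched as follows. This is a simplified Python model, not the K3s implementation: `reconcile`, `BootstrapMismatch`, and the dict-of-bytes input are hypothetical names chosen for illustration. Missing files are written from the datastore copy; files that differ cause startup to stop with a list of paths for the user to remove.

```python
from pathlib import Path


class BootstrapMismatch(Exception):
    """Raised when on-disk files differ from the datastore copy."""


def reconcile(bootstrap: dict[str, bytes], data_dir: Path) -> list[str]:
    """Trust the datastore: fill in missing files, refuse on mismatch.

    Returns the relative paths that were written from the datastore.
    """
    # Check for conflicts first so nothing is written on a failed start.
    mismatched = sorted(
        rel for rel, content in bootstrap.items()
        if (data_dir / rel).is_file() and (data_dir / rel).read_bytes() != content
    )
    if mismatched:
        raise BootstrapMismatch(
            "remove these files and restart: " + ", ".join(mismatched)
        )

    written = []
    for rel, content in bootstrap.items():
        target = data_dir / rel
        if not target.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
            written.append(rel)
    return sorted(written)
```

In this model the on-disk files are never overwritten automatically; once the user removes the files named in the error, the next call restores them from the datastore.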
