
Restoring snapshot without pki bundle or rkestate file #1336

Closed
galal-hussein opened this issue May 8, 2019 · 17 comments

Comments

@galal-hussein
Contributor

galal-hussein commented May 8, 2019

For an etcd restore to work correctly, the certificates originally generated for the cluster must be present. In rke 0.1 we saved the certs as pki.bundle.tar.gz, and in 0.2 we moved to the cluster.rkestate file; we expect the state file to be present during the restore. However, we also save a copy of the state as a configmap named full-cluster-state, which we could fetch directly from the etcd snapshot. The steps to implement this might include:

1. Restore the etcd snapshot on any node from the cluster.yml
2. Fetch the state using an etcd client
3. Write the state locally on disk
4. Continue with the restoration
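The steps above can be sketched roughly as follows. This is an illustrative outline only, not RKE's actual implementation: the snapshot path, the kube-system namespace, and the use of an external decoder are assumptions, and it requires a throwaway etcd to run against.

```shell
# 1. Restore the snapshot into a scratch data dir (etcdctl v3 API):
ETCDCTL_API=3 etcdctl snapshot restore /opt/rke/etcd-snapshots/snp1 \
  --data-dir /tmp/etcd-restore

# 2. Start a temporary etcd on that data dir, then fetch the saved state.
#    Kubernetes stores configmaps under /registry/configmaps/<namespace>/<name>;
#    the stored value is protobuf-framed, so it needs decoding (e.g. with a
#    tool such as auger) before the embedded state is usable:
etcd --data-dir /tmp/etcd-restore &
ETCDCTL_API=3 etcdctl get /registry/configmaps/kube-system/full-cluster-state \
  --print-value-only > /tmp/full-cluster-state.raw

# 3-4. After decoding, write the state next to cluster.yml and continue:
#   cp /tmp/cluster.rkestate ./cluster.rkestate && rke etcd snapshot-restore ...
```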

gz#6680

@maggieliu

Discussed Solution:

  • Bundle the RKE state file and cluster.yml file into the snapshot going forward
  • When restoring:
              - The user can pass in an option for the RKE state file location, which will override the one bundled in the snapshot.
              - Check whether the certs are valid; if they have expired, prompt for cert rotation
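The expiry check in the discussed solution can be sketched with openssl. The file names here are placeholders standing in for certs pulled from the snapshot bundle; this is not RKE's actual code path.

```shell
# Generate a throwaway cert (placeholder for a cert from the snapshot bundle).
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=kube-etcd" \
  -keyout /tmp/kube-etcd-key.pem -out /tmp/kube-etcd.pem -days 1 2>/dev/null

# "openssl x509 -checkend N" exits non-zero if the cert expires within N
# seconds; a restore tool could use this to decide whether to prompt for
# cert rotation before continuing.
if openssl x509 -checkend 0 -noout -in /tmp/kube-etcd.pem >/dev/null; then
  echo "certs valid, continue restore"
else
  echo "certs expired, prompt for rotation"
fi
# prints "certs valid, continue restore"
```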

@maggieliu maggieliu modified the milestones: v1.1.x - Rancher v2.4.x, v1.1 - Rancher v2.4.2 Mar 9, 2020
@superseb
Contributor

The issue here is that the way files are deployed in RKE (by providing the file contents in an environment variable to the container and echo-ing that variable to a file) hits the size limit on a single environment variable in the case of 3 nodes with custom certs signed using cfssl. This can also be reproduced simply by having a lot of nodes. The logic needs to be rewritten to launch a container and copy the state file into it.
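The limit described above can be demonstrated directly. On typical x86 Linux kernels a single argv or environment string is capped at MAX_ARG_STRLEN (32 pages, i.e. 128 KiB), so a multi-node custom-cert bundle passed as one variable can fail at exec time. A minimal, Linux-specific demonstration (the variable name CERTS is just an illustration):

```shell
# Build a ~2 MB value, well above Linux's 128 KiB per-string cap
# (and above typical total ARG_MAX limits on other platforms too).
big=$(head -c 2000000 /dev/zero | tr '\0' 'x')

# Passing it through exec fails with E2BIG ("Argument list too long");
# this is analogous to what RKE hits when echo-ing large cert bundles
# from an environment variable into a file.
if env CERTS="$big" true 2>/dev/null; then
  echo "exec succeeded"
else
  echo "exec failed: env string too large"
fi
# prints "exec failed: env string too large"
```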

@soumyalj

Re-opened, as executables are not available for the RCs v1.2.0-rc7 and v1.1.5-rc6.

@soumyalj
Copy link

Tested using v1.1.5-rc5.

  • Created a 3 node cluster using custom certs with the steps below:
rke cert generate-csr --cert-dir <custom-dir-path>

rke up --custom-certs --cert-dir <custom-dir-path>

rke etcd snapshot-save --name snp1 --config cluster.yml

The rke state file is not present in the snapshot zip folder:

root@soumyacheck1:/opt/rke/etcd-snapshots# ls
backup  snp1.zip
root@soumyacheck1:/opt/rke/etcd-snapshots# cd backup/
root@soumyacheck1:/opt/rke/etcd-snapshots/backup# ls
snp1
root@soumyacheck1:/opt/rke/etcd-snapshots/backup# 

Tested with v1.2.0-rc7

  • Created a 3 node cluster using custom certs with the same steps as above using v1.2.0-rc7; the rke state file is present in the snapshot folder.
root@ip-172-31-10-199:/opt/rke/etcd-snapshots# ls
backup  etc  snp1.zip
root@ip-172-31-10-199:/opt/rke/etcd-snapshots# cd etc/
root@ip-172-31-10-199:/opt/rke/etcd-snapshots/etc# ls
kubernetes
root@ip-172-31-10-199:/opt/rke/etcd-snapshots/etc# cd kubernetes/
root@ip-172-31-10-199:/opt/rke/etcd-snapshots/etc/kubernetes# ls
snp1.rkestate
root@ip-172-31-10-199:/opt/rke/etcd-snapshots/etc/kubernetes# 

Performed a restore using custom certs with the below command:

./rke_darwin-amd64-v120rc7 etcd snapshot-restore --name snp2 --custom-certs cert-dir /Users/soumya/rke/120certs --config cluster.yml 

Restore fails with the below error, though the pem key kube-etcd-abcd.pem is present in the cert directory:

INFO[0044] Initiating Kubernetes cluster                
FATA[0044] Failed to validates certificates from dir [./cluster_certs]: Failed to find etcd [kube-etcd-abcd] Certificate or Key

Restore succeeds when the --custom-certs and --cert-dir flags are not specified.

@sangeethah

Reopening the issue for the following reasons:

  1. In v1.1.5-rc5, we don't see the fix for the rke state file not being present in the snapshot zip folder when testing with a 3 node etcd cluster created with custom certs.

  2. In v1.2.0-rc7, we were able to validate that the rke state file is present in the snapshot zip folder when testing with a 3 node etcd cluster created with custom certs, but we were not able to restore using custom certs.

@sangeethah

#2191

@soumyalj

soumyalj commented Aug 19, 2020

Tested using v1.1.5-rc7.

  • Create a 3 node cluster using custom certs with the steps below:
rke cert generate-csr --cert-dir <custom-dir-path>

rke up --custom-certs --cert-dir <custom-dir-path>

rke etcd snapshot-save --name snp1 --config cluster.yml
  • Verified rke state file is present in the snapshot zip folder
root@ip-172-31-10-250:/opt/rke/etcd-snapshots# unzip snp1.zip 
Archive:  snp1.zip
warning:  stripped absolute path spec from /backup/snp1
  inflating: backup/snp1             
warning:  stripped absolute path spec from /etc/kubernetes/snp1.rkestate
  inflating: etc/kubernetes/snp1.rkestate  
  • Perform a restore specifying the custom certs as below:
./rke_darwin-amd64-v115rc7 etcd snapshot-restore --custom-certs --cert-dir /Users/soumya/rke/newfincerts --name snp1 --config clusterbkp.yml

Restore is successful. rkestate file is extracted from the snapshot. Logs below:

INFO[0002] Successfully started [etcd-extract-statefile] container on host [abcd] 
INFO[0002] Waiting for [etcd-extract-statefile] container to exit on host [efgh] 
INFO[0002] Waiting for [etcd-extract-statefile] container to exit on host [hijk] 
INFO[0002] Removing container [etcd-extract-statefile] on host [abcd], try #1 
INFO[0002] [remove/etcd-extract-statefile] Successfully removed container on host [abcd] 
INFO[0002] State file is successfully extracted from snapshot [snp1] 
INFO[0002] Restoring etcd snapshot snp1                 

  • Restore is also successful without specifying the custom certs
./rke_darwin-amd64-v115rc7 etcd snapshot-restore --name snp1 --config clusterbkp.yml

@soumyalj

Re-tested with v1.2.0-rc7

Created a 3 node cluster using custom certs with the steps below

rke cert generate-csr --cert-dir <custom-dir-path> 

rke up --custom-certs --cert-dir <custom-dir-path>

rke etcd snapshot-save --name snp1 --config cluster.yml

Verified rke state file is present in the snapshot zip folder
Perform a restore specifying the custom certs as below:

./rke_darwin-amd64-v120rc7 etcd snapshot-restore --name snp1 --custom-certs --cert-dir /Users/soumya/rke/certs --config clusterfin.yml

Restore is successful. The state file is extracted from the zip folder:

INFO[0001] Starting container [etcd-extract-statefile] on host [abcd], try #1 
INFO[0002] Successfully started [etcd-extract-statefile] container on host [abcd] 
INFO[0002] Waiting for [etcd-extract-statefile] container to exit on host [abcd] 
INFO[0002] Waiting for [etcd-extract-statefile] container to exit on host [abcd] 
INFO[0002] Removing container [etcd-extract-statefile] on host [abcd], try #1 
INFO[0002] [remove/etcd-extract-statefile] Successfully removed container on host [abcd] 
INFO[0002] State file is successfully extracted from snapshot [snp1] 
INFO[0002] Restoring etcd snapshot snp1      

Restore without specifying custom certs is also functional

./rke_darwin-amd64-v120rc7 etcd snapshot-restore --name snp1 --config clusterfin.yml

7 participants