-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added retry for snapshot and s3 upload, verify etcd running on host #2952
Added retry for snapshot and s3 upload, verify etcd running on host #2952
Conversation
0e2f14b
to
f0b0b31
Compare
Super nit: In the desc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments to be addressed
Added option to quiet noisy container logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but will approve after the CI failure is fixed.
@annablender @jiaqiluo we may have to figure something out here, because |
286ec62
to
1b67efc
Compare
LGTM. Please squash commits before merging |
rancher/rancher#37639
rke-tools
), so rke has been updated to reflect this. Additionally, rke will now fall back to using the snapshot on the node. This may change if rancher is to separate local and s3 backups for rke1 like rke2, but will have considerations regarding retention (although more than likely we will track retention separately for local/s3 snapshots much like how we are for failed snapshots (Needs Discussion)). There are already considerations regarding the desync between the rancher etcdbackup objects, and the actual backups in both s3 and on the nodes, and a solutation to backpopulate these objects similar to rke2 may actually be desired here in order to minimize the risk of a user restoring an impossible snapshot.IsContainerRunning
toDoesContainerExist
which is what was previously being checked; addedIsContainerRunning
for when you really want to make sure a container is running (likeetcd
before taking a snapshot). There are most likely several places whereIsContainerRunning
or an equivalent check on a specific container would be useful that checking if it exists (I expect there would likely be some breakages today if some containers were manually stopped), but I considered this out of the scope of this PR because there are already significant changes and identifying which containers would benefit is likely more trouble than it's worth.Depends on rancher/rke-tools#153 and subsequent KDM update.