WIP: ETCD-295: Automated Backups of Etcd #1370
Conversation
@hasbro17: This pull request references ETCD-295 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
To enable automated backups, a new singleton CRD can be used to specify the backup schedule and other configuration, as well as to monitor the status of the backup operations.

A new controller in the cluster-etcd-operator would then reconcile this CRD to save backups locally per the specified configuration.
Super nitpick: I think we should make the existing UpgradeBackupController more general instead of writing an entirely new controller.
edit: that's also covered below - thanks!
// e.g. retain 500MB of backups, or discard backups older than a month
// If we want to extend this API in the future we should make retention
// a type so we can add more optional fields later
RetentionCount int `json:"retentionCount"`
if we already have good ideas on other types of retention, maybe add a struct for this? I know it's TP MVP, but let's not introduce something we have to completely change in v1alpha2
> if we already have good ideas on other types of retention, maybe add a struct for this? I know it's TP MVP, but let's not introduce something we have to completely change in v1alpha2
Agreed, you want to get close on your first attempt, and this looks ripe for a discriminated union.
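For illustration, a retention union could look roughly like the sketch below. This is only a sketch: the type and field names (`RetentionPolicy`, `RetentionNumber`, `RetentionSize`, and so on) are assumptions for discussion, not the final openshift/api shape.

```go
package v1alpha1

// RetentionType is the discriminator that selects how backups are retained.
// +kubebuilder:validation:Enum:=RetentionNumber;RetentionSize
type RetentionType string

const (
	// RetentionTypeNumber keeps at most a fixed number of backups.
	RetentionTypeNumber RetentionType = "RetentionNumber"
	// RetentionTypeSize keeps backups up to a maximum total size on disk.
	RetentionTypeSize RetentionType = "RetentionSize"
)

// RetentionPolicy is a discriminated union: exactly one member should be
// set, selected by RetentionType. New retention strategies (e.g. by age)
// can be added later as additional optional members.
// +union
type RetentionPolicy struct {
	// RetentionType selects which retention member applies.
	// +unionDiscriminator
	// +kubebuilder:validation:Required
	RetentionType RetentionType `json:"retentionType"`

	// RetentionNumber configures retention by a maximum count of backups.
	// +optional
	RetentionNumber *RetentionNumberConfig `json:"retentionNumber,omitempty"`

	// RetentionSize configures retention by a maximum total size of backups.
	// +optional
	RetentionSize *RetentionSizeConfig `json:"retentionSize,omitempty"`
}

// RetentionNumberConfig bounds retention by backup count.
type RetentionNumberConfig struct {
	// +kubebuilder:validation:Minimum=1
	MaxNumberOfBackups int `json:"maxNumberOfBackups"`
}

// RetentionSizeConfig bounds retention by total backup size in GiB.
type RetentionSizeConfig struct {
	// +kubebuilder:validation:Minimum=1
	MaxSizeOfBackupsGb int `json:"maxSizeOfBackupsGb"`
}
```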
// TODO: Validation on min and max durations.
// The max should be smaller than the scheduled interval
// for the next backup, else the alert will never be generated
MaxDurationToCompleteBackup string `json:"maxDurationWithoutBackup"`
> // The max should be smaller than the scheduled interval
do you think that's easy to get out of the cron syntax?
I would maybe leave this out for now and just degrade the operator like we're currently doing in UpgradeBackupController. Taking a snapshot shouldn't take longer than a minute.
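For reference, computing the effective interval out of the cron syntax is straightforward with a parser; a small sketch, assuming the `github.com/robfig/cron/v3` package were used (not necessarily what the operator would ship):

```go
package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

func main() {
	// Parse a standard 5-field cron expression, e.g. "every day at 3am".
	sched, err := cron.ParseStandard("0 3 * * *")
	if err != nil {
		panic(err)
	}

	// The gap between the next two activations approximates the schedule
	// interval; any alert timeout would have to stay below it.
	next := sched.Next(time.Now())
	following := sched.Next(next)
	fmt.Printf("next run: %s, interval: %s\n", next, following.Sub(next))
}
```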
I have no preference for a timeout as long as the operator reports degraded.
To enforce the specified schedule, we can use [CronJobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/) to run backup pods.

**TODO:** If load balancing between control-plane nodes is a requirement, it needs to be determined how we can achieve that with the Job or Pod spec.
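Purely as an illustration of the CronJob approach (the names, image, script path, and directories below are placeholders, not the proposed implementation), the operator could stamp out something like:

```go
package backup

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newBackupCronJob sketches a CronJob whose pod runs a backup script on a
// control-plane node and saves snapshots under a hostPath directory.
// All names here are hypothetical placeholders.
func newBackupCronJob(schedule string) *batchv1.CronJob {
	hostPathType := corev1.HostPathDirectoryOrCreate
	privileged := true
	return &batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "etcd-backup",
			Namespace: "openshift-etcd",
		},
		Spec: batchv1.CronJobSpec{
			Schedule:          schedule,
			ConcurrencyPolicy: batchv1.ForbidConcurrent,
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							// Restrict scheduling to control-plane nodes.
							NodeSelector: map[string]string{
								"node-role.kubernetes.io/master": "",
							},
							Containers: []corev1.Container{{
								Name:    "backup",
								Image:   "backup-image", // placeholder
								Command: []string{"cluster-backup.sh", "/etc/kubernetes/cluster-backup"},
								SecurityContext: &corev1.SecurityContext{Privileged: &privileged},
								VolumeMounts: []corev1.VolumeMount{{
									Name:      "backup-dir",
									MountPath: "/etc/kubernetes/cluster-backup",
								}},
							}},
							Volumes: []corev1.Volume{{
								Name: "backup-dir",
								VolumeSource: corev1.VolumeSource{
									HostPath: &corev1.HostPathVolumeSource{
										Path: "/etc/kubernetes/cluster-backup",
										Type: &hostPathType,
									},
								},
							}},
						},
					},
				},
			},
		},
	}
}
```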
actually, that's an interesting issue we wouldn't have with a single mounted PVC.
Let's say I have RetentionCount = 3: how do we know on which CP node to find and delete the respective snapshot?
The procedure also requires gaining a root shell on a control plane node. Shell access to OpenShift control plane nodes is generally discouraged due to the potential for affecting the reliability of the …
Beyond discouraged, it is not possible for hosted control planes in a managed environment, where the hosted cluster admin does not administer the environment in which the hosted control plane (and etcd) is running.
- One-time cluster backup can be triggered without requiring a root shell
- Scheduled backups can be configured after cluster installation
- Backups are saved locally on the host filesystem of control-plane nodes
How does this work for hosted control planes?
- Can the admin in the hosted cluster, who doesn't manage the control plane, initiate this backup?
- If they can, what protections are in place to ensure backups do not impact the hosting node in a way that may negatively impact other control plane pods hosted on the node where the backup is saved?
I expect this enhancement to cover only self-hosted. Hosted control planes (Hypershift) took separate ownership of etcd by developing their own operator (see https://github.com/openshift/hypershift/blob/a0377ae793b20fb565ac8852ee4db1e5bb4780da/control-plane-operator/controllers/hostedcontrolplane/etcd/reconcile.go#L1). Hypershift will have to develop their own distinct backup and restore mechanism.
The problem space for a self-hosted cluster without external management is very different from that of a hosted cluster with external management, both in terms of observability of the situation and in terms of cluster state reconciliation.
"Separate ownership" today, doesn't mean it needs to be separate forever, does it? It seems like it would be possible to move towards a unified approach. For example, in a hosted control plane world, it's not clear to me which persona(s) would have which access to backups. Possibly the Cluster Service Consumer, if we lump backup restoration in as similar to "drive upgrades". Or possibly the Cluster Service Provider, if we lump backup restoration in under "Has cluster admin management". But either way, I'd expect etcd backup/restore tooling to live on the management cluster and not leak out to the Cluster Instance ...
roles. Which would argue for a namespaced CRD, so a management cluster could have these resources in each of its per-hosted-cluster namespaces.
Interesting take; I would rather see more separation instead of more unification. For example, we rebased etcd earlier this year and discovered it broke a customer cluster. Check out etcd-io/etcd#15062
We didn't even know that Hypershift was using openshift/etcd as their etcd. They are certainly not using CEO, which makes reusing the backup CRD moot as they would have to implement their own business logic for backups anyway.
CEO is also currently a singleton in the cluster (don't think we want to change that either), so namespacing that CRD would make little sense in OCP.
- Save cluster backups to PersistentVolumes or cloud storage
  - This would be a future enhancement or extension to the API
This may be required for adoption in managed OpenShift and hosted control planes to mitigate possible risks. For automated backups to be useful for managed clusters, they should be usable by the end customer. But the end customer is not managing the control plane and does not have control over the configuration of the host volume, so there is a risk we may not be able to accept, depending on how much impact those backups have.
- Have automated backups enabled by default with cluster installation
- Save cluster backups to PersistentVolumes or cloud storage
  - This would be a future enhancement or extension to the API
- Automate cluster restoration
Is this expected as a future enhancement? It is not stated here, unlike the prior bullet.
### User Stories

- As a cluster administrator I want to initiate a cluster backup without requiring a root shell on a control plane node so as to minimize the risk involved
- As a cluster administrator I want to schedule recurring cluster backups so that I have a recent cluster state to recover from in the event of quorum loss (i.e. losing a majority of control-plane nodes)
How many backups are retained when there is this schedule?
### API Extensions

A new CRD `EtcdBackup` will be introduced to the API group `config.openshift.io` with the version `v1alpha1`. The CRD would be feature-gated until the prerequisite e2e test has an acceptable pass rate.
I think `Backup`, with a region for etcd and possibly a future region for other items: Velero, PVs, something else.
// The default "" means no backup is triggered
// +kubebuilder:default:=""
// +optional
Reason string `json:"reason"`
In our existing operators, we have a forceRedeploymentReason or some such, right? Given your need to track backups, it may be more practical to have an object like the Insights collection that indicates a desire to run a backup "now".
// Default: ""
// TODO: Define how we do validation for the format (e.g. validation webhooks)
// and the limits on the frequency to disallow unrealistic schedules (e.g. */2 * * * * every 2 mins)
// TODO: What is the behavior if the backup doesn't complete in the specified interval
Having an object like the Insights collection helps here too, since it gives an "obvious" item to delete, versus finding a specific instance of a job in a managed namespace.
// every day at 3am: 0 3 * * *
// Setting to an empty string "" means disabling scheduled backups
// Default: ""
// TODO: Define how we do validation for the format (e.g. validation webhooks)
try CEL first
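For example, a CEL rule attached to the schedule field via kubebuilder markers could reject malformed expressions without a webhook. The rule below is deliberately simplified (it does not accept comma lists or month/day names) and is only meant to show the shape of such validation, not the final rule:

```go
// Schedule defines the recurring backup schedule in Cron format,
// e.g. every day at 3am: 0 3 * * *
// Setting to an empty string "" means disabling scheduled backups
// +kubebuilder:validation:MaxLength=64
// +kubebuilder:validation:XValidation:rule="self == '' || self.matches('^([0-9*/-]+ ){4}[0-9*/-]+$')",message="schedule must be empty or a 5-field cron expression"
// +optional
Schedule string `json:"schedule"`
```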
### API Extensions

A new CRD `EtcdBackup` will be introduced to the API group `config.openshift.io` with the version `v1alpha1`. The CRD would be feature-gated until the prerequisite e2e test has an acceptable pass rate.
glad to see alpha here.
**TODO:** How do we feature gate the entire type?
For a field within a non-gated type it seems to be the tag [`// +openshift:enable:FeatureSets=TechPreviewNoUpgrade`](https://github.com/openshift/api/blob/master/example/v1/types_stable.go#L33). No such tag is visible on the [whole-type gated example](https://github.com/openshift/api/blob/1957a8d7445bf2332f027f93a24d7573f77a0dc0/example/v1alpha1/types_notstable.go#L19).
Example of a gate on an entire type here: https://github.com/openshift/api/blob/master/example/v1alpha1/0000_50_notstabletype-techpreview.crd.yaml
// Conditions represents the observations of the EtcdBackup's current state.
// TODO: Identify different condition types/reasons that will be needed.
// +optional
in the actual PR, you'll need the SSA tags.
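That is, the conditions field would likely carry the usual server-side-apply list markers along these lines (a sketch following the common `metav1.Condition` pattern):

```go
// Conditions represents the observations of the EtcdBackup's current state.
// +patchMergeKey=type
// +patchStrategy=merge
// +listType=map
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
```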
// EtcdBackup. It corresponds to the EtcdBackup's .metadata.generation that the condition was set based upon.
// This lets us know whether the latest condition is stale.
// +optional
ObservedGeneration int64 `json:"observedGeneration,omitempty"`
This hasn't been done in our config API before. I suggest not starting now.
#### Execution method

Create a backup pod that runs the backup script on a designated master node to save the backup file on the node's host filesystem:
the job construct seems like a good choice here. I don't object to a pod, but are you sure you don't want a job instead?
- If significant, should we avoid selecting the leader to reduce the risk of overloading the leader and triggering a leader election?
- We can find the leader via the [endpoint status](https://etcd.io/docs/v3.5/dev-guide/api_reference_v3/#message-statusresponse-apietcdserverpbrpcproto), but the leader can change during the course of the backup, or we may not have the option of filtering the leader out if the backup pod is unschedulable on non-leader nodes (e.g. low disk space).
- Other than the leader IO considerations, when picking where to run and save the backup, do we need to load balance or round-robin across all control-plane nodes?
- Leaving it to the scheduler to schedule the backup pod (or having the operator save on the node it's running on) is simpler, but we could end up with all/most backups being saved on a single node. In a disaster recovery scenario we could lose that node and the backups saved on it.
I think a force assignment policy of some kind is a good choice.
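A minimal sketch of what forcing the assignment could look like (the helper name and the policy for picking the node are assumptions; only the mechanism is shown):

```go
package backup

import corev1 "k8s.io/api/core/v1"

// pinToNode forces the backup pod onto a specific control-plane node by
// hostname, bypassing free scheduling so the operator knows exactly which
// node will hold the resulting snapshot. Illustrative only.
func pinToNode(pod *corev1.Pod, nodeName string) {
	if pod.Spec.NodeSelector == nil {
		pod.Spec.NodeSelector = map[string]string{}
	}
	pod.Spec.NodeSelector["kubernetes.io/hostname"] = nodeName

	// Backup pods must still tolerate the control-plane taint.
	pod.Spec.Tolerations = append(pod.Spec.Tolerations, corev1.Toleration{
		Key:      "node-role.kubernetes.io/master",
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	})
}
```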
### Test Plan

Prior to merging the backup controller implementation, an e2e test would be added to openshift/origin to test the validity of the backups generated from the automated backups feature. The pass rate on this test will feature gate the API until we pass an acceptable threshold to make it available by default.
Gating by techpreview means you can merge your impl at will.
### Test Plan

Prior to merging the backup controller implementation, an e2e test would be added to openshift/origin to test the validity of the backups generated from the automated backups feature. The pass rate on this test will feature gate the API until we pass an acceptable threshold to make it available by default.
promoting to "accessible-by-default" will be allowed once there is an acceptable history of passing restore
#### Tech Preview -> GA

- TBD
This is known: a track record of reliable restore. I propose 99% on all platforms as a starting point, since 99% of our tests currently pass at this threshold and the consequences of a failed restore are extremely high.
Some updates:
The most significant change, however, is splitting up the original EtcdBackup API and controller into two different APIs and controllers, so we can better track the status of individual backup requests without lumping them into conditions on the config API (#1370 (comment)).
For now, another look at the updated Workflow and API sections would be helpful to know whether the two APIs work better for configuration and status tracking. @deads2k I still have some other ToDos to follow up on, primarily #1370 (comment), to figure out how we would enforce the retention policy for backup files spread across all nodes, but I will update in another follow-up.
7e20250 to 5ccb1a1 (compare)
- Removed `MaxDurationToCompleteBackup` alert in favor of simply making the backup controller go degraded if it notices a backup job didn't complete in some default timeout.
- Replaced `RetentionCount int` with a discriminated union field `RetentionPolicy`, which supports retention by count or size. Leaves us open to adding more (by time) later.
- Added some CEL validation rules for the API spec.
- Clarified that non-self-hosted clusters like Hypershift are out of scope for the enhancement.
- Added a section on backup timeout failure and setting a degraded condition.
- Graduation criteria states the feature gate and 99% pass rate requirement.
- Expanded the Test Plan section a little bit with how we want to validate a successful restore.
- Split up the original EtcdBackup API and controller into two different APIs and controllers:
  - `EtcdBackupRequest` for handling one-time backup requests, so we can better track the status of multiple past backups
  - `EtcdBackupSchedule` for configuring the schedule for recurring backups, which outsources the actual backup requests via `EtcdBackupRequest`
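A rough sketch of how the two APIs could divide responsibilities, reusing the names from the summary above (field shapes are illustrative, not the merged openshift/api definition):

```go
package v1alpha1

// EtcdBackupScheduleSpec configures recurring backups. A controller
// reconciling it creates one-shot EtcdBackupRequest objects per the
// schedule and prunes old backups according to the retention policy.
type EtcdBackupScheduleSpec struct {
	// Schedule is the recurring backup schedule in Cron format.
	// An empty string disables scheduled backups.
	// +optional
	Schedule string `json:"schedule"`

	// RetentionPolicy bounds how many backups (or how much backup data)
	// is kept, as sketched in the retention discussion above.
	// +optional
	RetentionPolicy RetentionPolicy `json:"retentionPolicy"`
}

// EtcdBackupRequestSpec asks for a single backup. Each request carries its
// own status and conditions, so individual past backups can be tracked
// without lumping them into conditions on the config API.
type EtcdBackupRequestSpec struct {
	// Reason records why this backup was requested (manual or scheduled).
	// +optional
	Reason string `json:"reason"`
}
```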
5ccb1a1 to 4e5b439 (compare)
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
Going to update with the changes from the API PR so far.
@hasbro17: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@hasbro17: This pull request references ETCD-295 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@hasbro17: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@hasbro17 This design is linked to a jira ticket that is in progress. Is the design complete? Whose sign-off do you need to merge this PR?
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten
@dhellmann My apologies, I didn't get a chance to update this. We've had some design changes over on the API PR that we've already merged (openshift/api#1482) and we're almost finishing up the implementation. So the design for 4.14 is complete, but I need to document the API changes back here and update the proposal for the follow-ups and future goals. Will ping the reviewers once I have that done. /remove-lifecycle rotten
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, ETCD-81, has status "Testing". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.
Enhancement proposal for enabling automated backups of etcd in OpenShift.
See:
https://issues.redhat.com/browse/ETCD-295
https://issues.redhat.com/browse/ETCD-81
https://issues.redhat.com/browse/OCPBU-252
/hold
For further iteration on outstanding TODOs.