resource-deletion-with-approval.md

---
title: Neat-enhancement-idea
authors:
  - "@alexmt"
sponsors:
  - TBD
reviewers:
  - "@jessesuen"
  - TBD
approvers:
  - "@jessesuen"
  - TBD

creation-date: 2020-04-19
last-updated: 2020-04-19
---

Neat Enhancement Idea

Support manual approval for pruning and deleting Kubernetes resources during application syncing/deletion.

Summary

Introduce Kubernetes resource-level annotations that require manual user approval using Argo CD UI/CLI/API before the resource is pruned or deleted. The annotations should be respected while Argo CD attempts to synchronize or delete the application.

Motivation

We’ve seen cases where Argo CD deleted Kubernetes resources due to a bug or misconfiguration. Examples include corrupted data in Redis, user errors (1, 2), and a bug in automation built on top of Argo CD. These examples don’t mean Argo CD is unreliable; however, there are cases where misbehavior is catastrophic and erroneous deletion is not acceptable. Examples include the app-of-apps pattern, where Argo CD is used to manage itself, and namespaces in production clusters.

Goals

The goals of this proposal are:

Allow developers to mark resources that require manual approval before application deletion.

Developers should be able to add an annotation to resources that require manual approval before deletion. The annotation should be respected by Argo CD when it attempts to delete the application.

Allow developers to mark resources that require manual approval before pruning.

Developers should be able to add an annotation to resources that require manual approval before pruning. The annotation should be respected by Argo CD when it attempts to prune extra resources while syncing the application.

Non-Goals

Implement automatic self check while deleting resources

We've made our best effort to implement correct behavior, and as of now we are not aware of any bugs that cause erroneous deletion. The goal of this proposal is to provide a safety net for cases where erroneous deletion is not acceptable.

Proposal

We propose to introduce two new sync options for Argo CD applications: Prune=confirm and Delete=confirm. These options would protect resources from accidental deletion during cascading application deletion as well as during sync operations.

Introduce confirm option for Prune sync option.

Argo CD already supports the argocd.argoproj.io/sync-options: Prune=false sync option, which prevents resource deletion while syncing the application. This, however, is not ideal because it rules out fully automated workflows that include resource deletion.

To improve the situation, we propose to introduce a confirm value for the Prune sync option. When confirm is set, Argo CD should pause the sync operation before deleting any app resources and wait for the user to confirm the deletion. The confirmation can be given in a user-friendly way through the Argo CD UI, CLI, or API.

  • Sync Operation status. I suggest not introducing new sync operation states, to avoid disturbing existing automation around syncing (CI pipelines, scripts, etc.). While Argo CD is waiting for the user to confirm the deletion, the operation state should remain Progressing. Once the user confirms the deletion, the operation should resume.
  • Sync Waves. The sync wave should be "paused" while Argo CD is waiting for the user to confirm the deletion. This is no different from waiting for a resource to become healthy.
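
As an illustration of the proposed Prune=confirm value, the manifest below is a minimal sketch of a resource that opts in to confirmation before pruning. The PersistentVolumeClaim, its name, and its size are hypothetical; only the annotation key and value come from this proposal.

```yaml
# Hypothetical resource used only for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data
  annotations:
    # Proposed value: pause the sync and wait for user confirmation
    # before this resource is pruned.
    argocd.argoproj.io/sync-options: Prune=confirm
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

If this manifest were later removed from the application source, the sync operation would be expected to wait for confirmation instead of pruning the claim immediately.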

Introduce confirm option for Delete sync option.

As with the Prune sync option, we need to introduce a confirm value for the Delete sync option: argocd.argoproj.io/sync-options: Delete=confirm. The confirm option should pause the sync operation before deleting any app resources and wait for the user to confirm the deletion. The confirmation can be given in a user-friendly way through the Argo CD UI, CLI, or API.
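
A minimal sketch of the proposed Delete=confirm value follows, applied to a production namespace (one of the cases named in the Motivation section). The namespace name is hypothetical; only the annotation key and value come from this proposal.

```yaml
# Hypothetical namespace used only for illustration.
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  annotations:
    # Proposed value: wait for user confirmation before this resource
    # is deleted as part of cascading application deletion.
    argocd.argoproj.io/sync-options: Delete=confirm
```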

Friendly pruning/deletion manual approval

Since we know Argo CD is often used to implement fully automated developer workflows that include resource deletion, the deletion approval process should be as painless as possible. This way, platform administrators can instruct end users to apply the new prune/delete option to resources that require special care without significantly disturbing the developer experience.

In both cases where Argo CD requires manual approval, the user should be able to approve the deletion using Argo CD UI, CLI, or API. The approval process should be as simple as possible and should not require the user to understand the internals of Argo CD.

New requiresDeletionApproval resource field in application status

A new field requiresDeletionApproval should be added to the status.resources list items. The field should be set to true when the resource deletion approval is required.

  - health:
      status: Healthy
    kind: Service
    name: guestbook-ui
    namespace: default
    status: OutOfSync
    version: v1
    requiresPruning: true
    requiresDeletionApproval: true # new field that indicates that deletion approval is required

The Argo CD UI and CLI should display the requiresDeletionApproval field so that the user can easily discover which resources require manual approval.

Approve deletion resource action

The Argo CD UI and CLI should bundle an Approve Deletion resource action that allows the user to approve the deletion. The action should patch the resource with the argocd.argoproj.io/deletion-approved: true annotation. Once the annotation is applied, Argo CD should proceed with the deletion.

The main reason to use the action is that we can reuse existing RBAC to control who can approve the deletion.

UI/CLI convenience to approve all resources

The Argo CD UI should provide a convenient way to approve resources that require manual approval. The existing user interface will provide a button that allows the end user to execute the Approve Deletion action and approve resources one by one. In addition to single-resource approval, the UI should provide a way to approve all resources that require manual approval. The new button should execute the Approve Deletion action for all resources that require manual approval.

The Argo CD CLI would need no changes, since the existing argocd app actions run command already allows executing an action against multiple resources.

Require deletion approval notification

The default Argo CD notification catalog should include a trigger and notification template that notifies the user when deletion approval is required. The notification template should include a list of resources that require approval.
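
A rough sketch of what such a trigger and template might look like in the argocd-notifications-cm ConfigMap is shown below. The trigger name, template name, and message wording are hypothetical, and the when expression relies on the requiresDeletionApproval field proposed above, which does not exist in application status today. Treat it as an untested starting point rather than the catalog entry itself.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  # Hypothetical trigger: fires when any resource in the application
  # status has the proposed requiresDeletionApproval field set.
  trigger.on-deletion-approval-required: |
    - description: At least one resource is waiting for deletion approval
      when: any(app.status.resources, {.requiresDeletionApproval == true})
      send: [app-deletion-approval-required]
  # Hypothetical template: lists every resource waiting for approval.
  template.app-deletion-approval-required: |
    message: |
      Application {{.app.metadata.name}} has resources waiting for deletion approval:
      {{range .app.status.resources}}{{if .requiresDeletionApproval}}
      - {{.kind}}/{{.name}} in namespace {{.namespace}}{{end}}{{end}}
```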

Declarative approval

The user should be able to approve resource deletion without using the UI or CLI by manually adding the argocd.argoproj.io/deletion-approved: true annotation to the resource.
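
For example, a resource approved declaratively might look like the sketch below. The ConfigMap and its name are hypothetical. The annotation value is quoted because Kubernetes annotation values must be strings, so an unquoted true would be rejected by the API server.

```yaml
# Hypothetical resource used only for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: legacy-feature-flags
  annotations:
    # Proposed sync option that makes Argo CD wait for approval.
    argocd.argoproj.io/sync-options: Prune=confirm
    # Proposed approval marker; quoted so YAML parses it as a string.
    argocd.argoproj.io/deletion-approved: "true"
data:
  enabled: "false"
```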

Use cases

Add a list of detailed use cases this enhancement intends to take care of.

Use case 1:

As a developer, I would like to mark resources that require manual pruning approval so I can prevent the accidental deletion of critical resources.

Use case 2:

As a developer, I would like to mark resources that require manual deletion approval so I can prevent the accidental deletion of critical resources.

Security Considerations

The resource approval would require a mechanism to control who can approve the deletion. The proposal to use resource-level actions solves this problem and allows us to reuse the existing RBAC model.
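
To illustrate how the existing RBAC model could gate approvals, the sketch below grants a role permission to run a single resource action. It assumes the action would be exposed under the hypothetical name approve-deletion and uses a StatefulSet as the example kind. The role, group, and project names are also hypothetical; the action/<group>/<kind>/<action> resource path follows the existing convention for resource actions.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # First line: allow the role to run the (hypothetical) approve-deletion
  # action on StatefulSets in applications of the "production" project.
  # Second line: bind the (hypothetical) platform-admins group to the role.
  policy.csv: |
    p, role:deletion-approver, applications, action/apps/StatefulSet/approve-deletion, production/*, allow
    g, platform-admins, role:deletion-approver
```

With a policy along these lines, users outside the bound group could still sync the application but could not approve the deletion themselves.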

Risks and Mitigations

None.

Upgrade / Downgrade Strategy

In case of a rollback to the previous version, the sync option would be ignored and the resources would be deleted as before.

Open Issues

The proposal would require end users to learn about the new behavior and adjust their workflows. It includes a set of enhancements aimed at minimizing the impact on end users.

Alternatives

None.