
WIP: Add yaml manifest to cleanup all Antrea resources #307

Closed
wants to merge 1 commit

Conversation

antoninbas (Contributor)

This is still very much a work in progress; I'm opening this PR to
gather feedback on the approach.

For someone deleting Antrea, the steps would be as follows:

  • `kubectl delete -f <path to antrea.yml>`
  • `kubectl apply -f <path to antrea-cleanup.yml>`
  • check that the job has completed with `kubectl -n kube-system get jobs`
  • `kubectl delete -f <path to antrea-cleanup.yml>`

The cleanup manifest creates a DaemonSet that will perform the necessary
deletion tasks on each Node. After the tasks have been completed, the
"status" is reported to the cleanup controller through a custom
resource. Once the controller has received enough statuses (or after a
timeout of 1 minute), the controller job completes and the user can
delete the cleanup manifest.
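
To make the coordination concrete, here is a minimal sketch in Go of how the controller-side wait could work; the `cleanupStatus` type, the `waitForCleanup` helper, and the polling interval are illustrative assumptions, not the actual code in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// cleanupStatus is a hypothetical stand-in for the custom resource that each
// cleanup agent creates once it has finished its deletion tasks on a Node.
type cleanupStatus struct {
	NodeName string
	Success  bool
}

// waitForCleanup polls for status reports until every Node has reported or
// the timeout (1 minute, per the description above) expires, whichever comes first.
func waitForCleanup(expectedNodes int, poll func() []cleanupStatus) []cleanupStatus {
	const timeout = 1 * time.Minute
	deadline := time.Now().Add(timeout)
	for {
		statuses := poll()
		if len(statuses) >= expectedNodes || time.Now().After(deadline) {
			return statuses
		}
		time.Sleep(2 * time.Second)
	}
}

func main() {
	// In the real controller, poll would list the status custom resources
	// through the Kubernetes API; here we fake two Nodes reporting success.
	fake := func() []cleanupStatus {
		return []cleanupStatus{
			{NodeName: "node-1", Success: true},
			{NodeName: "node-2", Success: true},
		}
	}
	statuses := waitForCleanup(2, fake)
	fmt.Printf("received %d status reports\n", len(statuses))
}
```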

Known remaining items:

  • place cleanup binaries (antrea-cleanup-agent and
    antrea-cleanup-controller) in a separate docker image to avoid
    increasing the size of the main Antrea docker image
  • generate manifest with kustomize?
  • find a way to test this as part of CI?
  • update documentation
  • additional cleanup tasks: as of now we only take care of deleting the
    OVS bridge (see the sketch after this list)
  • place cleanup CRD in non-default namespace
  • use kubebuilder instead of the code generator directly (related to
    #16); we probably want to punt this task to a future PR.
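
For the OVS bridge item above, the agent-side task could boil down to something like the following Go sketch; the direct `ovs-vsctl` invocation and the hard-coded bridge name are assumptions for illustration, since the actual implementation would go through Antrea's OVS abstractions and configuration.

```go
package main

import (
	"fmt"
	"os/exec"
)

// deleteOVSBridge removes the Antrea OVS bridge on the local Node by shelling
// out to ovs-vsctl. "--if-exists" makes the call idempotent, so re-running the
// cleanup DaemonSet on an already-cleaned Node is harmless.
func deleteOVSBridge(bridge string) error {
	out, err := exec.Command("ovs-vsctl", "--if-exists", "del-br", bridge).CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to delete OVS bridge %s: %v (%s)", bridge, err, out)
	}
	return nil
}

func main() {
	// "br-int" is used here as a placeholder bridge name; a real cleanup
	// agent would read the name from the Antrea configuration instead.
	if err := deleteOVSBridge("br-int"); err != nil {
		fmt.Println(err)
	}
}
```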

See #181

@antrea-bot (Collaborator)

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-all: to trigger all tests.
  • /skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

@jianjuns (Contributor)

@antoninbas: do you think we can reuse the AgentInfo and ControllerInfo CRDs for the cleanup task status, instead of adding another CRD?

Also two other thoughts:

  1. We need to provide a way to clean up a Node when K8s is down. Should we consider antctl for that?
  2. We might not be able to do cleanup with a DS/Pod for Windows. In that case, do you assume some PowerShell scripts to start the cleanup service?
    I am wondering if it would be easier to let the Agent perform the cleanup in that case.

@jianjuns (Contributor)

  2. We might not be able to do cleanup with a DS/Pod for Windows. In that case, do you assume some PowerShell scripts to start the cleanup service?
    I am wondering if it would be easier to let the Agent perform the cleanup in that case.

I thought it over. For Windows, we should use PowerShell to install Controllers and Agents, so it makes sense to use PowerShell to uninstall and start the cleanup as well. But in that case, we do not need coordination between the cleanup-agent and cleanup-controller; we just need a one-off execution that cleans up both Controller and Agent resources.
@wenyingd might have a better idea.

@antoninbas (Contributor, Author)

@jianjuns FYI, I'm planning on starting work again on this PR this week and addressing these 2 comments from you:

  • try to reuse existing CRD API groups to report status back to the deletion controller
  • provide a "local" cleanup mechanism on a single Node when K8s is down (through antctl)
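
For the antctl-based local cleanup, here is a minimal sketch of what such a subcommand could look like, assuming a cobra-based CLI; the `cleanup` command name, its description, and the placeholder body are all illustrative and not code from this PR.

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

// newCleanupCommand sketches a hypothetical "antctl cleanup" subcommand for
// removing Antrea state from the local Node when the K8s API is unreachable.
func newCleanupCommand() *cobra.Command {
	return &cobra.Command{
		Use:   "cleanup",
		Short: "Remove Antrea state from the local Node, even if the K8s API is unreachable",
		RunE: func(cmd *cobra.Command, args []string) error {
			// A real implementation would call shared cleanup logic
			// (e.g. a pkg/agent/cleanup package) rather than duplicating it here.
			fmt.Println("running local Antrea cleanup...")
			return nil
		},
	}
}

func main() {
	root := &cobra.Command{Use: "antctl"}
	root.AddCommand(newCleanupCommand())
	if err := root.Execute(); err != nil {
		fmt.Println(err)
	}
}
```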

@jianjuns (Contributor)

@jianjuns FYI, I'm planning on starting work again on this PR this week and addressing these 2 comments from you:

Sounds great!

* try to reuse existing CRD API groups to report status back to the deletion controller

* provide a "local" cleanup mechanism on a single Node when K8s is down (through antctl)

Will your cleanup DS also execute antctl?

@antoninbas (Contributor, Author)

Will your cleanup DS also execute antctl?

My idea was that they would share their implementation in pkg/agent/cleanup, but that the DS would not invoke antctl directly.
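
To illustrate the shared-implementation idea, here is a minimal sketch of what a hypothetical pkg/agent/cleanup package could expose so that both the cleanup DaemonSet binary and antctl call the same code; the `Task` interface and `Run` function are assumptions, not the actual code in this PR.

```go
// Hypothetical pkg/agent/cleanup: a single entry point that both the cleanup
// DaemonSet binary and an "antctl cleanup" command could call, so neither
// needs to shell out to the other.
package cleanup

import "fmt"

// Task is one idempotent cleanup step, e.g. deleting the OVS bridge.
type Task interface {
	Name() string
	Run() error
}

// Run executes all registered cleanup tasks and stops at the first failure.
func Run(tasks []Task) error {
	for _, t := range tasks {
		fmt.Printf("running cleanup task %q\n", t.Name())
		if err := t.Run(); err != nil {
			return fmt.Errorf("cleanup task %q failed: %w", t.Name(), err)
		}
	}
	return nil
}
```

Each caller would assemble its own []Task slice (for example, the OVS bridge deletion on Linux) and invoke cleanup.Run, so the deletion logic lives in one place.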

@antrea-bot (Collaborator)

Can one of the admins verify this patch?

Base automatically changed from master to main January 26, 2021 00:00
@github-actions (Contributor)

This PR is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2021
@github-actions github-actions bot closed this Dec 21, 2021