Handling of cleanup job errors should be improved #877

Closed
amisevsk opened this issue Jun 27, 2022 · 0 comments · Fixed by #879

amisevsk commented Jun 27, 2022

Description

DWO recently began watching PVC cleanup jobs for errors and reporting them as workspace cleanup failures. A side effect of this detection is that a DevWorkspace can get stuck in a terminating state unnecessarily when its cleanup job hits a transient error that later resolves:

  1. DevWorkspace is deleted, cleanup job is created
  2. Cleanup job encounters an error, workspace is set to Errored state
  3. Error in cleanup job is resolved, job runs successfully
  4. Finalizer is not cleared as we don't check errored workspaces
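
The sequence above can be sketched in a few lines of Go. This is only an illustration of the desired behavior, not DWO's actual code or the change made in the linked fix: the `Workspace` type, the `JobState` values, the finalizer name, and `reconcileCleanup` are all hypothetical. The idea is that the cleanup job's state is re-evaluated on every pass, even when the workspace is already marked as errored, so that a later successful run still clears the finalizer.

```go
package main

import "fmt"

// JobState is a hypothetical, simplified view of a cleanup job's status.
type JobState int

const (
	JobRunning JobState = iota
	JobFailed
	JobSucceeded
)

// Workspace is a hypothetical stand-in for a DevWorkspace that is being deleted.
type Workspace struct {
	Phase      string
	Finalizers []string
}

// cleanupFinalizer is an assumed name used only for this sketch.
const cleanupFinalizer = "example.io/storage-cleanup"

// removeString returns a copy of items without any occurrence of target.
func removeString(items []string, target string) []string {
	out := make([]string, 0, len(items))
	for _, item := range items {
		if item != target {
			out = append(out, item)
		}
	}
	return out
}

// reconcileCleanup inspects the job state regardless of the workspace's
// current phase. It does not return early for errored workspaces, so a job
// that recovers from a transient error still results in finalizer removal.
func reconcileCleanup(ws *Workspace, job JobState) {
	switch job {
	case JobSucceeded:
		ws.Finalizers = removeString(ws.Finalizers, cleanupFinalizer)
	case JobFailed:
		ws.Phase = "Errored"
	case JobRunning:
		// Nothing to do yet; check again on the next pass.
	}
}

func main() {
	// Step 1: the workspace is deleted and a cleanup job is created.
	ws := &Workspace{Phase: "Terminating", Finalizers: []string{cleanupFinalizer}}

	// Step 2: the job hits a transient error.
	reconcileCleanup(ws, JobFailed)

	// Step 3: the error resolves and the job completes.
	reconcileCleanup(ws, JobSucceeded)

	// Step 4 (desired): the finalizer is gone even though the phase was Errored.
	fmt.Println(ws.Phase, len(ws.Finalizers))
}
```

In this sketch the errored phase is informational only. It never blocks the check that removes the finalizer, which is the step the sequence above reports as missing.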

This is a significant issue because, unlike the DevWorkspace startup case (where a DevWorkspace can just be restarted), there is no way to clear the errored status from a DevWorkspace. As a result, users must check the cleanup job's status, notice that it completed successfully, and then remove the finalizer from the DevWorkspace manually.

How To Reproduce

This is not easy to reproduce, as it requires a transient error in the cluster. In the most recent occurrence, a few workspaces were stuck terminating due to CreateContainerError errors in the cleanup job. This seems to have been caused by a temporary issue on the cluster, as all the jobs had completed and the event history had been cleared by the time it was noticed.

Additional context

@amisevsk amisevsk self-assigned this Jun 27, 2022
@amisevsk amisevsk mentioned this issue Jul 4, 2022
@amisevsk amisevsk added this to the v0.15.x milestone Jul 25, 2022