Add finalizing status and controller to restore workflow #7183

Closed
allenxu404 opened this issue Dec 6, 2023 · 4 comments · Fixed by #7317 or #7377

@allenxu404
Contributor

allenxu404 commented Dec 6, 2023

Describe the problem/challenge you have
Since async operations were introduced, once all items have been restored the restore moves from the InProgress status to either WaitingForPluginOperations or WaitingForPluginOperationsPartiallyFailed. It then transitions directly to a terminal status once all plugin operations finish.

We also need to add a new Finalizing status to the restore workflow. It would be entered after all items and plugin operations have completed, as in the backup workflow. Its purpose would be to perform any wrap-up work before the restore transitions to a terminal status.

Describe the solution you'd like
We could introduce a new Finalizing status to the restore workflow, the way the backup workflow does. A finalizing controller could then handle this phase.

A finalizing controller could perform any generic wrap-up tasks, including validating that all data was restored correctly, cleaning up temporary resources, and final logging/reporting. It is not limited to plugin operations.
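
As a rough illustration of the proposed flow, the phase transitions could be sketched as a small state function. This is only a sketch under stated assumptions: `nextPhase` and the constant names are hypothetical and not taken from the Velero code base, `FinalizingPartiallyFailed` is assumed by analogy with the existing waiting phases, and `Completed` / `PartiallyFailed` are assumed to be the terminal statuses. For simplicity, every restore here passes through a waiting phase even when there are no plugin operations.

```go
package main

import (
	"fmt"
)

// RestorePhase is a string-typed phase, as a restore status would carry it.
type RestorePhase string

const (
	// Phases named in this issue.
	PhaseInProgress                         RestorePhase = "InProgress"
	PhaseWaitingForPluginOps                RestorePhase = "WaitingForPluginOperations"
	PhaseWaitingForPluginOpsPartiallyFailed RestorePhase = "WaitingForPluginOperationsPartiallyFailed"
	// Phase proposed in this issue.
	PhaseFinalizing RestorePhase = "Finalizing"
	// Assumption: a partially-failed variant, by analogy with the waiting phases.
	PhaseFinalizingPartiallyFailed RestorePhase = "FinalizingPartiallyFailed"
	// Assumption: the terminal statuses.
	PhaseCompleted       RestorePhase = "Completed"
	PhasePartiallyFailed RestorePhase = "PartiallyFailed"
)

// nextPhase (hypothetical) returns the phase that follows p once the work
// belonging to p has finished. hadErrors reports whether that work hit errors.
func nextPhase(p RestorePhase, hadErrors bool) (RestorePhase, error) {
	switch p {
	case PhaseInProgress:
		if hadErrors {
			return PhaseWaitingForPluginOpsPartiallyFailed, nil
		}
		return PhaseWaitingForPluginOps, nil
	case PhaseWaitingForPluginOps:
		if hadErrors {
			return PhaseFinalizingPartiallyFailed, nil
		}
		return PhaseFinalizing, nil
	case PhaseWaitingForPluginOpsPartiallyFailed:
		// Earlier errors are carried forward regardless of hadErrors.
		return PhaseFinalizingPartiallyFailed, nil
	case PhaseFinalizing:
		if hadErrors {
			return PhasePartiallyFailed, nil
		}
		return PhaseCompleted, nil
	case PhaseFinalizingPartiallyFailed:
		return PhasePartiallyFailed, nil
	default:
		// Terminal or unknown phases have no successor.
		return "", fmt.Errorf("no transition defined from phase %q", p)
	}
}

func main() {
	// Walk an error-free restore from InProgress until no transition remains.
	p := PhaseInProgress
	for {
		fmt.Println(p)
		next, err := nextPhase(p, false)
		if err != nil {
			break
		}
		p = next
	}
}
```

The point of the sketch is that the wrap-up work gets its own phase between "all plugin operations finished" and the terminal status, so a dedicated controller can watch for it.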

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@sseago
Collaborator

sseago commented Dec 7, 2023

What work will be done during this phase? When async operations were initially added, we needed the Finalizing state because some specific workflow steps had to be done at that point, most importantly updating the backup tarball with updated item YAML for the items that async plugins had indicated needed an update after their operations completed. Is there some task needed for the restore workflow that isn't being done now and that has to happen after all async operations are completed? We only need to add this new phase if there's necessary work to do at this point.

@sseago
Collaborator

sseago commented Dec 7, 2023

In other words, before we decide to add a new phase, we need to identify one or more features we need to add to the Restore workflow that can't be easily handled with the current phases/controllers. If we then decide we need a new phase (and new controller) to implement the feature(s), then we can move forward with this.

@shubham-pampattiwar
Collaborator

shubham-pampattiwar commented Dec 11, 2023

@allenxu404 What kind of "wrap-up" tasks are we going to perform in this new finalizing state? Maybe listing them or giving some examples would shed some light on the issue at hand.

@allenxu404
Contributor Author

@sseago @shubham-pampattiwar We are currently facing issue #6435, whose fix may need to be implemented in the Finalizing status.

Issue #6435 indicates that some custom settings (including labels and reclaim policy) on restored PVs are lost because those PVs are newly, dynamically provisioned. With the introduction of VolumeInfo metadata (#7070), we can address this by patching the PVs' custom settings back from the VolumeInfo metadata.

To achieve this, we need to wait for all the target PVCs and PVs to be restored and bound, which includes waiting for all plugin operations to finish in the data mover and CSI snapshot cases. Only then can we proceed to patch the PVs. At present, there is no suitable point in the workflow to do this without adding the Finalizing status.
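
To make the idea above concrete, here is a minimal, self-contained sketch of the "wait, then patch" step. Everything in it is an assumption for illustration: `pvSettings`, `readyToPatch`, and `buildPVPatch` are hypothetical names, the settings are passed in directly rather than read from VolumeInfo metadata, and no Kubernetes client is used. It only builds a JSON merge patch body for the two settings mentioned (labels and reclaim policy), using the standard PV field `spec.persistentVolumeReclaimPolicy`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pvSettings (hypothetical) is a trimmed-down view of the PV settings
// mentioned in the thread: labels and reclaim policy.
type pvSettings struct {
	Labels        map[string]string
	ReclaimPolicy string
}

// readyToPatch (hypothetical) reports whether patching may start: all plugin
// operations are done and every target PVC has reached the "Bound" phase.
func readyToPatch(pluginOpsDone bool, pvcPhases []string) bool {
	if !pluginOpsDone {
		return false
	}
	for _, phase := range pvcPhases {
		if phase != "Bound" {
			return false
		}
	}
	return true
}

// buildPVPatch (hypothetical) returns a JSON merge patch that brings the
// current PV settings in line with the wanted ones. It returns nil when
// nothing differs. Labels present only on the current PV are left alone.
func buildPVPatch(want, current pvSettings) ([]byte, error) {
	patch := map[string]interface{}{}

	// Collect labels that are missing or have a different value.
	changed := map[string]string{}
	for k, v := range want.Labels {
		if cur, ok := current.Labels[k]; !ok || cur != v {
			changed[k] = v
		}
	}
	if len(changed) > 0 {
		patch["metadata"] = map[string]interface{}{"labels": changed}
	}

	// Restore the reclaim policy if it was recorded and now differs.
	if want.ReclaimPolicy != "" && want.ReclaimPolicy != current.ReclaimPolicy {
		patch["spec"] = map[string]interface{}{
			"persistentVolumeReclaimPolicy": want.ReclaimPolicy,
		}
	}

	if len(patch) == 0 {
		return nil, nil
	}
	return json.Marshal(patch)
}

func main() {
	// Settings recorded at backup time vs. the freshly provisioned PV.
	want := pvSettings{Labels: map[string]string{"team": "db"}, ReclaimPolicy: "Retain"}
	current := pvSettings{ReclaimPolicy: "Delete"}

	if readyToPatch(true, []string{"Bound"}) {
		patch, err := buildPVPatch(want, current)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(patch))
	}
}
```

In a real controller the patch body would be sent through a Kubernetes client, and the readiness check would come from the restore's operation status and live PVC objects. The sketch only shows why the patch has to wait for a point after binding and plugin operations.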