Minimizing need to delete provisioned volume #65100
Conversation
Signed-off-by: Serguei Bezverkhi <sbezverk@cisco.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sbezverk. Assign the PR to them by writing. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/sig storage
@sbezverk: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
There are several issues in this PR:
If we have trouble with in-tree volumes (and I haven't noticed any), I would propose adding exponential backoff to writing the PV; right now it tries 5 times with a 10-second sleep in between.
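For context, a minimal sketch of the backoff idea above, assuming a client-go clientset is available. The helper name retryCreatePV and the backoff parameters are illustrative, and the single-argument Create call reflects the client-go API of that era (newer versions also take a context and options).

```go
package provisioner

import (
	"time"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// retryCreatePV saves a freshly provisioned PV with exponential backoff
// instead of a fixed 5 x 10s retry loop.
func retryCreatePV(client kubernetes.Interface, pv *v1.PersistentVolume) error {
	backoff := wait.Backoff{
		Duration: 2 * time.Second, // delay before the first retry
		Factor:   2.0,             // double the delay after each failure
		Steps:    5,               // total number of attempts
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		_, err := client.CoreV1().PersistentVolumes().Create(pv)
		if err == nil || apierrors.IsAlreadyExists(err) {
			return true, nil // PV saved (possibly by an earlier attempt)
		}
		return false, nil // treat the error as transient and retry
	})
}
```

If all attempts fail, wait.ExponentialBackoff returns wait.ErrWaitTimeout, which the caller could handle the same way as the existing ProvisioningFailed path.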
@jsafrane Would it be an acceptable solution if I find a way to encrypt
@@ -22,6 +22,8 @@ import (
	"strings"
	"time"
There should be no empty line.
ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, events.ProvisioningFailed, strerr)
return
volRecovered := false
if claim.ObjectMeta.Annotations[annVolumeAlreadyProvisioned] == "yes" {
if annotation, ok := claim.ObjectMeta.Annotations[annVolumeAlreadyProvisioned]; ok && annotation == "yes" {
A little suggestion. :=)
It looks extremely ugly to me. You bring security somewhere it should not be. It opens a whole new can of worms, e.g. you need a way to prevent replay attacks on a different PVC. IMO, the internal provisioners are quite fine as they are now. And the external ones have a wide variety of ways to fix themselves, starting from increasing the timeout or using CRDs in their own namespace. Note that deleting unwanted PVs is quite complex. The current code prefers deleting volumes to save space and to keep the volumes in the storage backend in sync with PVs. This PR may leave orphan volumes in the storage backend, without PVs for them, in case the user deletes the PVC between provisioning retries. Both ways have their pros and cons.
@jsafrane Thanks for the comments. One question though: should the external provisioner be in sync with the logic of the in-tree PV controller? If they behave differently (with the suggested changes applied only to the external provisioner), it might result in a different experience for the user. I am not sure whether it was a goal to provide a seamless user experience regardless of whether in-tree or out-of-tree controllers are used.
I don't think putting data on the PVC is a good idea. A couple more fundamental questions:
@liggitt It should not (it really depends on the CSI driver implementation), but even for
We did that in the first release of alpha dynamic provisioning and it proved to be unreliable and error prone. For example, such a PV will get bound to a PVC, the scheduler will schedule pods that use the PVC, assuming it has complete topology labels (which it does not have), and the A/D controller will try to attach it or the kubelet will try to mount it. Sure, we could extend the PV controller not to bind to such PVs, but IMO that breaks the API.
Another problem we had: such incomplete PVs go through admission plugins. We have a plugin that fills in topology labels, and it got confused by PVs with no real volumes behind them. We can fix that easily; however, it again shows that this is an API change and users would need to change their admission handlers too.
Does requiring CreateVolume to be idempotent resolve the need to delete the created volume if writing the PV encounters an error and the provision needs to be requeued?
@liggitt makes a good point. If CreateVolume is idempotent then we shouldn't need to delete the volume if PV creation fails. I'm not sure if in-tree Provision() is idempotent, but at least CSI should be.
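As a rough illustration of that point, here is a sketch of an idempotent CreateVolume in a hypothetical CSI driver; field names follow the Go CSI bindings in github.com/container-storage-interface/spec/lib/go/csi, and the in-memory map and backend ID are stand-ins. Because the request is keyed by name, a requeued provision after a failed PV save gets the same volume back, so nothing has to be deleted.

```go
package driver

import (
	"context"
	"sync"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

// driver is a hypothetical stand-in for a CSI controller plugin.
type driver struct {
	mu            sync.Mutex
	volumesByName map[string]*csi.Volume
}

// CreateVolume is idempotent: the CSI spec keys the request on Name, and
// Kubernetes derives that name from the claim, so a retried provision
// returns the volume created earlier instead of creating a duplicate.
func (d *driver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if vol, ok := d.volumesByName[req.GetName()]; ok {
		return &csi.CreateVolumeResponse{Volume: vol}, nil // already provisioned
	}
	vol := &csi.Volume{ // a real driver would call its storage backend here
		VolumeId:      "backend-id-for-" + req.GetName(),
		CapacityBytes: req.GetCapacityRange().GetRequiredBytes(),
	}
	d.volumesByName[req.GetName()] = vol
	return &csi.CreateVolumeResponse{Volume: vol}, nil
}
```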
In-tree provisioner calls
After 5., no provisioning is called, because there is nothing to provision: Kubernetes does not need the PV at this time. If Kubernetes did not delete the volume at 4., the volume would never be deleted.
I expected something like this:
To handle the case where both the PVC and the PV are missing, the provisioner would need to keep an in-memory cache of created volumes, which would be lost on restarts.
Then we're back to persisting local state prior to calling CreateVolume. The PV object seems the most coherent object to do that on.
And we're back at API breakage. Until now, PVs were only fully provisioned volumes, ready for binding and scheduling. With PVs for not-yet-fully-provisioned volumes, we need to change at least the PV controller (not to bind the PVC until provisioning is complete), the scheduler (to wait for the PV to be fully provisioned and to get topology labels in case someone force-binds the PVC), and the kubelet (to do the same when a pod is scheduled directly, e.g. by a DaemonSet). There is an unknown number of external components that may need this change too, and IMO this counts as API breakage.
@jsafrane IMHO, forcing the storage backend to create/delete/create/delete volumes as a result of API-server-related issues is not right; each subsystem should deal with its issues internally, minimizing exposure to other subsystems. If changing the API makes the API subsystem more robust, I am not sure why it is considered breaking; I think it should be explored.
Yeah, this is not worth breaking the PV API over; it is complicated enough already, adding another phase before
I believe the motivation for this PR is ultimately kubernetes-csi/external-provisioner#68. This supposed issue of PVs failing to save and wasting storage-backend create API calls is just a symptom of the true problem, kubernetes-csi/external-provisioner#68, causing the API server throttling in the first place. Otherwise we have zero evidence that a PV failing to save 5 times in a row is a common enough occurrence that we need to change this code.
@sbezverk: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wongma7 should this be closed?
@vladimirvivien yes
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The current behaviour of the PV controller, if for some reason the PV object cannot be saved by the API server, is to delete the just-provisioned volume. This is not desired behaviour because provision/de-provision operations can be long for some storage backends. This PR changes this behaviour and proposes to store the successfully provisioned volume's information as an annotation on the PVC object. On a PV object save failure, the controller will re-attempt to bind the PVC to the PV, but it will not need to provision a volume since that has already been done; it only needs to retrieve the PV definition from the PVC's annotation. On any error, the PV controller will fall back to the old logic.
Signed-off-by: Serguei Bezverkhi sbezverk@cisco.com
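For readers skimming the thread, a minimal sketch of the mechanism the description proposes, with hypothetical annotation keys and helper names: only the annVolumeAlreadyProvisioned identifier appears in the diff; its string value, annProvisionedVolume, and the JSON encoding are assumptions.

```go
package pvcontroller

import (
	"encoding/json"

	v1 "k8s.io/api/core/v1"
)

// Annotation keys are illustrative; the PR only shows the
// annVolumeAlreadyProvisioned identifier, not its value.
const (
	annVolumeAlreadyProvisioned = "pv.kubernetes.io/volume-already-provisioned"
	annProvisionedVolume        = "pv.kubernetes.io/provisioned-volume"
)

// rememberProvisionedVolume records the freshly provisioned PV on the claim
// so that a retry after a failed PV save can bind without provisioning again.
func rememberProvisionedVolume(claim *v1.PersistentVolumeClaim, pv *v1.PersistentVolume) error {
	data, err := json.Marshal(pv)
	if err != nil {
		return err
	}
	if claim.Annotations == nil {
		claim.Annotations = map[string]string{}
	}
	claim.Annotations[annVolumeAlreadyProvisioned] = "yes"
	claim.Annotations[annProvisionedVolume] = string(data)
	return nil
}

// recoverProvisionedVolume reconstructs the PV from the claim's annotations.
// On any error it returns false, and the controller falls back to the old
// provision-from-scratch logic, as the description states.
func recoverProvisionedVolume(claim *v1.PersistentVolumeClaim) (*v1.PersistentVolume, bool) {
	if claim.Annotations[annVolumeAlreadyProvisioned] != "yes" {
		return nil, false
	}
	pv := &v1.PersistentVolume{}
	if err := json.Unmarshal([]byte(claim.Annotations[annProvisionedVolume]), pv); err != nil {
		return nil, false
	}
	return pv, true
}
```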