-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for transfer of Storage resources, such as PersistentVolumeClaims or VolumeSnapshots #2326
Conversation
a1da930
to
444dcf3
Compare
keps/sig-storage/2324-volume-snapshot-namespace-transfer/README.md
Outdated
Show resolved
Hide resolved
name: foo | ||
namespace: source-namespace | ||
kind: VolumeSnapshot | ||
approvalName: approve-foo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does a transfer request know approvalName
? does this gets filled later on by snapshot controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's part of the agreement. User who gives the snapshots must pass this name to the user who accepts the snapshots, outside of Kubernetes API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I'm checking OpenStack, which has very similar concept called volume transfer between projects, and here OpenStack actually generates a token, that the current owner of the volume (old project) must give to the new owner (new project).
The KEP should note that the name should be at least something that the users can't guess. Using a random string in the example would be helpful.
requestName: transfer-foo | ||
``` | ||
|
||
When matching `StorageTransferRequest` and `StorageTransferApproval` resources are detected, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should describe what the matching criteria is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The matching criteria has been included, and was updated based on the meeting today. As of now the StorageTransferApproval
indicates the storage source, and a match is determined only between the StorageTransferRequest
and StorageTransferApproval
values. It no longer validates that the source matches between the two.
targetName: bar | ||
``` | ||
|
||
`StorageTransferApproval` resource will be created in the target namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huffmanca Could you explain more about the StorageTransferApproval
and StorageTransferRequest
, do they need to be created manually?
If yes, IMO, it's a little complicated to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current design requires that these objects be manually created. It's possible that users could script creation of them if they plan to transfer resources en masse; however, we intentionally need a user from each namespace to "sign off" on the transfer.
namespace: target-namespace | ||
spec: | ||
source: | ||
name: foo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the user he/she can access the target namespace, but can not access the source namespace, how he/she can get the volumesnapshot object he/she wants to transfer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the proposal it says "provided that both the source and target namespace have a user that has agreed to the transfer." By creating objects both in the source and target namespace we ensure this agreement actually happened. So users can't steal snapshots from other namespaces and at the same time they can't "push" their snapshots to namespaces that don't want it.
@huffmanca can you change "VolumeSnapshot" in the title to "PVC or VolumeSnapshot"? |
name: transfer-foo | ||
namespace: target-namespace | ||
spec: | ||
source: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's really in the source namespace, you can use TypedLocalObjectReference
with APIGroup
, Kind
and Name
. The namespace is implied from the namespace of StorageTransferRequest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we also use an ObjectReference
in both the request and approval, and ensure that these match?
Looking between #2110 and this to match things up and was thinking, maybe it would make sense to consolidate to a single KEP that proposes a "csi transfer" feature. That way efforts can be focused on a single KEP and controller. Initial thoughts on this it seems like it might be feasible. |
444dcf3
to
6ea44f6
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: huffmanca The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
6ea44f6
to
c059943
Compare
I've updated this KEP, but this is attempting to continue the work that you and mhenricks have been leading, since the most recent KEP has stalled. The goal would be to have a generic (for storage) API that can be used for both PVCs and VolumeSnapshots, at least to start with. I agree that we're looking to create a single controller for both. |
keps/sig-storage/2324-volume-snapshot-namespace-transfer/README.md
Outdated
Show resolved
Hide resolved
spec: | ||
approvalName: xyji6OwynW | ||
targetName: bar | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding something like a token here that's required on the target side, I believe @jsafrane touched on this already. Even if it was just an optional field that could be specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included this in both resources' spec. I think it best if the names were human readable, the token required, and automatically generated.
keps/sig-storage/2324-volume-snapshot-namespace-transfer/README.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/2324-volume-snapshot-namespace-transfer/README.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/2324-volume-snapshot-namespace-transfer/README.md
Outdated
Show resolved
Hide resolved
- Binds the VolumeSnapshotContent to the newly created VolumeSnapshot. If this was a dynamically provisioned VolumeSnapshotContent, then it adjusts the VolumeSnapshotContent.Spec.Source to point to the newly created | ||
VolumeSnapshot. | ||
- Deletes the source VolumeSnapshot. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"transfering" and relabeling snapshots has always been something that made me kinda nervous. For a number of storage backends that use things like cow type snapshots or linked reference to versions this introduces a pretty big problem. The problem is that you could now have a snapshot in ns: foo that's actually linked to a snapshot in ns: bar. (eg sequential snapshots are linked: snap-'n' --> snap-3 --> snap-2 --> snap-1 --> Volume) The two namespaces aren't aware of this linkage on the back end and neither can delete resources. So what happens when I transfer snap-2?
I know some have proposed that backends handle the reference counting in this case and do a sort of 'soft delete' but that doesn't seem great because "somebody" is paying for that storage that they think they deleted.
To transfer snapshots I don't have a great alternative proposal, other than create a PVC from a snapshot and transfer that instead. I'm probably overly paranoid about this though and I don't think there are any other folks that have this concern.
c059943
to
a2380fb
Compare
a2380fb
to
99c00b2
Compare
Thanks @huffmanca for the clarification on some of those points. If #2110 is being combined here a lot of that content should probably be merged in here. The one other thing I was getting at was the level of detail around the workflow here https://github.com/kubernetes/enhancements/pull/2110/files#diff-82617246992480a1d7acd7c0041813170f4e01cb996df54b3530e2db74ed9bcfR95 was really useful. Other than that I'm obviously in favor of the feature :) Really happy to see you've picked up the effort. |
@huffmanca @j-griffith @gnufied @jsafrane I'm thinking about the other approach to achieve this feature. I believe that this simplifies the Spec and avoid declaring the desired state of a certain resource in different multiple places. What do you think of it? Details are as follows: The fields to be added are
For example, let's assume that user A has PVC User operations will be:
Controller behaviors will be:
Above is just a rough idea that doesn't consider cancel or error cases. Also, it doesn't cover snapshot. However, we will be able to consider more in detail, once we agree that this approach looks good (I hope that this approach can be easily applied to snapshot, like adding similar fields in |
Implemented PoC codes for further discussion. Console logs are as follows:
|
I read through the previous discussions and noticed that I'm just trying to roll back the discussion here by my previous two comments. Do we really need to introduce new generic CRDs for transfer, instead of just adding fields for transfer to specific APIs that support this feature? I think:
|
|
||
For VolumeSnapshots, this performs the following: | ||
- Creates a VolumeSnapshot in the target namespace. | ||
- Binds the VolumeSnapshotContent to the newly created VolumeSnapshot. If this was a dynamically provisioned VolumeSnapshotContent, then it adjusts the VolumeSnapshotContent.Spec.Source to point to the newly created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VolumeSnapshotContent.Spec.Source points to a VolumeHandle or a SnapshotHandle. It does not point to a VolumeSnapshot. It is VolumeSnapshotContent.Spec.VolumeSnapshotRef that points to a VolumeSnapshot. Also VolumeSnapshotContent.Spec.VolumeSnapshotRef is immutable so you cannot rebind it.
The suggested approach is to create a new VolumeSnapshotContent, point Source to the existing SnapshotHandle, and point VolumeSnapshotRef to the new VolumeSnapshot.
CC @yuxiangqian
these resources between namespaces, allowing users to conveniently move these resources without manual creation | ||
of each independent object (VolumeSnapshot, VolumeSnapshotContent). | ||
|
||
This extends the [API for PVC Namespace Transfer](https://github.com/kubernetes/enhancements/pull/2110/files) to also apply for VolumeSnapshots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this KEP only handles VolumeSnapshot namespace transfer. The KEP on PVC (#2110) is closed. Is there anyone working on PVC namespace transfer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, based on the tile of this KEP, I closed #2110 but can of course reopen. I am still very interested in this feature getting into k8s.
VolumeSnapshot. | ||
- Deletes the source VolumeSnapshot. | ||
|
||
The external-snapshotter will need to be updated to remove immutability of the following fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should remove the immutability of these fields. Instead a new VolumeSnapshotContent object should be created.
CC @yuxiangqian
@mkimuram In your POC, you are proposing to add a Transfer.Source and Transfer.Destination field in the PVC spec. At the end of the transfer, will a new PVC be created with the name and namespace based on the Transfer.Destination field? |
No. New PVC is manually created by a user who have access to the namespace. It can be regarded as the same action for the receiver to create a |
I hope that this diagram will help understand my intention. Note that while PVC's |
We had a meeting to discuss this today. @Elbehery summarized all previous efforts and the two options. Meeting notes are here: https://docs.google.com/document/d/17H1k4lqdtJwZSjNRaQue-FhMhyk14JA_MoURpoxha5Q/edit# Meeting recording is here: https://www.youtube.com/watch?v=7VMp8Asf_P8 |
- `r.spec.acceptName == a.metadata.name` | ||
- `r.spec.token == a.spec.requestToken` | ||
|
||
Once matching is complete, a controller will begin the transfer process. For PVCs, this performs the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if
- two
StorageTransferRequest
s are created for the same resource andStorageTransferAccept
for eachStorageTransferRequest
is created almost at the same time, - two
StorageTransferAccept
s for the sameStorageTransferRequest
are created almost at the same time in different namespaces, - two
StorageTransferRequest
s that have the samesource.kind
andtargetName
are created, andStorageTransferAccept
s for both of them are created in the same namespace, - a
StorageTransferRequest
is deleted before transfer completes, - a
StorageTransferAccept
is deleted before transfer completes, - a resource with the same criteria (namespace, name, and so on) in
StorageTransferRequest
is created after a transfer for the intended resource is completed (do users need to ensure thatStorageTransferRequest
andStorageTransferAccept
are deleted after a successful transfer)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two StorageTransferRequests are created for the same resource and StorageTransferAccept for each StorageTransferRequest is created almost at the same time,
two StorageTransferAccepts for the same StorageTransferRequest are created almost at the same time in different namespaces,
two StorageTransferRequests that have the same source.kind and targetName are created, and StorageTransferAccepts for both of them are created in the same namespace,
I think these are all basically the same issue. The transfer controller will process the first match it encounters and ignore others. An annotation on the source can probably help us with the ignore case. For applications with possibility of contention like this would probably be wise to have a standard naming scheme for requests/accepts and handle collisions that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a StorageTransferRequest is deleted before transfer completes,
a StorageTransferAccept is deleted before transfer completes,
I imagine a pending and running phase for a transfer. During the pending phase (waiting for no pods using the source, etc) a transfer is easily aborted. Once a transfer is running, things should happen pretty quickly and finalizars can make sure the process completes. However, if an error is encountered in the running phase, the transfer can be aborted and the system remains in the current state (no rollback)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a resource with the same criteria (namespace, name, and so on) in StorageTransferRequest is created after a transfer for the intended resource is completed (do users need to ensure that StorageTransferRequest and StorageTransferAccept are deleted after a successful transfer)?
Requests/Accepts should be deleted by the user or perhaps only an admin so that proper auditing occurs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sharing your ideas. Can this information be added to the KEP so that we can discuss more on further corner cases?
On changing the spec, below points would be worth considering.
- What information is needed in annotations to mark PVC as transferring?
(One annotation that has identities for bothStorageTransferRequest
andStorageTransferAccept
, or two separate annotations for each of them?) - Do we need to add finalizer to both
StorageTransferRequest
andStorageTransferAccept
to have a chance to properly remove the annotations on a transferring PVC for cancelation? - Should we consider adding
targetNamespace
toStorageTransferRequest.Spec
?
(Current spec seems to allow one StorageTransferRequest to have multiple target namespace candidates, as a result, above annotation logic is likely to become complex. If we change the spec this way, we can focus on one annotation that has identity for onlyStorageTransferRequest
.) - Do
StorageTransferRequest
andStorageTransferAccept
need to haveStatus
fields to manage a pending and running phase?
(And possibly, should we consider adding completed phase to avoid running multiple times after a successful transfer?) - Can we consider a way to rollback even after it becomes running phase?
Also, an example for the further corner cases is how to block/work with operations by other controllers, like provisioning, attaching, snapshotting, resizing, etc ... (Do we just make other controllers understand the annotations or consider adding status fields to PVC for this purpose? And if we choose the latter, should information in above annotations also be moved to status fields?)
As for updating secretRefs for PV, I've implemented a prototype on top of the previous POC commits. The idea is to modify a PV based on its SC's template before it is bound to the destination PV (This means that it only works well for dynamically provisioned case, not for manually provisioned case). Please check the last two commits of this branch. Current codes, or the last commit, blindly allow updating PV's secretRefs. However, we will be able to add fields, like transferring stautus/conditions or previousPVC, to PV and only allow updating secretRefs when such fields have expected values. Or we may be able to find a good timing that some of the PV's fields can be used to identify that the PV is clearly transferring. Tested with below commands, please also double check:
|
- Wait for all Pods to stop using the PVC source-namespace\foo | ||
- Bind The PersistentVolume to target-namespace\bar. | ||
- Create the PVC target-namespace\bar by copying the spec of source-namespace\foo and setting spec.volumeName appropriately. | ||
- Reset the PersistentVolume reclaim policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to reset it? Who holds the original policy and where it is stored?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
There is a follow-up PR #2849, closing this one |
@jsafrane: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Attempts to continue the work done by @j-griffith in 643 and 1112 and @mhenriks in 1555 and 2110 to create a generic API for transferring storage resources across namespaces.
xref #2324
/sig storage