Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cephfs: use shallow volumes for the ROX accessModes #3603

Closed
Tracked by #3336
Madhu-1 opened this issue Jan 10, 2023 · 23 comments · Fixed by #3651
Closed
Tracked by #3336

cephfs: use shallow volumes for the ROX accessModes #3603

Madhu-1 opened this issue Jan 10, 2023 · 23 comments · Fixed by #3651
Assignees
Labels
component/cephfs Issues related to CephFS keepalive This label can be used to disable stale bot activiity in the repo

Comments

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Jan 10, 2023

Describe the feature you'd like to have

What new functionality do you want?

This is not a new functionality rather asking to make the cephfs shallow volumes as default feature for ROX volumes.

Currently ROX volumes need to be created with a new storageclass. This is not very useful for DR usecase. In a cluster with many subvolumesgroup and filesystem, we cannot ask admin/user to create exact copy of storageclasses with extra parameter backingSnapshot: "true". In most of the backup system/Data movers the operators/admin will create a clones from a snapshot and move the data from source to the destination. In most of the cases the source volumes is created only for the read access to move the data.

What is the value to the end user? (why is it a priority?)

The end user need not to care about creating a new storageclass to get shallow volumes feature.

How will we know we have a good solution? (acceptance criteria)

Current cephfs clone is not something useful in a cluster with many volumes and subvolumes. Even as its a ROX accessMode
the user is asking to give a volume with a readonly access. Instead of creating a clones (full copy) like today we can just create a shallow volume and give it to the user.

@gman0 @ShyamsundarR @BenamarMk @humblec @nixpanic any thoughts?

@Madhu-1 Madhu-1 added the component/cephfs Issues related to CephFS label Jan 10, 2023
@BenamarMk
Copy link

I like it.
When using VolSync, Ramen automatically generates a StorageClass with the backingsnapshot: "true" setting. While this eliminates the need for the user to create the StorageClass, it also leads to it being left behind as a result when DR is no longer needed.

@dymurray
Copy link

Additionally, OADP/Velero uses a data mover to copy snapshots to object storage which requires creating PVCs from snapshots to drive volsync and it would be very nice to not have to dynamically create a storageclass at backup time if we are specifying ROX. Big +1 to adding this feature.

@gman0
Copy link
Contributor

gman0 commented Jan 10, 2023

Some context on why we currently rely on backingSnapshot: "true" parameter to use this feature:

  • At the time of implementing it, this was the least breaking way of introducing the functionality. Defaulting to snapshot backed volumes when ROX is set changes the current behavior, and also takes away the possibility for users to actually create a clone if they wish to have one. Of course this could be amended by having some "I really want a clone despite ROX" parameter, but this was not a way to go during the first iteration of the feature.

  • In the design document, it says:

    Without [backingSnapshot: "true"] we wouldn't be able to distinguish between a regular volume that just happens to have a read-only access mode, and a volume that references a snapshot.

    However this is no longer true, because:

    • For volumes with provisionVolume: "true", BackingSnapshotID image metadata is used to distinguish between snapshot-backed volumes and volumes that just happen to be ROX.
    • For volumes with provisionVolume: "false", NodeStageVolumeRequest.volume_context must contain backingSnapshotID parameter anyway, which can be used for that same purpose.

I think that now as a separate iteration we can go forward with moving snapshot backed volumes to default when ROX mode is set (we still may need to keep a way for users to override this default and create a clone if the user wishes to create one).

@ShyamsundarR
Copy link
Contributor

I think that now as a separate iteration we can go forward with moving snapshot backed volumes to default when ROX mode is set (we still may need to keep a way for users to override this default and create a clone if the user wishes to create one).

Ack! Overall a useful feature to default to, with the above consideration.

@Rakshith-R
Copy link
Contributor

We need to handle snapshot procedure of the (newly proposed shallow snapshot backed) ROX PVC too.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Jan 11, 2023

We need to handle snapshot procedure of the (newly proposed shallow snapshot backed) ROX PVC too.

We need to think about whether we should allow these operations or not. For me, enabling it isn't a good idea as no data will be changed in the subvolume as it's only a snapshot and used as readonly but let's hear it from others as I might be missing a good use case.

  • Create a snapshot of a ROX PVC? Is it useful
  • Create one more ROX clone of a ROX PVC? Is it useful

@nixpanic
Copy link
Member

nixpanic commented Jan 11, 2023

* Create a snapshot of a ROX PVC? Is it useful

Not sure, I do not see a benefit in doing this.

* Create one more ROX clone of a ROX PVC? Is it useful

Possibly?

  1. tie lifecycle of the clone with an app deployment
  2. It might help with distributing the load on the Ceph cluster (need to ask a Ceph expert).

Consider asking recommendations on the Ceph devel mailinglist.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Jan 16, 2023

* Create a snapshot of a ROX PVC? Is it useful

Not sure, I do not see a benefit in doing this.

* Create one more ROX clone of a ROX PVC? Is it useful

Possibly?

  1. tie lifecycle of the clone with an app deployment

may be useful for statefulset not sure but we can check on this one.

  1. It might help with distributing the load on the Ceph cluster (need to ask a Ceph expert).

Still, it's a single snapshot mounted by multiple application pods not sure it's possible to distribute the load on the ceph cluster if it was a clone /subvolume it was possible.

Consider asking recommendations on the Ceph devel mailinglist.

good to check with cephfs team about it.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Jan 25, 2023

  • Create a snapshot of a ROX PVC? Is it useful

After discussing with @Rakshith-R, one use-case is that if the snapshot is accidentally deleted from the kubernetes cluster and only the ROX clone exists, having this feature will allow us to create one more copy of the snapshot for backup purposes.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Jan 25, 2023

Still, it's a single snapshot mounted by multiple application pods not sure it's possible to distribute the load on the ceph cluster if it was a clone /subvolume it was possible.

@mchangir @kotreshhr any idea about this

@Rakshith-R
Copy link
Contributor

I think for shallow RO/ROX clone to be completely transparent to the user, we need to support all the operations we currently support for a regular clone.

For a snapshot-> we can just increase ref count.
For a pvc-pvc clone-> we can restore directly from the snapshot as we do for restore PVC.

Maybe we can have a feature gate while we put together these different parts too?

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Jan 25, 2023

For a pvc-pvc clone-> we can restore directly from the snapshot as we do for restore PVC.

Yes, but this will be for all other accessMode than ROX. if it's for ROX we need to increase the ref count but for now will block it and allow it if there is any use case.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Feb 22, 2023

reopening as not all usecases are addressed.

@Madhu-1 Madhu-1 reopened this Feb 22, 2023
@Rakshith-R Rakshith-R added this to the release-v3.8.1 milestone Mar 1, 2023
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Mar 31, 2023
@dymurray
Copy link

dymurray commented Apr 3, 2023

I believe the wontfix label should be removed.

@Madhu-1 Madhu-1 added keepalive This label can be used to disable stale bot activiity in the repo and removed wontfix This will not be worked on labels Apr 3, 2023
@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Apr 3, 2023

I believe the wontfix label should be removed.

@dymurray yes, i removed it, we are keeping it for some enhancements we already have the requested feature in CephCSI 3.8.0

@riya-singhal31
Copy link
Contributor

riya-singhal31 commented May 10, 2023

This PR will close the snapshot enhancement feature.
updates - #3717

@Madhu-1 Madhu-1 assigned Madhu-1 and unassigned riya-singhal31 May 22, 2023
@Rakshith-R Rakshith-R removed this from the release-v3.8.1 milestone Jun 20, 2023
@Rakshith-R Rakshith-R added this to the release-v3.10.0 milestone Jun 20, 2023
Madhu-1 added a commit to Madhu-1/ceph-csi that referenced this issue Aug 30, 2023
Add support to create RWX clone from the
ROX clone, in ceph no subvolume clone is
created when ROX clone is created from a
snapshot just a internal ref counter is
added. This PR allows creating a RWX clone
from a ROX clone which allows users to create
RW copy of PVC where cephcsi will identify
the snapshot created for the ROX volume and
creates a subvolume from the CephFS snapshot.

updates: ceph#3603

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Madhu-1 added a commit to Madhu-1/ceph-csi that referenced this issue Aug 31, 2023
Add support to create RWX clone from the
ROX clone, in ceph no subvolume clone is
created when ROX clone is created from a
snapshot just a internal ref counter is
added. This PR allows creating a RWX clone
from a ROX clone which allows users to create
RW copy of PVC where cephcsi will identify
the snapshot created for the ROX volume and
creates a subvolume from the CephFS snapshot.

updates: ceph#3603

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
mergify bot pushed a commit that referenced this issue Aug 31, 2023
Add support to create RWX clone from the
ROX clone, in ceph no subvolume clone is
created when ROX clone is created from a
snapshot just a internal ref counter is
added. This PR allows creating a RWX clone
from a ROX clone which allows users to create
RW copy of PVC where cephcsi will identify
the snapshot created for the ROX volume and
creates a subvolume from the CephFS snapshot.

updates: #3603

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
iPraveenParihar pushed a commit to iPraveenParihar/ceph-csi that referenced this issue Sep 6, 2023
Add support to create RWX clone from the
ROX clone, in ceph no subvolume clone is
created when ROX clone is created from a
snapshot just a internal ref counter is
added. This PR allows creating a RWX clone
from a ROX clone which allows users to create
RW copy of PVC where cephcsi will identify
the snapshot created for the ROX volume and
creates a subvolume from the CephFS snapshot.

updates: ceph#3603

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@haslersn
Copy link

haslersn commented Sep 14, 2023

I think this issue can be closed as completed.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Sep 15, 2023

For a snapshot-> we can just increase ref count.

Not yet, above still need to be implemented for feature completion

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Oct 10, 2023

After further analysis, Am planning to not support the snapshot of the ROX PVC for now due to the below technical difficulties

  • It gets complicated to recreate the reference omap of the already existing snapshot
    when we get a new CreateSnapshot Request to cover this corner case. This can also
    lead to unwanted bookkeeping.
  • There are changes that users can create a chain of resources like (pvc->snapshot->rox clone->snapshot->rox clone->snapshot)
    and can lead to unwanted bugs.
  • It also increases the code complexity for Delete operations
  • We cannot take a new snapshot on the subvolume which is the parent of the snapshot
    because we dont get the expected result as the content on the subvolume is not the same
    as the snapshot.

I am closing this issue as we can have the below workaround if someone wants a snapshot of the ROX volume

Option 1:

  • Create a RWX clone from the ROX PVC
  • Create a snapshot of the RWX PVC and use it.

Option 2:

  • Create a static volume snapshot from the omap data

@nixpanic @Rakshith-R thoughts?

@Rakshith-R
Copy link
Contributor

After further analysis, Am planning to not support the snapshot of the ROX PVC for now due to the below technical difficulties

  • It gets complicated to recreate the reference omap of the already existing snapshot
    when we get a new CreateSnapshot Request to cover this corner case. This can also
    lead to unwanted bookkeeping.
  • There are changes that users can create a chain of resources like (pvc->snapshot->rox clone->snapshot->rox clone->snapshot)
    and can lead to unwanted bugs.
  • It also increases the code complexity for Delete operations
  • We cannot take a new snapshot on the subvolume which is the parent of the snapshot
    because we dont get the expected result as the content on the subvolume is not the same
    as the snapshot.

I am closing this issue as we can have the below workaround if someone wants a snapshot of the ROX volume

Option 1:

  • Create a RWX clone from the ROX PVC
  • Create a snapshot of the RWX PVC and use it.

Option 2:

  • Create a static volume snapshot from the omap data

@nixpanic @Rakshith-R thoughts?

Looks good to me.

I think we can add Option 1 in issue description and link the issue in the log where are blocking this request.

@nixpanic
Copy link
Member

Option 1 sounds good to me as well.

I wonder who (or what) would want to make a snapshot of a ROX PVC. It is not like data changes there, so a snapshot is not needed at all (just backup/read the original ROX PVC). There probably are certain workflows that do it, but I assume these are rare.

@Madhu-1
Copy link
Collaborator Author

Madhu-1 commented Oct 11, 2023

Option 1 sounds good to me as well.

I wonder who (or what) would want to make a snapshot of a ROX PVC. It is not like data changes there, so a snapshot is not needed at all (just backup/read the original ROX PVC). There probably are certain workflows that do it, but I assume these are rare.

Yes that's correct and it mostly am thinking some backup tools when backing up the namespace will also try to do it but its tool problem i would say.

@nixpanic Thanks

am closing the issue for now.

@Madhu-1 Madhu-1 closed this as completed Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/cephfs Issues related to CephFS keepalive This label can be used to disable stale bot activiity in the repo
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants