Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3294: Add proposal for provisioning volumes from cross-namespace snapshots #3295

Merged
merged 10 commits into from
Oct 6, 2022

Conversation

mkimuram
Copy link
Contributor

@mkimuram mkimuram commented May 4, 2022

  • One-line PR description: Provision volumes from cross-namespace snapshots
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 4, 2022
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label May 4, 2022
@xing-yang
Copy link
Contributor

/milestone v1.25

@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 13, 2022

#### (b) as a separate populator

There will be two approaches to implement as a separate populator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to implement this as a separate populator in the alpha stage to avoid making external-provisioner more complicated. When moving from alpha to beta, we can re-evaluate to see if it is better to move the logic into the external-provisioner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer implementing by (a), because implementing (b)(2) will introduce a really complex codes as explained (it requires duplicating provisioner's provisioning codes). But, let me consider a bit more, if we can make it simple even with (b)(2).

Also, I would like to get feedback from others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bswartz @xing-yang @Elbehery For further discussions, I've implemented cross-namespace snapshot populator for csi-hostpath driver in kubernetes-csi/lib-volume-populator#31 . This is using (b)(1) approach.
For (b)(2) approach, we may need to change CSI spec to add a new CSI call to populate the already provisioned volume, instead of only allowing creating pre-populated volume in CreateVolume (Please also see comment here).

Any ideas on how we proceed for alpha?

I think that we have below choices:

  1. Only define API
  2. Define API and provide an example populator (for csi-hostpath) ... (b)(1) approach
  3. Define API and provide common populator, which would require CSI spec change ... (b)(2) approach
  4. Define API and provide a reference implementation as provisioner ... (a) approach

I still prefer 4 personally, but we may be able to choose 1 and 2 for alpha because there will be no reason to block defining API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to implement this as a separate populator in the alpha stage to avoid making external-provisioner more complicated. When moving from alpha to beta, we can re-evaluate to see if it is better to move the logic into the external-provisioner.

@xing-yang If above intends to just avoid adding codes to existing external-provisioner and it doesn't require to use lib-volume-populator to implement another controller, we may be able to fork the external-provisioner and make it only for VolumeSnapshotLink (This will be a kind of (b)(2) approach, which isn't technically a "populator", but a "provisioner").

I've also implemented an prototype like this on top of the prototype of (a) approach. The last two commits are for the change from it.

Only the code change is to disable provisioning from pvc and snapshot like this. Then, we can make it a different binary/container. The step required to consume it is to just deploy the container in the same pod as the provisioner, like this.

WDYT? @bswartz @Elbehery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the KEP in implementation section (and added section on CR conflict risk). PTAL

BTW, thank you for your feedback via chat @bswartz I hope that I've added enough information.
Also, thank you for letting me know that you're under review via chat @Elbehery .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer implementing in the external-provisioner with a feature gate, which is what we've done with other features. The benefits compared to the other approaches I can see are 1) each driver does not have to implement their own populator 2) we don't have to manage releasing another component and merging them later.

- [x] Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control
plane? No.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

external-provisioner needs to be restarted if we set the feature flag to true or false?
--cross-namespace-snapshot=true


- To specify a non-standard API as a `DataSourceRef` of a PVC, [AnyVolumeDataSource feature](https://kubernetes.io/blog/2021/08/30/volume-populators-redesigned/) is used,
- To specify a cross-namespace `VolumeSnapshot`, a new `VolumeSnapshotLink` CRD is introduced (Please also see [API](#api)),
- To restrict only allowed `VolumeSnapshot` to be consumed from other namespaces, [`ReferencePolicy` CRD](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.ReferencePolicy) is used,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you talked to sig-networking regarding using this CRD? Since it is still Alpha, we should let them know we are using it and make sure the usage is supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. Thank you for pointing it out.

@robscott @hbagdi @youngnick

In this KEP, we are thinking about using ReferencePolicy to allow cross-namespace reference of VolumeSnapshot for PersistentVolumeClaim. Could you review this KEP from ReferencePolicy view point?

(As I heard in the KubeCon presentation on Gateway API, there seems a plan in renaming this object, which would be one of the biggest risks).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking in with us! I've created a PR that would rename ReferencePolicy to ReferenceGrant as we'd considered earlier. If/when that gets in, I'd expect the resource to be quite stable going forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing the current status. I will add a dependency to the PR and only use ReferenceGrant for this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReferencePolicy -> ReferenceGrant rename has gone through, so using ReferenceGrant is definitely the way to go.

@mkimuram mkimuram force-pushed the issue/3294 branch 2 times, most recently from 3111d48 to a55e9d6 Compare May 31, 2022 19:47
Copy link
Contributor Author

@mkimuram mkimuram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xing-yang Thank you for your review. Updated the KEP. PTAL


- To specify a non-standard API as a `DataSourceRef` of a PVC, [AnyVolumeDataSource feature](https://kubernetes.io/blog/2021/08/30/volume-populators-redesigned/) is used,
- To specify a cross-namespace `VolumeSnapshot`, a new `VolumeSnapshotLink` CRD is introduced (Please also see [API](#api)),
- To restrict only allowed `VolumeSnapshot` to be consumed from other namespaces, [`ReferencePolicy` CRD](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.ReferencePolicy) is used,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. Thank you for pointing it out.

@robscott @hbagdi @youngnick

In this KEP, we are thinking about using ReferencePolicy to allow cross-namespace reference of VolumeSnapshot for PersistentVolumeClaim. Could you review this KEP from ReferencePolicy view point?

(As I heard in the KubeCon presentation on Gateway API, there seems a plan in renaming this object, which would be one of the biggest risks).


#### (b) as a separate populator

There will be two approaches to implement as a separate populator:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer implementing by (a), because implementing (b)(2) will introduce a really complex codes as explained (it requires duplicating provisioner's provisioning codes). But, let me consider a bit more, if we can make it simple even with (b)(2).

Also, I would like to get feedback from others.

@mkimuram
Copy link
Contributor Author

/retest

@mkimuram
Copy link
Contributor Author

@wojtek-t Thank you for self-assigning to this KEP. I should have done it. Also, thank you for quickly fixing the PRR-approvers-check issue. Now, all the tests passed.

@xing-yang xing-yang added the api-review Categorizes an issue or PR as actively needing an API review. label Jun 14, 2022
@xing-yang
Copy link
Contributor

/assign @thockin

@mkimuram
Copy link
Contributor Author

@ttakahashi21, can you update the milestone on this PR from 1.25 to 1.26?

Updated milestones.

In addition, there will be cases that `ReferenceGrant` may be created/deleted/re-created while `VolumeSnapshotLink` is being handled, however, no inconsistent behavior will be expected, as described below.

1. No `ReferenceGrant` for a `VolumeSnapshotLink` exists when the `VolumeSnapshotLink` is handled, and the `ReferenceGrant` is created later
A controller won't use the `VolumeSnapshot` until the `ReferenceGrant` that allows the access is created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will controllers rescan/resync any pending volumesnapshotlinks in relevant namespaces any time a referencegrant mentioning VolumeSnapshotLink/VolumeSnapshot objects is created/updated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In the current KEP, the controller keeps checking until the right ReferenceGrant is found.
This behavior would be good from a UX viewpoint, because users are allowed to create ReferenceGrant later.
On the other hand, from a performance viewpoint, it may be a waste of resource. So, we can return error of provisioning volume and stop retrying,if the right ReferenceGrant isn't found in the first check.

WDYT?

Based on the decision, I will update "Populator implementation" section to make it clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the decision, I will update "Populator implementation" section to make it clear.

@liggitt Updated the KEP to clearly describe that ReferenceGrant check continues until it is found.

- Add description on "ReferenceGrant API change" in "Risks and Mitigations" section and mark it a beta blocker
- Allow not specifying namespace in `VolumeSnapshotLink` and provisioning from it without `ReferenceGrant`
- Clearly describe that `ReferenceGrant` check continues until it is found
- Add description on core API change to allow specifying namespace in "Alternatives" section
@mkimuram
Copy link
Contributor Author

mkimuram commented Oct 4, 2022

@msau42 @xing-yang Updated the KEP. PTAL

Note that I've marked "ReferenceGrant API change" as a beta blocker in the hope that we can make a progress in this release.
However, I understand that we may decide to wait until the ReferenceGrant to move to the core API to avoid unnecessary changes in the API and the provisioner codes.

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is out-of-tree (in CSI mostly?), is that correct? The PRR looks fine to me.

@xing-yang
Copy link
Contributor

This looks like it is out-of-tree (in CSI mostly?), is that correct? The PRR looks fine to me.

@johnbelamaric yes, this is out-of-tree in kubernetes-csi repo.

@johnbelamaric
Copy link
Member

This looks like it is out-of-tree (in CSI mostly?), is that correct? The PRR looks fine to me.

@johnbelamaric yes, this is out-of-tree in kubernetes-csi repo.

Ok, once the SIG approves I will add the PRR approval.


#### Conflict on installing `VolumePopulator` CR for `VolumeSnapshotLink` across CSI drivers

This feature requires installing `VolumePopulator` CR for `VolumeSnapshotLink` and is enabled per CSI driver basis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor wording suggestion. This section explains why enabling per CSI driver is not a good choice. So I think we should state that up front along with the recommendation that it needs to be enabled on a per-cluster basis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only happens for VolumeSnapshotLink approach, so moved to cons of VolumeSnapshotLink approach in alternative section.

### Populator implementation

The populator logic can be implemented either [(a) inside the existing CSI external-provisioner](#a-inside-the-existing-csi-external-provisioner) or [(b) as a separate populator](#b-as-a-separate-populator).
Cluster admins can choose which implementation to be used per CSI driver basis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually when the feature goes GA, we typically remove the feature gate so it is enabled always and can't be disabled. Would the separate populator approach work in that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42
Is this csi-provisioner code getting feature gates from the cluster where it runs? And are you suggesting provisioner approach to also use the same mechanism to enable/disable this feature? If both answers are yes, populator approach can't provide a similar way to enable/disable this feature per cluster basis, because it relies on whether populator for the driver exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use CrossNamespaceSourceProvisioning feature gate.

We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

- Verify that PV is provisioned from VS in other namespace and bound to PVC if allowed by ReferenceGrant: <link to test coverage>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test cases for within the same namespace using SnapshotLink.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

-->

- [x] Metrics
- Metric name: TBD (Need to discuss if existing metrics for "claims" queue is sufficient).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want a new metric that can distinguish the datasource type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cross_namespace_persistentvolumeclaim_provision_total and cross_namespace_persistentvolumeclaim_provision_failed_total.
Intended to align names here.

- GET `VolumeSnapshotLink` API call:
- originating component(s): CSI external-provisioner
- this API call is triggered once in each [`Provision` call](https://github.com/kubernetes-csi/external-provisioner/blob/master/pkg/controller/controller.go#L719) in CSI external-provisioner when `VolumeSnapshotLink` is referenced from PVC.
- GET(LIST) `ReferenceGrant` API call:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use informers to reduce the number of get/list calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Can we rely completely on informers or should we get the object once after it is found in informers as described in the current KEP?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a strong need to also do a GET. It would make revoking a grant faster, but even if you do a GET there is still a race.

- Change PersistentVolumeClaim API instead of adding VolumeSnapshotLink API
- Use CrossNamespaceSourceProvisioning feature gate instead of command line flag
- Add tests for the same namespace cases
- Add scale tests as a beta blocker
- Add metrics for cross namespace provision source
- Update estimation of API calls in cosideration of informer cache
@mkimuram
Copy link
Contributor Author

mkimuram commented Oct 6, 2022

@msau42 Updated the KEP. PTAL

Note that we can't set DataSourceRef from DataSourceRef2 in case 6, because namespace field can't be set to DataSourceRef.
Impact of not setting fields in this case, needs to be considered.

2. Allow both `DataSourceRef` field and `DataSourceRef2` fields to coexist, but only allow specifying either of them or not specifying at all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should start with this approach initially. Once the ReferenceGrant model has been more proven then I think we can consider merging the two.


1. If DataSource and DataSourceRef are both set; reject

For this case, `DataSourceRef2` could be renamed like `CrossNamespaceDataSourceRef`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to find examples in other APIs to find a better name. We can debate on exact naming later :-)

- "@mhenriks"
- "@jsafrane"
- "@bswartz"
approvers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add my name here to reflect the current state of reviewers.

@msau42
Copy link
Member

msau42 commented Oct 6, 2022

/lgtm
/approve

Just some minor cosmetic comments remaining.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
@johnbelamaric
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, mkimuram, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2022
@k8s-ci-robot k8s-ci-robot merged commit 7beea93 into kubernetes:master Oct 6, 2022
@k8s-ci-robot k8s-ci-robot modified the milestones: v1.25, v1.26 Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: API review completed, 1.26
Development

Successfully merging this pull request may close these issues.

Provision volumes from cross-namespace snapshots