
Feature Request: Pooled PersistentVolumeClaims #3417

Closed · bioball opened this issue Oct 20, 2020 · 12 comments

Labels: kind/feature (Categorizes issue or PR as related to a new feature), lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

bioball (Contributor) commented Oct 20, 2020

Feature request

There should be a way to select a PersistentVolumeClaim from a pool as the workspace binding when creating PipelineRuns and TaskRuns.

Ideally, there should be a way to grow the pool dynamically: if no PVCs are available in the pool, a new one gets created and added to the pool. This also implies that there should be some way to expire these PVCs.

Use case

There are a couple of use cases I can think of:

  1. Running multiple PipelineRuns in parallel for the same Pipeline, where each PipelineRun receives a volume that persists across runs (like a cache). Today, this is not possible unless the volume is mounted using ReadWriteMany or ReadOnlyMany (assuming the storage backend supports it), or unless a separate entity outside of Tekton manages the pooling.
  2. Reducing the number of volumes being created and destroyed. Today, a common way to use volumes is a volumeClaimTemplate (see the sketch after this list), which creates a PVC at the start of the run and deletes it when the run is deleted. Using pooled PersistentVolumeClaims solves the following problems:
    • I might want to keep my runs around for historical purposes, but I don't want to keep the PVC around because it takes up space.
    • Creating/deleting a PVC incurs extra load on the Kubernetes API and the storage backend. I can reduce this load by simply re-attaching an existing PVC.
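
For reference, the volumeClaimTemplate pattern from point 2 looks roughly like this today (names and sizes are just illustrative):

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: my-pipeline-run-
spec:
  pipelineRef:
    name: my-pipeline
  workspaces:
    - name: my-workspace
      # Tekton creates a PVC from this template for each run and deletes it
      # together with the run, which is exactly the churn described above.
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi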
bioball added the kind/feature label Oct 20, 2020
ghost commented Oct 20, 2020

Worth noting that this feature request was also recently opened against Argo: argoproj/argo-workflows#4130

I've tried searching for "kubernetes pvc pool" and "kubernetes storage pool" but haven't found anything. I wonder if this would also be worth looking at as a platform feature and raising with the k8s team.

bioball (Contributor, Author) commented Oct 20, 2020

I've tried searching for "kubernetes pvc pool" and "kubernetes storage pool" but haven't found anything. I wonder if this would also be worth looking at as a platform feature and raising with the k8s team.

That makes sense. Perhaps the ideal solution would be for Kubernetes to have a new resource representing a pooled PVC, with a corresponding workspace binding on the PipelineRun. Something like the following:

---
apiVersion: v1
kind: PooledPersistentVolumeClaim
metadata:
  name: my-pvc-pool
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: my-pipeline-run-
spec:
  pipelineRef:
    name: my-pipeline
  workspaces:
    - name: my-workspace
      pooledPersistentVolumeClaim:
        claimName: my-pvc-pool

Without native k8s support for this, the alternative approaches might be:

  1. Introduce a new CRD managed by Tekton representing a pooled PVC
  2. Support custom workspace bindings (similar to the custom task design) so Tekton users can bring their own pooled PVC provider (a rough strawman follows)
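
Purely as a strawman for option 2 (none of this exists in Tekton today; the API group, kind, and field names below are made up), a custom binding might look something like:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: my-pipeline-run-
spec:
  pipelineRef:
    name: my-pipeline
  workspaces:
    - name: my-workspace
      # hypothetical field: delegate resolving the binding to a user-provided
      # controller, analogous to how custom tasks delegate to a Run reconciler
      custom:
        apiVersion: example.dev/v1alpha1
        kind: PooledClaimProvider
        name: my-pvc-pool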

Curious about @skaegi's thoughts per our Slack conversation in https://tektoncd.slack.com/archives/CLCCEBUMU/p1603199756169700?thread_ts=1603139560.165600&cid=CLCCEBUMU

ghost commented Oct 20, 2020

Support custom workspace bindings (similar to the custom task design) so Tekton users can bring their own pooled PVC provider

This would be desirable for other use cases as well. We've had requests to support more Workspace types, for example, and this could be one way to do that. It could also open the door to Workspace types that aren't Volume-backed, such as GCS or S3 buckets.

jlpettersson (Member) commented Oct 20, 2020

You generally would pool the backing storage for PVs and then write a storage provisioner to create the PVs dynamically.

I agree with this. (from the slack thread).

Storage pooling for PVCs is what cloud providers already do for this, in the layer under PVC/PV.
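
For context, that layer is usually surfaced through a StorageClass, which names the provisioner that creates the PVs on demand; a minimal sketch (the provisioner here is just a placeholder):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pipeline-storage
# a real cluster would use a cloud or CSI provisioner here
provisioner: example.com/pooled-provisioner
# Retain keeps the PV (and its data) around after the PVC is deleted,
# instead of deleting the backing volume
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer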

bioball (Contributor, Author) commented Oct 21, 2020

Storage pooling for PVCs is what cloud providers already do for this, in the layer under PVC/PV.

I'm definitely no expert on PVs. By the layer under PVC/PV, are you referring to storage classes? If so, how would this work? Are they able to provision stateful volumes? E.g., can I request a new PV that retains its file system from the last time I used it?

ghost mentioned this issue Oct 22, 2020

skaegi (Contributor) commented Oct 22, 2020

I guess you could do some really clever re-use of PVCs, but that is essentially re-doing what a storage provisioner does. It might even be possible to write a storage provisioner that re-uses another storage provisioner's PVs (or underlying storage). I don't know of active work in that area, but that could be cool ;)

In our world we solved the problem a little differently. We found that in our provider-managed clusters, the PVs allocated when using the "default" storage class (and all the other storage classes) were ridiculously slow and expensive. They're generally designed for 500G+ of storage and double-digit IOPS, and can take minutes to allocate. Being cheap and wanting good performance, we wrote a "local" provisioner that does pseudo-dynamic provisioning. Our integration to use it as the backing storage for workspaces is a bit messy, but some of the work that @jlpettersson did really helps. Maybe this would too -- #2595 (comment)

A few weeks back I wondered aloud whether Tekton could optionally package a basic storage provisioner like ours (e.g., as a new experimental project), since otherwise using "workspaces" is painful/expensive, but I never took it further.
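
To give a rough idea of what sits under the hood, a local provisioner of this sort typically hands out PersistentVolumes of the local type along these lines (paths, sizes, and node names are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  # local volumes are backed by a directory or disk on a specific node...
  local:
    path: /mnt/disks/vol0
  # ...so the PV must pin itself to that node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1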

tekton-robot (Collaborator) commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/stale label Jan 20, 2021
impl (Contributor) commented Feb 3, 2021

Hey friends,

We've had a need for this for quite some time and I've finally decided to take a stab at implementing it here: https://github.com/puppetlabs/pvpool. It doesn't use the approach of layering storage provisioners because some of those APIs just didn't seem to fit the model well (for example, it can't support static binding, and reclaimPolicy could be very confusing). Instead, you create a Checkout object referencing a particular pool of PVs, and the controller will hand you a PVC as soon as it can. It's quite similar to the initial feature request description in this issue -- I'd love any feedback on whether it seems to fit your use case, @bioball.

I haven't taken a look at what we're going to need to do to integrate it with Tekton (we've been using mutating admission webhooks to, ehm, make this process "easy" because I'm behind on learning the new APIs), but it will be on my plate in the next few weeks. Hopefully there are minimal (or no) changes to Tekton needed to get this working -- either way I'll follow up with some additional thoughts as I add this into our product.

Feel free to reach out to me on Slack too and I'd be happy to discuss further!

tekton-robot (Collaborator) commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 5, 2021
impl (Contributor) commented Mar 25, 2021

We rolled this out in our product in the last week or so, so I thought I'd close the loop. Because we have a "supervisor" controller that creates PipelineRuns for us, the implementation was actually quite straightforward using the workspaces feature.

Basically, we end up with something like this:

  • Owner (happens to be a ConfigMap in our case)
    • pvpool.puppet.com/v1alpha1, kind=Checkout
      spec.claimName=some-generated-pvc-name
    • tekton.dev/v1beta1, kind=PipelineRun
      spec.workspaces=[{name: tools, persistentVolumeClaim: {claimName: some-generated-pvc-name, readOnly: true}}]

And that's it! Bind it through the pipeline to tasks as needed. Also, our PVCs are ROX (ReadOnlyMany), so we turn off the affinity assistant. But otherwise it "just works," which is really nice.
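
Spelled out as manifests, that's roughly the following (names are illustrative, and the Checkout is shown with only the field mentioned above -- see the pvpool repo for its full spec):

apiVersion: pvpool.puppet.com/v1alpha1
kind: Checkout
metadata:
  name: tools-checkout
spec:
  # the controller binds this name to a PVC checked out from the pool
  claimName: some-generated-pvc-name
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: my-run-
spec:
  pipelineRef:
    name: my-pipeline
  workspaces:
    - name: tools
      persistentVolumeClaim:
        claimName: some-generated-pvc-name
        readOnly: true

(If it helps anyone else: the affinity assistant can be disabled via the disable-affinity-assistant flag in Tekton's feature-flags ConfigMap.)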

tekton-robot (Collaborator) commented:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot (Collaborator) commented:

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
