
Support use-case of building sources with re-usable cache #3097

Closed
quintesse opened this issue Aug 12, 2020 · 17 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@quintesse

Feature request

Be able to build sources using a writable volume shared between tasks in a pipeline, while at the same time having a different writable volume for caching downloaded artifacts (Maven artifacts, NPMs, etc).

Use case

Right now, with the Affinity Assistant, it's no longer possible to have multiple writable volumes, and the way the docs put it, this seems to be considered a somewhat exotic requirement. But wanting to keep downloaded artifacts around so future builds complete more quickly seems like a very common thing to want to do.

So ideally we'd want two workspaces, one to build the sources and one to hold the downloaded artifacts.

Now, from different sources we've seen it mentioned that you can use different sub-paths within the same volume, and while that's definitely a solution, it seems to force the Task to clean up the build environment itself. So instead of getting a clean setup each time, we basically get a possibly dirty environment that might affect the build process. That seems less than ideal.
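
For reference, this is roughly what that sub-path workaround looks like as we understand it (all names here are made up): a single PVC-backed workspace, with the sources and the artifact cache separated only by subPath.

```yaml
# Sketch of the sub-path workaround (hypothetical names): one shared volume,
# so the Affinity Assistant only ever sees a single PVC-backed workspace.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-quarkus-app        # hypothetical
spec:
  workspaces:
    - name: shared-data          # the single writable volume
  tasks:
    - name: build
      taskRef:
        name: maven-build        # hypothetical Task
      workspaces:
        - name: source
          workspace: shared-data
          subPath: source        # throw-away checkout; the Task has to clean it up itself
        - name: maven-cache
          workspace: shared-data
          subPath: m2-repo       # cached artifacts, reused across runs
```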

Being able to strictly separate the (throw-away) build environment and the download cache makes everything much more controllable and less complicated.

But honestly, for the build environment we aren't really interested in using PVCs at all; we're forced to because emptyDir can't be shared between Tasks in a Pipeline. And yes, we can use volumeClaimTemplates, but they still result in the volume staying around until the PipelineRun gets removed, which might be much longer than necessary.
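
For comparison, a minimal sketch of the setup we would actually like to run (again, all names are made up), which the Affinity Assistant currently rejects because it means two PVC-backed workspaces on one PipelineRun:

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: build-quarkus-app-   # hypothetical
spec:
  pipelineRef:
    name: build-quarkus-app          # hypothetical Pipeline
  workspaces:
    # Throw-away build environment: a fresh volume per run.
    - name: source
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
    # Long-lived cache of downloaded artifacts, shared across runs.
    - name: maven-cache
      persistentVolumeClaim:
        claimName: maven-cache-pvc   # hypothetical pre-created PVC
```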

@quintesse quintesse added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 12, 2020
@quintesse
Author

NB: related pull-request with information about issues related to the Affinity Assistant and multiple writable volumes: #2885

@vdemeester
Member

@quintesse thx for the issue and feedback.

Being able to strictly separate the (throw-away) build environment and the download cache makes everything much more controllable and less complicated.

+💯

This is something multiple customers have brought back to us on the affinity assistant. I think we should have made the feature flag work the other way around (i.e. opt-in to enable it), but it's too late to change that now.

That said, I think we should give the user the opportunity to disable the affinity assistant per *Run (or per Pipeline, I think?). The user should be able to decide to disable the affinity assistant, by using an annotation for example. That way, it's enabled by default and can be disabled by the user ad hoc (on specific runs) or for the whole installation.
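
To make that concrete: today the assistant can only be switched off cluster-wide, through the disable-affinity-assistant flag in the feature-flags ConfigMap. What I'm suggesting would add something like a per-run annotation on top of that (the annotation below is purely hypothetical, nothing like it exists today):

```yaml
# Existing: cluster-wide switch in the feature-flags ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"   # affects every PipelineRun in the cluster
---
# Proposed (hypothetical): opt out for a single run only.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: my-pipeline-run-
  annotations:
    tekton.dev/disable-affinity-assistant: "true"   # hypothetical annotation, not implemented
spec:
  pipelineRef:
    name: my-pipeline
```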

/cc @imjasonh @sbwsg @jlpettersson @afrittoli

@jlpettersson
Member

Being able to strictly separate the (throw-away) build environment and the download cache makes everything much more controllable and less complicated.

+💯

This is something multiple customers have brought back to us on the affinity assistant. I think we should have made the feature flag work the other way around (i.e. opt-in to enable it), but it's too late to change that now.

My opinion is that Tekton should not leave users with problem A or problem B, but instead solve both problems, or at least guide the user on how to solve them the Tekton way.

That said, I think we should give the user the opportunity to disable the affinity assistant per *Run (or per Pipeline, I think?). The user should be able to decide to disable the affinity assistant, by using an annotation for example. That way, it's enabled by default and can be disabled by the user ad hoc (on specific runs) or for the whole installation.

I don't see how that makes sense? Only some runs can be deadlocked because the pods ended up in different AZs?

@vdemeester
Member

vdemeester commented Aug 13, 2020

@jlpettersson so as of today, with the affinity assistant enabled, I cannot add two different PVCs to my TaskRun, even if they are ReadWriteMany and even if I (the user) know it's going to work in my setup. Having a way to say "disable the affinity assistant" for a run lets the user deal with it, without affecting the default behavior.

As it is today, the affinity assistant prevents @quintesse's (and others') use case. All I am suggesting here is to be able to allow this use case (with proper warning and documentation on disabling the assistant) while still keeping the default behavior in most cases. It is possible to disable the affinity assistant for the whole Tekton Pipelines installation, effectively letting the user deal with it; what is proposed here is to allow the user to keep the default behavior (affinity assistant enabled) and disable it per run (at their own risk).

@jlpettersson
Member

@jlpettersson so as of today, with the affinity assistant enabled, I cannot add two different PVCs to my TaskRun,

That's true.

even if they are ReadWriteMany and even if I (the user) know it's going to work in my setup. Having a way to say "disable the affinity assistant" for a run lets the user deal with it, without affecting the default behavior.

But if you use a ReadWriteMany volume in a regional cluster, you will have the same problem and still need an affinity assistant, just a less strict one. #3052 is probably a better direction for such features.

But if you use a ReadWriteMany volume in a zonal cluster, you can disable the affinity assistant for all workloads.

There may be a third case, where you use ReadWriteMany volumes in a zonal cluster but also want to use other access modes, e.g. ReadWriteOnce, maybe for cost optimization? I don't know... but there are certainly many different cases. So far the intention was to solve as many problems as possible for the most common cases and with little effort. I definitely think #3052 is an improvement that can allow for more configuration, but it is still a distributed system, and a scheduler will most likely enforce more or less the same constraints, unfortunately.

As it is today, the affinity assistant prevents @quintesse's (and others') use case.

Zonal volumes are always a problem in a regional cluster, and volumes within a single zone are always a problem for parallelism. There might be other solutions than this. The one discussed most is to fetch cached dependencies from a remote location (possibly a cache, or a bucket). But this is indeed a difficult problem to handle for any build system that uses a cluster for its workload; I wish there were a better solution for it.

@vdemeester
Member

But if you use a ReadWriteMany volume in a regional cluster, you will have the same problem and still need an affinity assistant, just a less strict one. #3052 is probably a better direction for such features.

But if you use a ReadWriteMany volume in a zonal cluster, you can disable the affinity assistant for all workloads.

Right, but as of today, a user who doesn't run a zonal cluster (using something other than GKE or a cloud that supports and enables those), and who doesn't have the power to disable the affinity assistant, cannot have one PVC for their cache and another PVC for the source(s). Adding a way to disable the affinity assistant per pipeline, opt-in (through an annotation), helps that particular use case without changing the default setup. By opting in, the user is explicitly saying "I know what I am doing".

I tend to agree that #3052 can help in the long run. What I am proposing here is a simple workaround that doesn't, imho, prevent any future improvements.

@quintesse
Author

Thanks for all the feedback/discussion on this issue!

But I want to make sure that what @vdemeester suggests can be considered an actual solution that can be used by all users/customers. Because I'm not just looking for a way to solve my own problem: I'm trying to create the official Tekton Task that will be used and promoted by a productized Quarkus. I need to be sure that we can recommend this solution to users wanting to build their Quarkus apps without too many ifs and buts.

The other issue is that creating temporary volumes using VolumeClaimTemplates seems to make using Tekton more difficult than it needs to be. But I'm guessing that's because the underlying K8s doesn't give any better tools for setting up shared ephemeral storage in a very easy way, right?

@jlpettersson
Member

jlpettersson commented Aug 13, 2020

Just so it is clear.

As it is today, the affinity assistant prevents @quintesse's (and others') use case.

I have the exact same problem, e.g. for

caching downloaded artifacts (Maven artifacts, NPMs, etc).

so it is not a use case I'm dismissing.

But my opinion is that this needs to be handled slightly differently in a Kubernetes environment, e.g. by fetching it from a remote location, similar to how it is documented for other cloud build systems, e.g. Google Cloud Build's caching of directories... nothing unique to Tekton. But yeah, there may be solutions that work for some users and others that only work for other users...

@quintesse
Author

But my opinion is that this needs to be handled slightly differently in a Kubernetes environment, e.g. by fetching it from a remote location.

Just to be clear, in your opinion we shouldn't be trying to cache "locally" but only use remote fetching of dependencies (possibly using local proxies to speed things up)?

@jlpettersson
Member

But my opinion is that this needs to be handled slightly differently in a Kubernetes environment, e.g. by fetching it from a remote location.

Just to be clear, in your opinion we shouldn't be trying to cache "locally" but only use remote fetching of dependencies (possibly using local proxies to speed things up)?

yes, that is kind of what I meant... but I am not sure I am aware of the "best" solution...

  • Single-node system: in this case it is relatively easy(?) to cache locally; an example is Jenkins.
  • Cluster system: by nature more difficult; there are many alternatives... but if you try to compete on latency with a single node, it may be difficult. One way is to only use a single node at a time... but that goes against why many use a cluster in the first place. Or accept the larger latency and try to get other benefits from running in a cluster, e.g. high availability and scalability.

Which solution to use in a cluster environment may depend on priorities... latency? scalability? automation? high availability? You can certainly get the same latency in a cluster environment as in a single-node environment, but you may give up other benefits of the cluster environment... it's about trade-offs (in my opinion).

@jlpettersson
Member

To summarize.

Tekton uses a cluster for its workload; this has benefits in terms of scalability and high availability compared to a single-node CI/CD system. As a consequence of that...

caching in a cluster system is indeed a problem. Some parts of the problem:

  • A cache on a Node is not enough, as multiple Nodes are in use.
  • A cache on a volume is not enough if a multi-datacenter cluster is used (as volumes are typically only available within a single DC).
  • A cache accessible over the network (e.g. HTTP) is viable.
    • A cache accessible over the network can be done with pull semantics (e.g. a request); easier to implement, e.g. an HTTP request. A sketch of this approach follows below.
    • A cache accessible over the network can be done with push semantics (e.g. broadcast to the nodes); harder to implement, but it would allow quick access for the workload. Volume size and how to implement it in a Kubernetes environment may be challenging.
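
A rough sketch of the pull variant (every image, bucket and path below is an assumption, not an existing Tekton feature): a Task restores a cache archive from a bucket before the build and uploads an updated archive afterwards.

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: maven-build-with-remote-cache   # hypothetical
spec:
  workspaces:
    - name: source   # sources; also holds the local Maven repo so all steps can see it
  steps:
    - name: restore-cache
      image: google/cloud-sdk:slim      # assumption: any image with gsutil would do
      script: |
        #!/bin/sh
        # Pull the cache archive if it exists; a missing cache is not a failure.
        gsutil cp gs://my-build-cache/m2-repo.tar.gz /tmp/m2-repo.tar.gz || exit 0
        mkdir -p $(workspaces.source.path)/.m2
        tar -xzf /tmp/m2-repo.tar.gz -C $(workspaces.source.path)/.m2
    - name: build
      image: maven:3.6-jdk-11
      workingDir: $(workspaces.source.path)
      script: |
        #!/bin/sh
        mvn -B -Dmaven.repo.local=$(workspaces.source.path)/.m2 package
    - name: save-cache
      image: google/cloud-sdk:slim
      script: |
        #!/bin/sh
        # Push the (possibly updated) cache back for the next run.
        tar -czf /tmp/m2-repo.tar.gz -C $(workspaces.source.path)/.m2 .
        gsutil cp /tmp/m2-repo.tar.gz gs://my-build-cache/m2-repo.tar.gz
```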

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 11, 2020
@vdemeester
Member

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2020
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 10, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 12, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
