resource hoarding for steps of a taskrun #4347
Hey @Fabian-K, thanks for the issue.
This is not a "choice" we made; it is how Kubernetes and Pods work, and we can only do our best to work around it.
Right, I did update some documentation (mainly around LimitRange), but I think I missed updating the part about Tasks and steps. One thing is certain, though: we do not (want to) modify what the user asks for. If a user asks for 1 CPU for each step, we "require" 1 CPU for each container (i.e. we are not trying to get smart here). One reason: some users may be used to Pods and containers and will want (in your example) 2 CPU for this Task, while others (like you) will want 1 CPU for it. It's very tricky to get this right for everybody, so the idea, at least on the step side, is to treat things specified by the user as non-modifiable. That said, a way forward here would be to let the user specify resource limits/requests at the Task level instead; then you would be able to say "however it is done on steps, I want this Task to hold 1 CPU for the whole Task".
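(Purely illustrative: a sketch of what such a task-level specification could look like. No such field existed at the time of this discussion; the `computeResources` name below is a hypothetical placeholder, not an existing API.)

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: task-level-resources-example   # hypothetical name
spec:
  taskRef:
    name: my-task                      # hypothetical Task
  # Hypothetical field: "however it is done on steps, hold 1 CPU
  # for the whole Task".
  computeResources:
    requests:
      cpu: "1"
```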
Hi @vdemeester, thanks for the fast feedback! To me, the "smart" behavior of 0.26.0 made sense: when two steps each request 1 CPU, and they run sequentially in Tekton, it is sufficient for the Pod to request only 1 CPU rather than 2. Yes, both containers still run at the same time, but a step that is not active should not use many resources, and because Tekton adjusted the other requests to 0, the overall requested resources of the Pod matched. Thanks to the sequential execution, each step still had the amount of CPU available that it requested. The behavior of Tekton 0.28.2 effectively "blocks" 1 CPU that could be used for other workloads. Specifying the resource requests at the Task level sounds like a good idea! And let me add: the new behavior of being "stupid" and not adjusting the requests is fine; I was just surprised by it because I did not notice the behavior change in the release notes, and suddenly after the upgrade TaskRuns could not be scheduled any more because they effectively requested much more resources than before and the nodes were not big enough 😅
Yeah, re-reading the release note, I think I also did a poor job highlighting the effect on resource requests (without the LimitRange use case in mind), sorry 😅 😓.
So if I understand correctly, starting with 0.28 the expected behavior is "simple", i.e. don't change/update any requests that are present. With that, this is not a bug and can be closed. I think, however, it would make sense to update the docs, as the following is no longer valid: Lines 166 to 174 in f2e25cf
Yes, we need to fix the docs (so I consider it a "doc" bug 😛).
/assign |
/close |
@lbernick: You can't close an active issue/PR unless you authored it or you are a collaborator.
Hi, we noticed the following issue after upgrading from v0.26.0 to v0.28.2.
Expected Behavior
When creating a TaskRun with multiple steps that have resource requests defined, Tekton should not "hoard resources": the created Pod should only request the maximum across the step containers, not the sum of all of them. This is also described in https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md#specifying-resources: "The CPU, memory, and ephemeral storage resource requests will be set to zero (also known as BestEffort), or, if specified, the minimums set through LimitRanges in that Namespace, if the container image does not have the largest resource request out of all container images in the Task. This ensures that the Pod that executes the Task only requests enough resources to run a single container image in the Task rather than hoard resources for all container images in the Task at once."
Actual Behavior
Tekton does hoard resources in v0.28.2: the created Pod requests the sum of all step containers' resource requests.
Steps to Reproduce the Problem
TaskRun
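(The original YAML is not preserved here; the following is a minimal sketch, with hypothetical names and images, of a TaskRun with two steps each requesting 1 CPU.)

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: resource-hoarding-example   # hypothetical name
spec:
  taskSpec:
    steps:
      - name: step-1
        image: busybox
        script: echo "step 1"
        resources:
          requests:
            cpu: "1"
      - name: step-2
        image: busybox
        script: echo "step 2"
        resources:
          requests:
            cpu: "1"
```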
Resulting Pod using Tekton v0.28.2 - unexpected resource hoarding
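(Illustrative excerpt, not the original output: with v0.28.2, each step container keeps the request as specified, so the Pod requests 2 CPU in total.)

```yaml
# Pod spec excerpt (illustrative)
containers:
  - name: step-step-1
    resources:
      requests:
        cpu: "1"   # kept as specified
  - name: step-step-2
    resources:
      requests:
        cpu: "1"   # kept as specified; the Pod requests 2 CPU in total
```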
Resulting Pod using Tekton v0.26.0 - expected behavior, no resource hoarding
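(Illustrative excerpt, not the original output: with v0.26.0, only the step with the largest request keeps it; the other requests are zeroed, so the Pod requests 1 CPU in total.)

```yaml
# Pod spec excerpt (illustrative)
containers:
  - name: step-step-1
    resources:
      requests:
        cpu: "1"   # largest request is kept
  - name: step-step-2
    resources:
      requests:
        cpu: "0"   # zeroed out; the Pod requests 1 CPU in total
```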
Additional Info
Kubernetes version: v1.21.1
Tekton Pipeline version: v0.28.2