
Revamp how pipeline supports Limitranges #4176

Merged
merged 1 commit on Sep 15, 2021

Conversation

vdemeester
Member

@vdemeester vdemeester commented Aug 20, 2021

Changes

Up until now, pipeline support for LimitRange is rather limited and
confusing, and can lead to inconsistencies:

  • It is not applied to InitContainers
  • It zeroes out user requests, keeping the max and assigning it to one
    step, which can lead to an invalid Pod definition.
  • It uses only Min and never reads Default and DefaultRequest
  • It also doesn't support MaxLimitRequestRatio

This commit aims to fix that by adding fuller support for LimitRange.
Note that understanding some of the choices requires some assumptions
about how LimitRange works.

On Pod Containers:

  • Requests are not enforced. If the node has more resources available than the request,
    the container can use them.
  • Limits, on the other hand, are a hard stop. A container going over the limit will be
    killed.

It is thus important to get Requests right, to allow scheduling.

Requests and limits can come from both Containers and Init Containers.

  • For init containers, the max of each resource type is taken
  • For containers, the requests/limits of all containers are summed

This means, if you have the following:

  • initContainer1 : 1 CPU, 100Mi memory
  • initContainer2 : 2 CPU, 200Mi memory
  • container1 : 1 CPU, 50Mi memory
  • container2 : 2 CPU, 250Mi memory
  • container3 : 3 CPU, 500Mi memory

The computation will be:

  • CPU : 2 (max of init containers) + 6 (sum of containers) = 8 CPU
  • Memory: 200Mi (max of init containers) + 800Mi (sum of containers) =
    1000Mi (~1Gi)
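The arithmetic above can be sketched in Go. The `resources` type and `podRequests` helper are illustrative names for this example only, not Tekton's actual code:

```go
package main

import "fmt"

// resources holds per-container CPU (whole cores) and memory (in Mi),
// simplified to integers for the sake of the example.
type resources struct {
	cpu    int
	memory int
}

// podRequests follows the computation described above: take the max over
// init containers, then add the sum over regular containers, per resource.
func podRequests(initContainers, containers []resources) resources {
	var initMax resources
	for _, c := range initContainers {
		if c.cpu > initMax.cpu {
			initMax.cpu = c.cpu
		}
		if c.memory > initMax.memory {
			initMax.memory = c.memory
		}
	}
	var sum resources
	for _, c := range containers {
		sum.cpu += c.cpu
		sum.memory += c.memory
	}
	return resources{cpu: initMax.cpu + sum.cpu, memory: initMax.memory + sum.memory}
}

func main() {
	total := podRequests(
		[]resources{{cpu: 1, memory: 100}, {cpu: 2, memory: 200}},
		[]resources{{cpu: 1, memory: 50}, {cpu: 2, memory: 250}, {cpu: 3, memory: 500}},
	)
	fmt.Printf("CPU: %d, Memory: %dMi\n", total.cpu, total.memory) // CPU: 8, Memory: 1000Mi
}
```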

A LimitRange enforces (mutates the Pod) some Limits and Requests (using
Default and DefaultRequest) and validates them (Min, Max and
MaxLimitRequestRatio). LimitRanges are applied per namespace, and it is
possible to have multiple LimitRanges in a namespace.

Limits and Requests work the way they do in Kubernetes because it is assumed that all
containers run in parallel (which they do, except in Tekton with some hack), and that init
containers run before them, one after the other.

That assumption (running in parallel) is not really true in Tekton. The containers do all
start together (because there is no way around this), but the *entrypoint hack* makes sure
they actually run in sequence, so there is always only one container actually consuming
resources at any given time.

This means we need to handle limits, requests and LimitRanges in a *non-standard* way. Let's
try to define that. Tekton needs to take into account all the aspects of the LimitRange:
the min/max as well as the defaults. If there is no default but there is a min/max, Tekton
then needs to set a default value that is between the min and the max. If we set the value
too low, the Pod won't be able to be created; similarly if we set the value too high. But those
values are set on containers, so we have to do our own computation to know what request to
put on each container. To add to the complexity, we also need to support
MaxLimitRequestRatio, which adds complexity on top of something already complex. That
said, ideally, if we pick the defaults correctly, we should get support for
MaxLimitRequestRatio for free.
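A minimal sketch of that default-picking logic, assuming a single resource and a simplified `limitRange` struct (the names `limitRange` and `requestFor` are illustrative, not Tekton's implementation):

```go
package main

import "fmt"

// limitRange captures the per-container LimitRange fields that matter
// here, all in the same unit (e.g. millicores). 0 means "not set".
type limitRange struct {
	min, max, defaultRequest int64
}

// requestFor picks the smallest per-container request that still
// satisfies the LimitRange: use DefaultRequest when set, otherwise fall
// back to Min (the smallest valid value), then clamp into [Min, Max].
func requestFor(lr limitRange) int64 {
	req := lr.defaultRequest
	if req == 0 {
		req = lr.min // no default: pick the minimum allowed value
	}
	if req < lr.min {
		req = lr.min
	}
	if lr.max != 0 && req > lr.max {
		req = lr.max
	}
	return req
}

func main() {
	// No default set: fall back to Min so the Pod can still be created.
	fmt.Println(requestFor(limitRange{min: 100, max: 2000})) // 100
	// DefaultRequest set: respect it.
	fmt.Println(requestFor(limitRange{min: 100, max: 2000, defaultRequest: 500})) // 500
}
```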

This commit tries to add support for this by computing the minimum
requests that satisfy the LimitRange(s) and applying them to
Containers as well as InitContainers.

Note: if there are multiple LimitRanges in the namespace, Tekton tries
to make the best of them, but if they conflict with each other
(a Max on one that is smaller than the Min on another), resolving
that is the user's responsibility.
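One way to fold multiple LimitRanges into a single *virtual* LimitRange is to keep the most restrictive bound on each side; this sketch uses illustrative types, not Tekton's actual code:

```go
package main

import "fmt"

// rangeSpec is a simplified LimitRange constraint for one resource.
type rangeSpec struct {
	min, max int64
}

// merge folds several ranges into one virtual range by taking the
// largest Min and the smallest Max. If the result has min > max, the
// ranges conflict and no value can satisfy them all, which is left to
// the user to resolve.
func merge(ranges []rangeSpec) (rangeSpec, bool) {
	if len(ranges) == 0 {
		return rangeSpec{}, true
	}
	out := ranges[0]
	for _, r := range ranges[1:] {
		if r.min > out.min {
			out.min = r.min
		}
		if r.max < out.max {
			out.max = r.max
		}
	}
	return out, out.min <= out.max
}

func main() {
	v, ok := merge([]rangeSpec{{min: 100, max: 2000}, {min: 200, max: 1000}})
	fmt.Println(v, ok) // {200 1000} true

	// A Max on one range smaller than the Min on another: conflict.
	_, ok = merge([]rangeSpec{{min: 100, max: 150}, {min: 200, max: 1000}})
	fmt.Println(ok) // false
}
```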

Signed-off-by: Vincent Demeester vdemeest@redhat.com

/kind feature

Opening this early to get feedback on the approach, as this is kind of a breaking change, but it fixes all the problems users are facing today without adding new ones 😛

What is still to do:

  • Support multiple LimitRanges in the namespace (by creating a *virtual* LimitRange)
  • Support MaxLimitRequestRatio in computations
  • Update the documentation around LimitRange (most likely with a whole new file just about that)
  • Potentially fix unit and end-to-end tests

/cc @tektoncd/core-maintainers @danielhelfand

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Fully support LimitRange by taking into account the `Min`, `Max`, `Default` and `DefaultRequest` fields, as well as `MaxLimitRequestRatio`, to define coherent Requests and Limits on the Pod.

@vdemeester vdemeester added this to the Pipelines v0.28 milestone Aug 20, 2021
@tekton-robot tekton-robot requested review from danielhelfand and a team August 20, 2021 14:53
@tekton-robot tekton-robot added kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 20, 2021
@@ -680,7 +681,7 @@ func (c *Reconciler) createPod(ctx context.Context, tr *v1beta1.TaskRun, rtr *re
EntrypointCache: c.entrypointCache,
OverrideHomeEnv: shouldOverrideHomeEnv,
}
pod, err := podbuilder.Build(ctx, tr, *ts)
pod, err := podbuilder.Build(ctx, tr, *ts, limitrange.NewTransformer(ctx, tr.Namespace, c.KubeClientSet))
Member Author

The idea here is to later refactor other "pod" transformations that do not need Task information so that they happen outside of the Build function.

Member

Can we pass through a LimitRange lister here instead, so this doesn't end up hitting the real API server on each call?

(Not necessarily blocking, especially since this has been open for so long and it's nearing a release-cut, but something I'd like to push for in a quick followup if it doesn't make it into this PR)

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/limitrange/limitrange.go Do not exist 62.5%
pkg/pod/limitrange/transformer.go Do not exist 87.5%
pkg/pod/pod.go 86.9% 90.9% 4.0

@ghost

ghost commented Aug 23, 2021

This looks good to me (save for the TODOs that still need to be filled in) and I like the large body of test cases you've added. The docs would definitely help (you could probably copy/paste your PR description into a doc and it would already be a big improvement over what we have now :D)

One code nit: I think we should either stick with a "stateful builder" pattern (adding a PodTransforms slice to the podconvert.Builder type), or we should utilize the func-of-funcs style that you've added to podconvert.Builder.Build() for all the settings being set (and remove the explicit Builder struct type completely). I don't think that object should be configured both ways - feels a tad over-complicated given that the configuration is all used as part of a single transaction (TaskRun -> Pod conversion). Either approach is fine, but I'd prefer not to have both in one type.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/limitrange/limitrange.go Do not exist 92.9%
pkg/pod/limitrange/transformer.go Do not exist 87.5%
pkg/pod/pod.go 86.9% 90.9% 4.0

@vdemeester
Member Author

One code nit: I think we should either stick with a "stateful builder" pattern (adding a PodTransforms slice to the podconvert.Builder type), or we should utilize the func-of-funcs style that you've added to podconvert.Builder.Build() for all the settings being set (and remove the explicit Builder struct type completely). I don't think that object should be configured both ways - feels a tad over-complicated given that the configuration is all used as part of a single transaction (TaskRun -> Pod conversion). Either approach is fine, but I'd prefer not to have both in one type.

I tend to agree, but at the same time I didn't want to do the full refactoring in the same PR, to keep it at a decent size. Would it work if we document the follow-up in an issue to tackle later (just after this one gets merged), or should I go ahead and do the refactoring here as well (in a different commit)?

@ghost

ghost commented Aug 31, 2021

Yeah I think making it a follow-up is totally fine. No need to do a rewrite as part of this PR.

@vdemeester vdemeester changed the title WIP: Revamp how pipeline supports Limitranges Revamp how pipeline supports Limitranges Sep 2, 2021
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 2, 2021
@vdemeester
Member Author

I'll mark this as reviewable (pending adding some documentation and fixing the CI). The main reason is that 90% of cases should be covered with just this set of changes. Not having "code" support for maxLimitRequestRatio shouldn't be a problem 99% of the time, so I think it makes sense to get this in as is, since it's a net enhancement, and take our time to add support for maxLimitRequestRatio and the Pod type of LimitRange later on. I'll create follow-up issues if that makes sense 😉

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/limitrange/limitrange.go Do not exist 92.9%
pkg/pod/limitrange/transformer.go Do not exist 96.6%
pkg/pod/pod.go 86.9% 90.9% 4.0

@vdemeester
Member Author

Interesting, the unit tests are green locally.. what is happening 🤔

@ghost

ghost commented Sep 2, 2021

I'm able to reproduce the failures after pulling the PR, I wonder if there are files that still need to be pushed from your local?

The tests are failing for me with diffs like this:

        taskrun_test.go:1751: Pod spec doesn't match (-want, +got):   v1.PodSpec{
              	Volumes: {{Name: "tekton-internal-workspace", VolumeSource: {EmptyDir: &{}}}, {Name: "tekton-internal-home", VolumeSource: {EmptyDir: &{}}}, {Name: "tekton-internal-results", VolumeSource: {EmptyDir: &{}}}, {Name: "tekton-internal-steps", VolumeSource: {EmptyDir: &{}}}, ...},
              	InitContainers: []v1.Container{
              		{
              			... // 6 identical fields
              			EnvFrom: nil,
              			Env:     nil,
              			Resources: v1.ResourceRequirements{
              				Limits:   nil,
            - 				Requests: nil,
            + 				Requests: v1.ResourceList{},
              			},
              			VolumeMounts:  {{Name: "tekton-internal-tools", MountPath: "/tekton/tools"}},
              			VolumeDevices: nil,
              			... // 11 identical fields
              		},
              	},
              	Containers: []v1.Container{
              		{
              			... // 6 identical fields
              			EnvFrom: nil,
              			Env:     nil,
              			Resources: v1.ResourceRequirements{
              				Limits:   nil,
            - 				Requests: nil,
            + 				Requests: v1.ResourceList{},
              			},
              			VolumeMounts:  {{Name: "tekton-internal-tools", MountPath: "/tekton/tools"}, {Name: "tekton-internal-downward", MountPath: "/tekton/downward"}, {Name: "tekton-creds-init-home-0", MountPath: "/tekton/creds"}, {Name: "tekton-internal-workspace", MountPath: "/workspace"}, ...},
              			VolumeDevices: nil,
              			... // 11 identical fields
              		},
              	},
              	EphemeralContainers: nil,
              	RestartPolicy:       "Never",
              	... // 30 identical fields
              }

Here's the complete list of errored UTs I'm seeing:

--- FAIL: TestReconcile_ExplicitDefaultSA (0.23s)
    --- FAIL: TestReconcile_ExplicitDefaultSA/success (0.11s)
    --- FAIL: TestReconcile_ExplicitDefaultSA/serviceaccount (0.11s)
--- FAIL: TestReconcile_FeatureFlags (0.24s)
    --- FAIL: TestReconcile_FeatureFlags/disable-home-env-overwrite (0.12s)
    --- FAIL: TestReconcile_FeatureFlags/disable-working-dir-overwrite (0.12s)
--- FAIL: TestReconcile (1.15s)
    --- FAIL: TestReconcile/success (0.13s)
    --- FAIL: TestReconcile/serviceaccount (0.13s)
    --- FAIL: TestReconcile/params (0.13s)
    --- FAIL: TestReconcile/taskrun-with-taskspec (0.13s)
    --- FAIL: TestReconcile/success-with-cluster-task (0.12s)
    --- FAIL: TestReconcile/taskrun-with-resource-spec-task-spec (0.13s)
    --- FAIL: TestReconcile/taskrun-with-pod (0.13s)
    --- FAIL: TestReconcile/taskrun-with-credentials-variable-default-tekton-creds (0.13s)
    --- FAIL: TestReconcile/remote-task (0.13s)

go version just in case it's relevant:

go version go1.17 darwin/amd64

@vdemeester
Member Author

@sbwsg yeah I am able to reproduce now.. for some reason, my working copy was in a weird state.. all fixed now, gotta fix the tests 😛

Member

@danielhelfand danielhelfand left a comment

Thanks for taking the time to gather a lot of these LimitRange cases for Tekton. This lgtm as far as the changes go.

"k8s.io/client-go/kubernetes"
)

// var zeroQty = resource.MustParse("0")
Member

Can remove this comment.

@vdemeester vdemeester force-pushed the limitrange-revamp branch 3 times, most recently from dafd77c to 25ec928 Compare September 3, 2021 11:07
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/limitrange/limitrange.go Do not exist 92.9%
pkg/pod/limitrange/transformer.go Do not exist 96.6%
pkg/pod/pod.go 86.9% 90.9% 4.0

@vdemeester
Member Author

ping @tektoncd/core-maintainers

Collaborator

@bobcatfish bobcatfish left a comment

Great docs!!! Thanks for walking through the problem so thoroughly - I had a bit of feedback on some parts that confused me.

What do you think about including an end to end test for this? Maybe it's overkill - but it seems doable so maybe it's worth it? If not an end to end test, maybe a reconciler test?

Ultra nit-pick: i think there's a hanging ) in the release notes 🙏

/approve

- Ephemeral storage

The next question is : how pods with resource and limits are run/scheduled ?
The scheduler *computes* the amount of CPA and memory requests (using **Requests**) and tries to find a node to schedule it.
Collaborator

typo maybe? CPA -> CPU


The way Limits and Requests works in Kubernetes is because it is assumed that all containers run in parallel, and init container run before, each one after the others.

That assumption — containers running in parallel — is not true in Tekton. They do all start together (because there is no way around this) **but** the *entrypoint hack* is making sure they actually run in sequence and thus there is always only one container that is actually consuming some resource at the same time.
Collaborator

i wonder if we could link to some docs on the entrypoint hack - i'm surprised our developer docs dont have much on this but maybe we could link to https://github.com/tektoncd/pipeline/tree/main/cmd/entrypoint#entrypoint ?

Member Author

Indeed, I think it would be worth enhancing the docs around what the entrypoint does 👼🏼

Collaborator

💯


That assumption — containers running in parallel — is not true in Tekton. They do all start together (because there is no way around this) **but** the *entrypoint hack* is making sure they actually run in sequence and thus there is always only one container that is actually consuming some resource at the same time.

This means, we need to handle limits, request and LimitRanges in a *non-standard* way. Let's try to define that. Tekton needs to take into account all the aspect of the LimitRange : the min/max as well as the default. If there is no default, but there is min/max, Tekton need then to **set** a default value that is between the min/max. If we set the value too low, the Pod won't be able to be created, similar if we set the value too high. **But** those values are set on **containers**, so we **have to** do our own computation to know what request to put on each containers.
Collaborator

I got a bit confused by this part - specifically why we need to set the defaults.

After reading through everything I think the overall reason is that we can't rely on the existing LimitRange mutating webhook, so we pretty much need to explicitly set everything ourselves, is that right?

Maybe you could emphasize that a bit here, something like:

This means, we need to handle limits, request and LimitRanges in a *non-standard* way. Since the existing LimitRange mutating webhook won't take Tekton's requirements into account, Tekton needs to fully control all the values the LimitRange webhook might set...

Not sure if that's clearer, feel free to ignore 🙏

Member Author

After reading through everything I think the overall reason is that we can't rely on the existing LimitRange mutating webhook, so we pretty much need to explicitly set everything ourselves, is that right?

Yes indeed. If we want to allocate resources in a relatively optimal manner, we cannot rely on the LimitRange mutating webhook.

default. I think that's also what k8s does, at least in our computation.
- **Default value:** we need to "try" to respect that as much as possible.
- `defaultLimit` but no `defaultRequest`, then we set `defaultRequest` to be same as `defaultLimit`.
- `defaultRequest` but no `defaultlimit`, then we use the `min` limit as the `defaultLimit`
Collaborator

curious why we wouldn't use the max as the defaultLimit ?

Member Author

Wait, I think that's a typo / wrong thing here, I'll fix.

Member Author

Because, in short, Requests are used for scheduling (so we need the "minimum") and Limits are used for "killing" the running process (so we kinda need the "maximum")

- **containers:** those will be summed at the end, so it gets a bit complex
- a container needs to have request and limits at least at the `min`
- the sum of the container request/limits **should be** as small as possible. This should be
ensured by using the "smallest" possible request on it.
Collaborator

maybe this answers my question above about why the min is being used as the defaultLimit - i think it would help me understand better if you explained near the beginning of the section that we want the sum to be as small as possible and why

Collaborator

i think this was updated - and/or my understanding is just better now, but anyway lgtm 👍 👍 👍

time from the invoked `Task`, Tekton only requests the *maximum* values for CPU, memory, and ephemeral
storage from within each `Step`. This is sufficient as `Steps` only execute one at a time in the `Pod`.
Requests other than the maximum values are set to zero.
time from the invoked `Task`, Tekton will requests the compute values for CPU, memory, and ephemeral
Collaborator

typo: tekton will requests -> tekton will request

(in taskruns.md as well)

limitations under the License.
*/
// Package limitrange defines logic for supporting Kubernetes LimitRange for the specific Tekton use cases
package limitrange
Collaborator

nice, i like the new package :D

random question and maybe off topic: would it make sense to put this in internal/? I'm wondering if we want to start putting as much code as possible in internal/, i.e. being careful about what code other projects might start depending on?

Member Author

I am a fervent defender of internal packages, so yes, I think we should put it under internal to start with, and we can move it outside at some point if need be.

Member Author

The only "open question" is pkg/internal/limitrange or internal/pkg/limitrange?
We don't need access to this package in the e2e tests or cmd packages, so I'd rather have pkg/internal but 😛

Collaborator

either is fine by me! :D 💃 pkg/internal sgtm - thanks @vdemeester !

// We are adding +1 to the number of containers to take the init containers into account.
// The reason to use only +1 is that k8s treats the requests of all init containers as one (taking the max of all)
nbContainers := len(p.Spec.Containers) + 1
// FIXME(vdemeester) maxLimitRequestRatio to support later
Collaborator

could we link to an issue vs just your username? (otherwise this kinda thing is hard to follow up on later on)

Member Author

Yes, kept this here but indeed I intended to create an issue about this 👼🏼

Member Author

#4230 👼🏼

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/internal/limitrange/limitrange.go Do not exist 92.9%
pkg/internal/limitrange/transformer.go Do not exist 96.6%
pkg/pod/pod.go 87.0% 91.0% 4.0

Collaborator

@bobcatfish bobcatfish left a comment

/lgtm


- **containers:** those will be summed at the end, so it gets a bit complex
- a container needs to have request and limits at least at the `min`
- the sum of the container request/limits **should be** as small as possible. This should be
ensured by using the "smallest" possible request on it.
Collaborator

i think this was updated - and/or my understanding is just better now, but anyway lgtm 👍 👍 👍


That assumption — containers running in parallel — is not true in Tekton. They do all start together (because there is no way around this) **but** the *[entrypoint](https://github.com/tektoncd/pipeline/tree/main/cmd/entrypoint#entrypoint) hack* is making sure they actually run in sequence and thus there is always only one container that is actually consuming some resource at the same time.

This means, we need to handle limits, request and LimitRanges in a *non-standard* way. Since the existing LimitRange mutating webhook won't take Tekton's requirements into account, Tekton needs to fully control all the values the LimitRange webhook might set. Let's try to define that. Tekton needs to take into account all the aspect of the LimitRange : the min/max as well as the default. If there is no default, but there is min/max, Tekton need then to **set** a default value that is between the min/max. If we set the value too low, the Pod won't be able to be created, similar if we set the value too high. **But** those values are set on **containers**, so we **have to** do our own computation to know what request to put on each containers.
Collaborator

thanks for updating this! its much more clear to me now 🙇‍♀️

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 15, 2021
@bobcatfish
Collaborator

hm maybe a flaky test TestExamples/v1alpha1/taskruns/step-script:

      >>> Container step-dollar-signs-allowed:
        unable to retrieve container logs for docker://3bc0a22b998e10ccbb8fd5f9d088798468202ffdd9938c9e5112cda739803325

/test pull-tekton-pipeline-alpha-integration-tests

@tekton-robot tekton-robot merged commit ff860df into tektoncd:main Sep 15, 2021
@imjasonh
Member

This merged, but I wanted to reiterate my review comment from above for posterity:

Can we pass through a LimitRange lister here instead, so this doesn't end up hitting the real API server on each call?

(Not necessarily blocking, especially since this has been open for so long and it's nearing a release-cut, but something I'd like to push for in a quick followup if it doesn't make it into this PR)

@vdemeester vdemeester deleted the limitrange-revamp branch September 16, 2021 06:25
vdemeester added a commit to vdemeester/tektoncd-pipeline that referenced this pull request Sep 16, 2021
Using a LimitRange lister here instead, so this doesn't end up hitting
the real API server on each call.

Taking into account a review :
tektoncd#4176 (comment).

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
tekton-robot pushed a commit that referenced this pull request Sep 16, 2021
Using a LimitRange lister here instead, so this doesn't end up hitting
the real API server on each call.

Taking into account a review :
#4176 (comment).

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
chenbh pushed a commit to chenbh/pipeline that referenced this pull request Oct 27, 2021
Using a LimitRange lister here instead, so this doesn't end up hitting
the real API server on each call.

Taking into account a review :
tektoncd#4176 (comment).

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.