Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Design Proposal] TEP-0094: Specifying Resource Requirements at Runtime #560

Merged
merged 1 commit into from
Jan 25, 2022

Conversation

lbernick
Copy link
Member

Add design details and alternative solutions for updating the TaskRun
API to allow users to specify resource requirements
of Task Steps and Sidecars. These changes apply to both
one-shot TaskRuns and those launched via PipelineRuns.

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Nov 15, 2021
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 15, 2021
@skaegi
Copy link
Contributor

skaegi commented Nov 15, 2021

/assign

@skaegi
Copy link
Contributor

skaegi commented Nov 15, 2021

/assign @wlynch
/assign @afrittoli
/assign @jerop

Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the idea! Just a few follow ups.

teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 15, 2021
@lbernick lbernick requested a review from wlynch November 15, 2021 19:26
@bobcatfish
Copy link
Contributor

/assign

Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very thorough proposal @lbernick !!

My main thoughts:

  • I'd like to use named objects instead of a map of strings
  • I'd like to avoid introducing the * syntax if we can - using the named objects I think gives us a bit of flexibility here
  • I prefer the option where the feature is "container overrides" vs. just "resource overrides", but only supporting resource overloading for now

teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
[Knative Serving conformance](https://github.com/knative/specs/blob/main/specs/serving/knative-api-specification-1.0.md#container)
but not for
[Tekton pipelines conformance](https://github.com/tektoncd/pipeline/blob/main/docs/api-spec.md#step).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find!

instead of having the `Container` API embedded.
However, this would be a major API change at this point,
for little gain compared to the proposed solution.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to elaborate a bit on the "little gain"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, just spit balling

im also not clear on whether this change would solve the problem completely - maybe it makes sense in some cases to specify resource limits at authoring time as well and you'd still need to override them?

e.g. esp for scenarios within a particular organization, it might make more sense to be able to assume certain resource constraints

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few thoughts but I don't have many specific examples. It seems like resource requirements is the primary way this causes friction at the moment.

teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
of modifying resource requirements of catalog `Task`s.
While catalog `Task` owners can add resource requirement parameters to their `Task`s,
this clutters `Task`s, and not all `Task`s may be updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice analysis! 👍

teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 17, 2021
Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thank you!

I have a couple of questions but nothing major / blocking.
About unnamed steps, I initially though indexes could be an option, but perhaps it's better to avoid that. What we could do it to recommend setting a step name for tasks in the catalog, to avoid have tasks in the catalog that cannot be fully tuned.

About overriding the catch-all setting, that's something we could also add later.

/approve

teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Show resolved Hide resolved
@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2021
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting for all step is an interesting and tricky behavior to tackle, and might be confusing to some users. Applying 1CPU/1G to a Task that has 10 steps will mean the task will request at least 10CPU/10G (not counting init containers and/or sidecars). This will have to be very well documented that it is per-step and all steps are summed.

Overall looks good to me. I wonder however how much users want this vs a "Task resource request/limits", but I don't think this would get in the way of such a thing in the future.

Comment on lines +76 to +77
and a mapping of `Sidecar` names to overrides.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the similar problem that TaskRunSpec, … have in PipelineRun/Pipeline specs : you have to know the "shape" of the task you are using. The "small" problem with that is, your are tied to the Task definition. If the definition is updated and the step changed, you TaskRun / PipelineRun will fail.

abstractions from their implementation (a `Container`) and allows Tekton full control
over what fields are specified at authoring time vs runtime.
However, this would be a major API change at this point.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this can (and should ?) be done carefully between v1beta1 and v1 if need be (by shrinking what we "show" to the world).

@lbernick
Copy link
Member Author

Setting for all step is an interesting and tricky behavior to tackle, and might be confusing to some users. Applying 1CPU/1G to a Task that has 10 steps will mean the task will request at least 10CPU/10G (not counting init containers and/or sidecars). This will have to be very well documented that it is per-step and all steps are summed.

Thanks! definitely agree.

@lbernick lbernick force-pushed the resources branch 2 times, most recently from 886603f to e4a9ed3 Compare November 29, 2021 15:38
Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some small naming bikeshedding

teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
teps/0094-configuring-resources-at-runtime.md Outdated Show resolved Hide resolved
Copy link
Member

@jerop jerop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @lbernick 🎉

@lbernick
Copy link
Member Author

/hold

I looked into tektoncd/pipeline#2986 (unrelated to the issues listed in this proposal) last week and realized that the way we address that issue may affect our design for runtime configuration for resources. I'm going to update the problem statement for this TEP to address that issue as well.

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 20, 2021
lbernick added a commit to lbernick/community that referenced this pull request Dec 20, 2021
This commit expands the scope of TEP-0094 to cover the user experience of
specifying resource requests and limits in Tasks.
Focusing only on Step and Sidecar resource requirements may be too narrow of a scope for this TEP.

This is largely motivated by tektoncd/pipeline#2986, because the solution
to this problem may involve removing the ability to specify Step resource requests.
It doesn't make sense to override Step resource requests in TaskRuns if users shouldn't be able
to specify Step resource requests in the first place.

The scope is also expanded to include parameterizing resource requests based on discussion in
tektoncd#560, around treating resource requirement parameterization
and runtime overrides as "both/and", rather than "either/or". Fixing Task's resource requirement UX
may allow us to get parameterization for free.
lbernick added a commit to lbernick/community that referenced this pull request Dec 20, 2021
This commit expands the scope of TEP-0094 to cover the user experience of
specifying resource requests and limits in Tasks.
Focusing only on Step and Sidecar resource requirements may be too narrow of a scope for this TEP.

This is largely motivated by tektoncd/pipeline#2986, because the solution
to this problem may involve removing the ability to specify Step resource requests.
It doesn't make sense to override Step resource requests in TaskRuns if users shouldn't be able
to specify Step resource requests in the first place.

The scope is also expanded to include parameterizing resource requests based on discussion in
tektoncd#560, around treating resource requirement parameterization
and runtime overrides as "both/and", rather than "either/or". Fixing Task's resource requirement UX
may allow us to get parameterization for free.
@lbernick
Copy link
Member Author

re-scoping in #588

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 21, 2021
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2022
@lbernick
Copy link
Member Author

Going to re-open this proposal as the behavior that led me to rescope has already been addressed -- see #588 (comment) for more detail.

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 12, 2022
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 13, 2022
@jerop
Copy link
Member

jerop commented Jan 24, 2022

/assign @wlynch

will follow up after API meeting

(ping @skaegi please take a look)

Add design details and alternative solutions for updating the `TaskRun`
API to allow users to specify resource requirements
of `Task` `Step`s and `Sidecar`s. These changes apply to both
one-shot `TaskRun`s and those launched via `PipelineRun`s.
@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, bobcatfish, jerop, vdemeester, wlynch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • teps/OWNERS [afrittoli,bobcatfish,jerop,vdemeester,wlynch]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bobcatfish
Copy link
Contributor

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 25, 2022
@tekton-robot tekton-robot merged commit 152667f into tektoncd:main Jan 25, 2022
@lbernick lbernick deleted the resources branch March 3, 2022 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: Opened
Development

Successfully merging this pull request may close these issues.