0094-configuring-resources-at-runtime.md

status: implemented
title: Configuring Resources at Runtime
creation-date: 2021-11-08
last-updated: 2022-03-11
authors: @lbernick

TEP-0094: Configuring Resources at Runtime

Summary

Add runtime configuration options for setting resource requirements of Steps and Sidecars.

Users can currently specify resource requirements in a Task definition via the Resources field of each Step, StepTemplate, or Sidecar. However, there is no support for modifying these requirements in a TaskRun, whether it is created from a Pipeline or as a one-shot run.

This TEP proposes adding a configuration option to TaskRunSpec and PipelineTaskRunSpec to override any Step or Sidecar resource requirements specified in a Task.

Motivation

Compute resource requirements typically depend on runtime constraints. The following issues contain user requests for being able to modify resource requirements at runtime:

Goals

Add configuration to TaskRunSpec and PipelineTaskRunSpec allowing users to specify resource requirements of Steps or Sidecars defined in a Task.

Non-Goals

  • Ability to override other Step or Sidecar fields in a TaskRun.
  • Ability to specify combined resource requirements of all Steps or Sidecars at Task or Pipeline level. While this may be a valuable feature, it should be considered in a separate proposal.

Use Cases

  • Image or code building Tasks can use different amounts of compute resources depending on the image or source being built.
  • Kubeflow pipelines and other data pipelines may have variable resource requirements depending on the data being processed.
  • Catalog Tasks should be generally reusable in different environments that may have different resource constraints.

Requirements

  • Users can specify Step and Sidecar resource requirements at runtime.
  • Users can specify Step and Sidecar resource requirements for Tasks or Pipelines they don't own, especially those in the Catalog.
  • Users can specify resource requirements for individual Steps and Sidecars.

Proposal

Augment TaskRunSpec and PipelineTaskRunSpec with a list of named Step overrides and a list of named Sidecar overrides. Each override identifies the Step or Sidecar it applies to by name.

Design Details

import corev1 "k8s.io/api/core/v1"

type TaskRunStepOverride struct {
  // The name of the Step to override.
  Name string
  // The resource requirements to apply to the Step.
  Resources corev1.ResourceRequirements
}

type TaskRunSidecarOverride struct {
  // The name of the Sidecar to override.
  Name string
  // The resource requirements to apply to the Sidecar.
  Resources corev1.ResourceRequirements
}

type TaskRunSpec struct {
  // ... existing fields elided ...

  // Overrides to apply to Steps in this TaskRun.
  // If a field is specified in both a Step and a StepOverride,
  // the value from the StepOverride will be used.
  StepOverrides []TaskRunStepOverride

  // Overrides to apply to Sidecars in this TaskRun.
  // If a field is specified in both a Sidecar and a SidecarOverride,
  // the value from the SidecarOverride will be used.
  SidecarOverrides []TaskRunSidecarOverride
}

type PipelineTaskRunSpec struct {
  // ... existing fields elided ...

  // Overrides to apply to Steps in this PipelineTaskRun.
  // If a field is specified in both a Step and a StepOverride,
  // the value from the StepOverride will be used.
  StepOverrides []TaskRunStepOverride

  // Overrides to apply to Sidecars in this PipelineTaskRun.
  // If a field is specified in both a Sidecar and a SidecarOverride,
  // the value from the SidecarOverride will be used.
  SidecarOverrides []TaskRunSidecarOverride
}

Example Task and TaskRun

Example Task:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: image-build-task
spec:
  steps:
    - name: build
      image: gcr.io/kaniko-project/executor:latest
      command:
        - /kaniko/executor

Example TaskRun:

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: image-build-taskrun
spec:
  taskRef:
    name: image-build-task
  stepOverrides:
    - name: build
      resources:
        requests:
          memory: 1Gi

Example Pipeline and PipelineRun

Example Pipeline:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: image-build-pipeline
spec:
  tasks:
    - name: image-build-task
      taskSpec:
        steps:
          - name: build
            image: gcr.io/kaniko-project/executor:latest
            command:
              - /kaniko/executor

Example PipelineRun:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: image-build-pipelinerun
spec:
  pipelineRef:
    name: image-build-pipeline
  taskRunSpecs:
    - pipelineTaskName: image-build-task
      stepOverrides:
        - name: build
          resources:
            requests:
              memory: 1Gi

Notes/Caveats

Mapping container overrides to Steps and Sidecars

This TEP proposes mapping container overrides to their corresponding Steps and Sidecars using named subobjects, as recommended by Kubernetes API convention. This strategy is consistent with other parts of the Tekton API, such as the use of PipelineTaskRunSpec to specify TaskRun configuration for a Task in a Pipeline. It also meets the requirement that users can specify resource requirements for individual Steps and Sidecars. An alternative option is to use a map of Step or Sidecar names to container overrides, but this violates Kubernetes API convention.

Steps and Sidecars are treated separately because they have different fields. We may at some point want to override Step fields that are not present in Sidecar, or vice versa. In addition, a Step could share a name with a Sidecar; separating StepOverrides and SidecarOverrides avoids ambiguity in this case. Duplicate names, missing names, or names that don't match Step or Sidecar names will result in the TaskRun being rejected.
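
The rejection rules described above (duplicate names, missing names, and names that match no Step) could be checked roughly as follows. This is a minimal sketch, not the actual Tekton validation code: StepOverride is a hypothetical stand-in for TaskRunStepOverride with the Resources field omitted, and validateOverrides and its error messages are illustrative names only. The same check would apply to Sidecar overrides.

```go
package main

import "fmt"

// StepOverride is a simplified, hypothetical stand-in for TaskRunStepOverride.
// The Resources field is omitted because only names matter for this check.
type StepOverride struct {
	Name string
}

// validateOverrides rejects overrides with a missing name, a duplicate name,
// or a name that does not match any Step declared in the Task.
func validateOverrides(stepNames []string, overrides []StepOverride) error {
	known := make(map[string]bool, len(stepNames))
	for _, n := range stepNames {
		known[n] = true
	}
	seen := make(map[string]bool, len(overrides))
	for i, o := range overrides {
		switch {
		case o.Name == "":
			return fmt.Errorf("stepOverrides[%d]: missing name", i)
		case seen[o.Name]:
			return fmt.Errorf("stepOverrides[%d]: duplicate name %q", i, o.Name)
		case !known[o.Name]:
			return fmt.Errorf("stepOverrides[%d]: no Step named %q", i, o.Name)
		}
		seen[o.Name] = true
	}
	return nil
}

func main() {
	steps := []string{"build", "push"}
	fmt.Println(validateOverrides(steps, []StepOverride{{Name: "build"}}))
	fmt.Println(validateOverrides(steps, []StepOverride{{Name: "build"}, {Name: "build"}}))
	fmt.Println(validateOverrides(steps, []StepOverride{{Name: "test"}}))
}
```

Keeping separate lists for Steps and Sidecars means this check runs once per list, so a Step and a Sidecar that share a name never collide.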

Some users may want to have resource requirements apply to every Step or Sidecar, or to unnamed Steps or Sidecars. We could override unnamed Steps or Sidecars based on their indices, but we don't currently guarantee stable indexing of Steps or Sidecars. If a Task added or removed a Step or Sidecar, this could break the corresponding StepOverride. In addition, applying resource requirements to every Step or Sidecar may not be expected or desirable to users. For these reasons, these features will not be supported for the initial version of this proposal. We will instead encourage Task authors to name their Steps and Sidecars. This decision may be revisited based on user feedback.

Merging resource requirements

Step resource requirements can currently be specified in both Task.Step.Resources and Task.StepTemplate.Resources. If resource requirements are specified in both fields, the value present in Task.Step.Resources is used. However, different resource types (e.g. CPU, memory) are considered independently. For example:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: image-build-task
spec:
  steps:
    - name: build
      resources:
        requests:
          memory: 500Mi
        limits:
          memory: 800Mi
  stepTemplate:
    resources:
      requests:
        memory: 300Mi
        cpu: 0.5

A resulting TaskRun will have a memory request of 500Mi, a memory limit of 800Mi, and a CPU request of 0.5 CPU units. (If the StepTemplate specifies a resource request/limit and the Step does not, the value from the StepTemplate will be used as long as it does not result in a request > limit. If the resulting request is greater than the limit, Kubernetes will reject the resulting pod.)

This proposal adds a third way to specify Step resource requirements: TaskRun.StepOverrides[].Resources. TaskRun.StepOverrides[].Resources will override Task.Step.Resources in the same way that Task.Step.Resources overrides Task.StepTemplate.Resources. Using the example Task defined above, consider the following TaskRun:

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: image-build-taskrun
spec:
  taskRef:
    name: image-build-task
  stepOverrides:
    - name: build
      resources:
        requests:
          memory: 700Mi

The TaskRun will have a memory request of 700Mi (from TaskRun.StepOverrides[0].Resources) and a memory limit of 800Mi (from Task.Step[0].Resources). It will have a CPU request of 0.5 CPU units and no CPU limit, as this configuration was specified in Task.StepTemplate.Resources and not overridden by Task.Step.Resources or TaskRun.StepOverrides[0].Resources.
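
The precedence described in this section can be sketched as a layered merge in which each resource name is decided independently. This is an illustration under simplifying assumptions, not the Tekton implementation: Requirements is a hypothetical stand-in for corev1.ResourceRequirements that stores quantities as plain strings, and merge and mergeLists are illustrative helper names.

```go
package main

import "fmt"

// Requirements is a simplified, hypothetical stand-in for
// corev1.ResourceRequirements; quantities are kept as plain strings.
type Requirements struct {
	Requests map[string]string
	Limits   map[string]string
}

// mergeLists copies base and then lets every entry in override win,
// so each resource name (cpu, memory, ...) is decided independently.
func mergeLists(base, override map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(override))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

// merge applies layers from lowest to highest precedence.
func merge(layers ...Requirements) Requirements {
	var out Requirements
	for _, l := range layers {
		out.Requests = mergeLists(out.Requests, l.Requests)
		out.Limits = mergeLists(out.Limits, l.Limits)
	}
	return out
}

func main() {
	// Values taken from the example Task and TaskRun in this section.
	stepTemplate := Requirements{Requests: map[string]string{"memory": "300Mi", "cpu": "0.5"}}
	step := Requirements{
		Requests: map[string]string{"memory": "500Mi"},
		Limits:   map[string]string{"memory": "800Mi"},
	}
	stepOverride := Requirements{Requests: map[string]string{"memory": "700Mi"}}

	got := merge(stepTemplate, step, stepOverride)
	// Prints: 700Mi 800Mi 0.5
	fmt.Println(got.Requests["memory"], got.Limits["memory"], got.Requests["cpu"])
}
```

The sketch does not check that the merged request stays at or below the merged limit; as noted above, Kubernetes rejects the pod in that case.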

Risks and Mitigations

This proposal may increase confusion for users. It may not be obvious which of Task.Step.Resources, Task.StepTemplate.Resources, and TaskRun.StepOverrides[].Resources takes precedence (see Merging resource requirements). In addition, StepOverrides[].Resources and SidecarOverrides[].Resources may be confused with PipelineResources. Lastly, users may not understand that Task resource requirements are the sum of the Step resource requirements. These risks can be mitigated via documentation.

User Experience

The current workaround for lack of this feature is to write a new Task for each set of resource constraints, as described in this comment from a buildah user.

This proposal moves environment-specific configuration into TaskRun definitions, allowing Task definitions to be reused. We may want to add an option to the CLI to specify resource requirements when starting a TaskRun via tkn task start, but this is not necessary for the initial implementation.

Test Plan

Unit tests should suffice for this feature, covering the following cases:

  • Tasks with no resource requirements specified
  • Tasks with resource requirements that are partially overridden by the TaskRun
  • Tasks with resource requirements that are fully overridden by the TaskRun
  • TaskRuns with resource requirements launched by themselves and from PipelineRuns

Examples should be included for overriding resource requirements (e.g. the example used in Merging resource requirements).

Design Evaluation

Reusability

This proposal increases reusability of Tasks and Pipelines by allowing environment-specific execution requirements to be updated at runtime.

Simplicity

The proposed solution contains the minimum number of features that meet the specified requirements, compared to the listed alternatives.

Flexibility

This proposal increases Tekton's flexibility by giving users more options to modify Tasks. There isn't a clear strategy for implementing this functionality via a plugin system.

Conformance

Tekton aims to minimize Kubernetes-specific features in its API. However, the usage of ResourceRequirements is necessary for this feature, as a result of the decision to directly embed Container in the Task API.

Container resource requirements are required for Knative Serving conformance but not for Tekton Pipelines conformance. Therefore, StepOverrides and SidecarOverrides should also not be required for Tekton conformance.

Drawbacks

In an ideal world, Tasks would not contain fields that are tied to runtime requirements. Tasks might be more reusable if Step and Sidecar were fully Tekton-owned, instead of having the Container API embedded. Updating the Task API to use Tekton-owned structs for Step and Sidecar would untie these abstractions from their implementation (a Container) and give Tekton full control over which fields are specified at authoring time and which at runtime. However, this would be a major API change at this point. In addition, untying the Task API from runtime considerations does not change the need to specify resource requirements in the TaskRun.

Alternatives

Allow TaskRuns to patch arbitrary fields of Tasks

Syntax Option 1: Overriding via TaskSpec

Example task.yaml:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: image-build-task
spec:
  steps:
    - name: build
      image: gcr.io/kaniko-project/executor:latest
      command:
        - /kaniko/executor

Example taskrun.yaml:

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: image-build-taskrun
spec:
  taskRef:
    name: image-build-task
  taskSpec:
    steps:
      - name: build
        resources:
          requests:
            memory: 1Gi

Syntax Option 2: JSONPath

Introduce JSONPath syntax to TaskRunSpec and PipelineTaskRunSpec to allow these structs to override any Task field, via a "path" key and a value.

Example task.yaml:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: image-build-task
spec:
  steps:
    - name: build
      image: gcr.io/kaniko-project/executor:latest
      command:
        - /kaniko/executor

Example taskrun.yaml:

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: image-build-taskrun
spec:
  taskRef:
    name: image-build-task
  patches:
    - path: taskRef.steps[0].resources.requests.memory
      value: 1Gi

This solution is not the proposed solution because it does not align with the design principle "Tekton should contain only the bare minimum and simplest features needed to meet the largest number of CI/CD use cases." Additional pros and cons are as follows:

Pros:

  • Allows resource requirements to be specified for each Step and Sidecar at runtime.
  • Increases reusability of Tasks by allowing catalog Tasks to be modified.
  • Replacement of any other Task field comes for free, although this is a non-goal.
  • JSONPath is familiar syntax for some developers.

Cons:

  • Could be too flexible, allowing spec modifications we don’t want to support. No clear use case for the additional flexibility compared to the proposed solution.
  • Duplicates parameterization functionality for the Task fields that support it.
  • May set a precedent for supporting this syntax in other parts of Tekton API.
  • Unclear what Tekton should be responsible for supporting, as opposed to existing tools like kustomize.

Allow resource requirements to be parameterized

Add Step.Resources and Sidecar.Resources to the list of fields supporting variable replacement.

Background

Pipelines supports variable replacement for several string fields. Non-string fields, or string fields with additional validation, cannot currently be parameterized, because values like “$(params.foo)” can’t be unmarshalled from JSON into the corresponding Go structs. In the case of resource requirements, only strings like “100Mi” are accepted by the custom unmarshalling function used for resource Quantities.
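
The failure mode can be illustrated with a small sketch. Quantity below is a hypothetical type with a deliberately simplified validation pattern; it is not the Kubernetes resource.Quantity type, whose grammar accepts more forms. The point is only that validation happens during unmarshalling, before any variable substitution could run.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// quantityPattern is a deliberately simplified assumption; the real
// Kubernetes quantity grammar accepts more forms than this.
var quantityPattern = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?(m|k|M|G|T|Ki|Mi|Gi|Ti)?$`)

// Quantity is a hypothetical stand-in for a validated resource quantity.
type Quantity string

// UnmarshalJSON validates the value while decoding, so an unresolved
// variable reference is rejected before any substitution could happen.
func (q *Quantity) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	if !quantityPattern.MatchString(s) {
		return fmt.Errorf("invalid quantity %q", s)
	}
	*q = Quantity(s)
	return nil
}

// requests models only the memory request of a Step for this illustration.
type requests struct {
	Memory Quantity `json:"memory"`
}

// decode unmarshals a JSON document into requests.
func decode(doc string) (requests, error) {
	var r requests
	err := json.Unmarshal([]byte(doc), &r)
	return r, err
}

func main() {
	_, err := decode(`{"memory": "100Mi"}`)
	fmt.Println("literal:", err)
	_, err = decode(`{"memory": "$(params.foo)"}`)
	fmt.Println("variable:", err)
}
```

In this sketch the literal value decodes cleanly while the variable reference produces an error, which is why a string-typed field is needed before substitution can be supported.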

Implementation

Supporting variable replacement for resources could be accomplished by replacing Step.Container.Resources and Sidecar.Container.Resources with a Tekton-defined struct, for example:

import corev1 "k8s.io/api/core/v1"

type Step struct {
   corev1.Container
   Resources ResourceRequirements
}

type ResourceRequirements struct {
   Limits ResourceList
   Requests ResourceList
}

type ResourceList map[corev1.ResourceName]string

The above example overrides Container.Resources, but variable substitution could also be implemented by adding a new field of type ResourceRequirements to Step and Sidecar rather than overriding an existing one:

import corev1 "k8s.io/api/core/v1"

type Step struct {
   corev1.Container
   ResourceRequirements ResourceRequirements
}

type ResourceRequirements struct {
   Limits ResourceList
   Requests ResourceList
}

type ResourceList map[corev1.ResourceName]string

Example Task

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: image-build-task
spec:
  params:
    - name: RESOURCE_MEMORY_REQUEST
      type: string
      default: 1Gi
  steps:
    - name: build
      image: gcr.io/kaniko-project/executor:latest
      command:
        - /kaniko/executor
      resources:
        requests:
          memory: $(params.RESOURCE_MEMORY_REQUEST)
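
With string-valued quantities, resolving the Task above at runtime reduces to plain string replacement. The following is a rough sketch of that step, assuming a ResourceList keyed by plain strings rather than corev1.ResourceName; substitute is a hypothetical helper, not an existing Tekton function.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceList mirrors the string-valued map proposed above; using a plain
// string key instead of corev1.ResourceName is a simplification.
type ResourceList map[string]string

// substitute returns a copy of list with every $(params.NAME) reference
// replaced by the matching value from params. Unknown references are left
// untouched.
func substitute(list ResourceList, params map[string]string) ResourceList {
	pairs := make([]string, 0, 2*len(params))
	for name, value := range params {
		pairs = append(pairs, "$(params."+name+")", value)
	}
	replacer := strings.NewReplacer(pairs...)
	out := make(ResourceList, len(list))
	for resource, quantity := range list {
		out[resource] = replacer.Replace(quantity)
	}
	return out
}

func main() {
	requests := ResourceList{"memory": "$(params.RESOURCE_MEMORY_REQUEST)"}
	params := map[string]string{"RESOURCE_MEMORY_REQUEST": "1Gi"}
	// Prints: 1Gi
	fmt.Println(substitute(requests, params)["memory"])
}
```

The substituted strings would still need to be parsed and validated as quantities afterwards; that step is outside this sketch.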

Design Evaluation

This solution is not the proposed solution because it does not meet the requirement of modifying resource requirements of catalog Tasks. While catalog Task owners can add resource requirement parameters to their Tasks, this clutters Tasks, and not all Tasks may be updated. However, we may choose to implement this feature in addition to the proposed solution.

Additional pros and cons are as follows:

Pros:

  • Allows resource requirements to be specified for each step at runtime.
  • Re-uses existing API concepts; consistent with strategy for parameterizing other container fields such as Args.

Cons:

  • Requires updating CLI and dashboard usages of Pipelines client libraries. Specifically, overriding Step.Resources and Sidecar.Resources means that old versions of the CLI and dashboard will break when used with the new version of Task.

Treat some parameter names as special cases

Allow parameter names to “patch” parts of the Task spec, for example:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: image-build-task
spec:
  params:
    - name: steps[0].resources.requests.memory
      default: 1Gi
  steps:
    - name: build
      image: gcr.io/kaniko-project/executor:latest
      command:
        - /kaniko/executor

Alternatively, parameter names such as “RESOURCE_MEMORY_REQUEST” and “RESOURCE_MEMORY_LIMIT” could be treated as special cases.

This solution is not the proposed solution because it does not meet the requirement of modifying resource requirements of catalog Tasks (unless we allow TaskRuns to use parameters that aren't defined in Tasks). Additional pros and cons are as follows:

Pros:

  • Allows resource requirements to be specified for each step at runtime.
  • Easy to add variable replacement for other fields if needed, although this is a non-goal.

Cons:

  • Prevents free naming of parameters, and could break existing TaskRuns that have parameters named in this way.

Implementing pull requests: