From a509831095500beb8feb673734317b5849a54540 Mon Sep 17 00:00:00 2001 From: Christie Wilson Date: Mon, 8 Mar 2021 16:54:59 -0500 Subject: [PATCH] =?UTF-8?q?Start=20brainstorming=20options=20=F0=9F=A7=A0?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit adds possible solutions to the problem described in TEP-0044, including references to solutions in other TEPS (46 & 54). I was hoping to merge the problem statement before starting to talk about solutions but it seems like the problem statement is too abstract to get enough traction, and meanwhile folks have been opening more TEPs with related proposals in the meantime, so hopefully starting to list the options here will help us move the discussion forward. I'm hoping we can merge the problem + possible options without needing to decide on which one(s) we want to pursue. --- ...couple-task-composition-from-scheduling.md | 834 +++++++++++++++++- teps/README.md | 2 +- 2 files changed, 832 insertions(+), 4 deletions(-) diff --git a/teps/0044-decouple-task-composition-from-scheduling.md b/teps/0044-decouple-task-composition-from-scheduling.md index b4accda94..66b7b8575 100644 --- a/teps/0044-decouple-task-composition-from-scheduling.md +++ b/teps/0044-decouple-task-composition-from-scheduling.md @@ -2,7 +2,7 @@ status: proposed title: Decouple Task Composition from Scheduling creation-date: '2021-01-22' -last-updated: '2021-02-08' +last-updated: '2021-03-10' authors: - '@bobcatfish' --- @@ -19,6 +19,8 @@ authors: - [Requirements](#requirements) - [References (optional)](#references-optional) - [PipelineResources](#pipelineresources) +- [Design Details](#design-details) +- [Alternatives](#alternatives) ## Summary @@ -35,7 +37,7 @@ schedule execution. This means that choices made around Task design (e.g. creating a Task that encapsulates a git clone and a separate Task to run go unit tests) directly impact the performance and overhead involved in executing the Tasks. 
For example if the git clone task wants to share data with the unit test task, beyond a simple -[result](https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#using-results), it needs to +[result](https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#using-results), you'll need to provision a PVC or do some other similar, cloud specific storage, to [make a volume available](https://github.com/tektoncd/pipeline/blob/master/docs/workspaces.md#specifying-volumesources-in-workspaces) that can be shared between them, and running the second Task will be delayed by the overhead of scheduling a second pod. @@ -46,7 +48,9 @@ that can be shared between them, and running the second Task will be delayed by Tasks together and have control over the scheduling overhead (i.e. pods and volumes required) at authoring time in a way that can be reused (e.g. in a Pipeline) - Add some of [the features we don't have without PipelineResources](https://docs.google.com/document/d/1KpVyWi-etX00J3hIz_9HlbaNNEyuzP6S986Wjhl3ZnA/edit#) - to Tekton Pipelines (without requiring use of PipelineResources), specifically **Task adapters/specialization** + to Tekton Pipelines (without requiring use of PipelineResources), specifically the first feature listed in + [the doc](https://docs.google.com/document/d/1KpVyWi-etX00J3hIz_9HlbaNNEyuzP6S986Wjhl3ZnA/edit#heading=h.gi1d1dikb39u): + **Task adapters/specialization** ### Non-Goals @@ -130,6 +134,830 @@ Where they differ: address the above use case; for example in PipelineResources, you can have a storage "output" but if the steps fail, the "output" pipelineresource will not run +## Design details + +TBD - currently focusing on enumerating and examining alternatives before selecting one or more ways forward. 
+ +## Alternatives + +Most of these options are not mutually exclusive: + +* [Task composition in Pipeline Tasks](#task-composition-in-pipeline-tasks) +* [Update PipelineResources to use Tasks](#update-pipelineresources-to-use-tasks) +* [Automatically combine Tasks based on workspace use](#automagically-combine-tasks-based-on-workspace-use) +* [Introduce scheduling rules to Pipeline](#introduce-scheduling-rules-to-pipeline) +* [PipelineRun: emptyDir](#pipelinerun-emptydir) +* [Controller configuration](#controller-level) +* [Within the Task](#within-the-task) +* [Remove distinction between Tasks and Pipelines](#remove-distinction-between-tasks-and-pipelines) +* [Custom Pipeline](#custom-pipeline) +* [Create a new Grouping CRD](#create-a-new-grouping-crd) +* [Rely on the Affinity Assistant](#rely-on-the-affinity-assistant) +* [Custom scheduler](#custom-scheduler) +* [Support other ways to share data (e.g. buckets)](#support-other-ways-to-share-data-eg-buckets) +* [Focus on workspaces](#focus-on-workspaces) + +Most of the solutions above involve allowing more than one Task to be run in the same pod, and those proposals all share +the following pros & cons. + +Pros: +* Making it possible to execute a Pipeline in a pod will also pave the way to be able to support use cases such as + [local execution](https://github.com/tektoncd/pipeline/issues/235) + +Cons: + +* Increased complexity around requesting the correct amount of resources when scheduling (having to look at the + requirements of all containers in all Tasks, esp. when they run in parallel) +* Requires re-architecting and/or duplicating logic that currently is handled outside the pods in the controller + (e.g. passing results between Tasks and other variable interpolation) + + +### Task composition in Pipeline Tasks + +In this option we make it possible to express Tasks which can be combined together to run sequentially as one pod. 
+ +In the following example, 3 Tasks will be combined and run in one pod sequentially: + +1. `git-clone` +2. `just-unit-tests` +3. `gcs-upload` + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: Pipeline +metadata: + name: build-test-deploy +spec: + params: + - name: url + value: https://github.com/tektoncd/pipeline.git + - name: revision + value: v0.11.3 + workspaces: + - name: source-code + - name: test-results + tasks: + - name: run-unit-tests + taskRef: + name: just-unit-tests + workspaces: + - name: source-code + - name: test-results + init/before: + - taskRef: git-clone + params: + - name: url + value: $(params.url) + - name: revision + value: $(params.revision) + workspaces: + - name: source-code + workspace: source-code + finally/after: + - taskRef: gcs-upload + params: + - name: location + value: gs://my-test-results-bucket/testrun-$(taskRun.name) + workspaces: + - name: data + workspace: test-results +``` + +The `finally/after` Task(s) would run even if the previous steps fail. + +Pros: +* Is an optional addition to the existing types (doesn't require massive re-architecting) +* We have some initial indication (via PipelineResources) that this should be possible to do +* Maintains a line between when to use a complex DAG and when to use this functionality since this is only sequential + (but the line is fuzzy) + +Cons: +* Only helps us with some scheduling problems (e.g. doesn't help with parallel tasks or finally task execution) +* What if you _don't_ want the last Tasks to run if the previous tasks fail? + * Not clear how we would support more sophisticated use cases, e.g. if folks wanted to start mixing `when` expressions + into the `init/before` and/or `finally/after` Tasks +* If you want some other Task to run after these, you'll still need a workspace/volume + separate pod +* What if you want more flexibility than just before and after? (e.g. 
you want to completely control the ordering) + * Should still be possible, can put as many Tasks as you want into before and after + +Related: +* [Task Specialization: most appealing options?](https://docs.google.com/presentation/d/12QPKFTHBZKMFbgpOoX6o1--HyGqjjNJ7own6KqM-s68) +* [TEP-0054](https://github.com/tektoncd/community/pull/369) suggests something similar to this but: + * Uses "steps" as the unit + * Wants to combine these in the embedded Task spec vs in the Pipeline Task + +### Update PipelineResources to use Tasks + +In this option, we directly tackle problems with PipelineResources by updating them to refer to Tasks (e.g. catalog +Tasks). + +In the example below a PipelineResource type can refer to Tasks: + +```yaml +kind: PipelineResourceType +apiVersion: v1beta1 +metadata: + name: GCS +spec: + description: | + GCS PipelineResources download files onto a + Workspace from GCS when used as an input and uploads + files to GCS from a Workspace when used as an output. + input: + taskRef: + name: gcs-download # From catalog + output: + taskRef: + name: gcs-upload # From catalog +``` + +```yaml +kind: PipelineResourceType +apiVersion: v1beta1 +metadata: + name: GIT +spec: + description: | + GIT PipelineResources clone files from a Git + repo onto a Workspace when used as an input. It has + no output behaviour. 
+ input: + taskRef: + name: git-clone # From catalog +``` + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: Pipeline +metadata: + name: build-test-deploy +spec: + params: + - name: url + value: https://github.com/tektoncd/pipeline.git + - name: revision + value: v0.11.3 + workspaces: + - name: source-code + - name: test-results + tasks: + - name: run-unit-tests + taskRef: + name: just-unit-tests + workspaces: + - name: source-code + - name: test-results + resources: + inputs: + - resourceRef: GIT # the pipelineresource defined above + params: + - name: url + value: $(params.url) + - name: revision + value: $(params.revision) + workspaces: + - name: source-code + workspace: source-code + outputs: + - resourceRef: GCS # the pipelineresource defined above + params: + - name: location + value: gs://my-test-results-bucket/testrun-$(taskRun.name) + workspaces: + - name: data + workspace: test-results +``` + +(Credit to @sbwsg for this proposal and example!) + +If we pursue this we can make some choices around whether this works similar to today's PipelineResources where Tasks +need to declare that they expect them, or we could make it so that PipelineResources can be used with a Task regardless +of what it declares (the most flexible). + +Pros: +* "fixes" PipelineResources +* Uses concepts we already have in Tekton but upgrades them + +Cons: +* Not clear what the idea of a PipelineResource is really giving us if it's just a wrapper for Tasks +* If you want to use 2 Tasks together, you'll have to make a PipelineResource type for at least one of them +* Only helps us with some scheduling problems (e.g. 
doesn't help with parallel tasks or finally task execution) + +Related: +* [Specializing Tasks: Visions and Goals](https://docs.google.com/document/d/1G2QbpiMUHSs4LOqcNaIRswcdvoy8n7XuhTV8tXdcE7A/edit) +* [Specializing Tasks: Possible Designs](https://docs.google.com/document/d/1p8zq_wkAcwr1l5BpNQDyNjgWngOtnEhCYEpcNKMHvG4/edit) + +### Automagically combine Tasks based on workspace use + +In this option we could leave Pipelines as they are, but at runtime instead of mapping a Task to a pod, we could decide +what belongs in what pod based on workspace usage. + +In the example below, `get-source`, `run-unit-tests` and `upload-results` all use at least one of the two workspaces +so they will be executed as one pod, while `update-slack` would be run as a separate pod: + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: Pipeline +metadata: + name: build-test-deploy +spec: + params: + - name: url + value: https://github.com/tektoncd/pipeline.git + - name: revision + value: v0.11.3 + workspaces: + - name: source-code + - name: test-results + tasks: + - name: get-source + workspaces: + - name: source-code + workspace: source-code + taskRef: + name: git-clone + params: + - name: url + value: $(params.url) + - name: revision + value: $(params.revision) + - name: run-unit-tests + runAfter: get-source + taskRef: + name: just-unit-tests + workspaces: + - name: source-code + workspace: source-code + - name: test-results + workspace: test-results + - name: upload-results + runAfter: run-unit-tests + taskRef: + name: gcs-upload + params: + - name: location + value: gs://my-test-results-bucket/testrun-$(taskRun.name) + workspaces: + - name: data + workspace: test-results +finally: +- name: update-slack + params: + - name: message + value: "Tests completed with $(tasks.run-unit-tests.status) status" +``` + +Possible tweaks: +* We could do this scheduling only when + [a Task requires a workspace `from` another Task](https://github.com/tektoncd/pipeline/issues/3109). 
+* We could combine this with other options but have this be the default behavior + + +Pros: +* Doesn't require any changes for Pipeline or Task authors + +Cons: +* Will need to update our entrypoint logic to allow for steps running in parallel +* Doesn't give as much flexibility as being explicit + * This functionality might not even be desirable for folks who want to make use of multiple nodes + * We could mitigate this by adding more configuration, e.g. opt in or out at a Pipeline level, but could get + complicated if people want more control (e.g. opting in for one workspace but not another) + +### Introduce scheduling rules to pipeline + +In these options, we add some syntax that allows Pipeline authors to express how they want Tasks to be executed. + +#### Add "grouping" to tasks in a pipeline + +In this option we add some notion of "groups" into a Pipeline; any Tasks in a group will be scheduled together. + +In this example, everything in the `fetch-test-upload` group would be executed as one pod. The `update-slack` Task would +be a separate pod. 
+ +```yaml +kind: Pipeline +metadata: + name: build-test-deploy +spec: + params: + - name: url + value: https://github.com/tektoncd/pipeline.git + - name: revision + value: v0.11.3 + workspaces: + - name: source-code + - name: test-results + tasks: + - name: get-source + group: fetch-test-upload # our new group syntax + workspaces: + - name: source-code + workspace: source-code + taskRef: + name: git-clone + params: + - name: url + value: $(params.url) + - name: revision + value: $(params.revision) + - name: run-unit-tests + group: fetch-test-upload # our new group syntax + runAfter: get-source + taskRef: + name: just-unit-tests + workspaces: + - name: source-code + workspace: source-code + - name: test-results + workspace: test-results + - name: upload-results + group: fetch-test-upload # our new group syntax + runAfter: run-unit-tests + taskRef: + name: gcs-upload + params: + - name: location + value: gs://my-test-results-bucket/testrun-$(taskRun.name) + workspaces: + - name: data + workspace: test-results +finally: +- name: update-slack + params: + - name: message + value: "Tests completed with $(tasks.run-unit-tests.status) status" +``` + +Or we could have a group syntax that exists as a root element in the Pipeline, for example for the above: + +```yaml +groups: +- [get-source, run-unit-tests, upload-results] +``` + +Pros: +* Minimal changes for Pipeline authors + +Cons: +* Will need to update our entrypoint logic to allow for steps running in parallel + * We could (at least initially) only support sequential groups +* Might be hard to reason about what is executed together +* Might be hard to reason about which Tasks can be combined in a group and which can't + +#### some other directive, e.g. labels, to indicate what should be scheduled together? + +This option is the same as the previous `groups` proposal but maybe we decide on some other way to indicate grouping, +e.g. labels. 
+ +### Runtime instead of authoring time + +These options pursue a solution that only works at runtime; this means Pipeline authors would not have any control +over the scheduling. + +#### PipelineRun: emptyDir + +In this solution we use the values provided at runtime for workspaces to determine what to run. Specifically, we allow +[`emptyDir`](https://github.com/tektoncd/pipeline/blob/a7ad683af52e3745887e6f9ed58750f682b4f07d/docs/workspaces.md#emptydir) +to be provided as a workspace at the Pipeline level even when that workspace is used by multiple Tasks, and when that +happens, we take that as the cue to schedule those Tasks together. + +For example, given this Pipeline: + +```yaml +kind: Pipeline +metadata: + name: build-test-deploy +spec: + workspaces: + - name: source-code + - name: test-results + tasks: + - name: get-source + workspaces: + - name: source-code + workspace: source-code + taskRef: + name: git-clone + params: + - name: url + value: $(params.url) + - name: revision + value: $(params.revision) + - name: run-unit-tests + runAfter: get-source + taskRef: + name: just-unit-tests + workspaces: + - name: source-code + workspace: source-code + - name: test-results + workspace: test-results + - name: upload-results + runAfter: run-unit-tests + taskRef: + name: gcs-upload + params: + - name: location + value: gs://my-test-results-bucket/testrun-$(taskRun.name) + workspaces: + - name: data + workspace: test-results +``` + +Running with this PipelineRun would cause `get-source` and `run-unit-tests` to be run in one pod, with `upload-results` +in another: + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: PipelineRun +metadata: + name: run +spec: + pipelineRef: + name: build-test-deploy + workspaces: + - name: source-code + emptyDir: {} + - name: test-results + persistentVolumeClaim: + claimName: mypvc +``` + +Running with this PipelineRun would cause all of the Tasks to be run in one pod: + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: PipelineRun +metadata: 
+ name: run +spec: + pipelineRef: + name: build-test-deploy + workspaces: + - name: source-code + emptyDir: {} + - name: test-results + emptyDir: {} +``` + +Running with this PipelineRun would cause all of the Tasks to be run in separate pods: + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: PipelineRun +metadata: + name: run +spec: + pipelineRef: + name: build-test-deploy + workspaces: + - name: source-code + persistentVolumeClaim: + claimName: otherpvc + - name: test-results + persistentVolumeClaim: + claimName: mypvc +``` + +Pros: +* Allows runtime decisions about scheduling without changing the Pipeline + +Cons: +* If it's important for a Pipeline to be executed in a certain way, that information will have to be encoded somewhere + other than the Pipeline +* For very large Pipelines, this default behavior may cause problems (e.g. if the Pipeline is too large to be scheduled + into one pod) +* A bit strange and confusing to overload the meaning of `emptyDir`; it might be simpler and clearer to have a field instead + +#### PipelineRun: field + +This is similar to the `emptyDir` based solution but instead of adding extra meaning to `emptyDir` we add a field to the +runtime workspace information or to the entire PipelineRun (maybe when this field is set workspaces do not need to be +provided.) + +A field could also be added as part of the Pipeline definition if desired (vs at runtime via a PipelineRun). + +#### Controller level + +This option is [TEP-0046](https://github.com/tektoncd/community/pull/318). In this option, the Tekton controller can +be configured to always execute Pipelines inside one pod. 
+ +Pros: +* Authors of PipelineRuns and Pipelines don't have to think about how the Pipeline will be executed +* Pipelines can be used without updates + +Cons: +* Only cluster administrators will be able to control this scheduling; there will be no runtime or authoring time + flexibility +* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside + the controller and has a lot of gotchas we'll need to iron out (see + [https://hackmd.io/@vdemeester/SkPFtAQXd](https://hackmd.io/@vdemeester/SkPFtAQXd) for some more brainstorming) + +### Within the Task + +In this option we ignore the "not at Task authoring time" requirement and we allow for Tasks to contain other Tasks. + +This is similar to [TEP-0054](https://github.com/tektoncd/community/pull/369) which proposes this via the Task spec +in a Pipeline, but does not (yet) propose it for Tasks outside of Pipelines. + +For example: + +```yaml +apiVersion: tekton.dev/v1beta1 +kind: Task +metadata: + name: build-test-upload +spec: + workspaces: + - name: source + mountPath: /workspace/source/go/src/github.com/GoogleContainerTools/skaffold + steps: + - name: get-source + uses: git-clone + params: + url: $(params.url) + workspaces: + - name: source + workspace: source + - name: run-tests + image: golang + workingDir: $(workspaces.source.path) + script: | + go test + - name: upload-results + uses: gcs-upload +``` + +Pros: +* Doesn't require many new concepts + +Cons: +* Can create confusing chains of nested Tasks (Task A can contain Task B which can contain Task C...) +* Requires creating new Tasks to leverage the reuse (maybe embedded specs negate this?) +* Doesn't help with parallel use cases + +### Remove distinction between Tasks and Pipelines + +In this version, we try to combine Tasks and Pipelines into one thing; e.g. 
by getting rid of Pipelines and adding all +the features they have to Tasks, and by giving Tasks the features that Pipelines have which they do not have. + +Things Tasks can do that Pipelines can't: +* Sidecars +* Refer to images (including args to images like script, command, args, env....) + +Things Pipelines can do that Tasks can't: +* Create DAGs, including running in parallel +* Finally +* When expressions + +For example, say our new thing is called a Foobar: + +```yaml +kind: Foobar +metadata: + name: git-clone +spec: + workspaces: + - name: source-code + foobars: + - name: get-source + steps: # or maybe each FooBar can only have 1 step and we need to use runAfter / dependencies to indicate ordering? + - name: clone + image: gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:v0.21.0 + script: