Add Node Affinity for TaskRuns that share PVC workspace
TaskRuns within a PipelineRun may share files using a workspace volume. The typical case is files from a git-clone operation. Tasks in a CI pipeline often perform operations on the filesystem, e.g. generating or analyzing files, so the workspace abstraction is very useful.

The Kubernetes way of using file volumes is [PersistentVolumeClaims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims). PersistentVolumeClaims use PersistentVolumes with different [access modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes). The most commonly available PV access mode is ReadWriteOnce; volumes with this access mode can only be mounted on one Node at a time.

When a Pipeline uses parallel Tasks, the pods for the TaskRuns may be scheduled to any Node, most likely not to the same Node in the cluster. Since volumes with the commonly available ReadWriteOnce access mode cannot be used by multiple Nodes at a time, these "parallel" pods are forced to execute sequentially, because the volume is only available on one Node at a time. This may cause your TaskRuns to time out.

Clusters are often _regional_, e.g. deployed across 3 Availability Zones, but PersistentVolumes are often _zonal_, i.e. only available to the Nodes within a single zone. Some cloud providers offer regional PVs, but sometimes regional PVs are only replicated to one additional zone, i.e. not to all 3 zones within a region. This works fine for most typical stateful applications, but Tekton uses storage in a different way: it is designed so that multiple pods access the same volume, in sequence or in parallel.
This makes it difficult to design a Pipeline that starts with parallel tasks, each using its own PVC, followed by a common task that mounts the volumes from the earlier tasks. If those tasks were scheduled to different zones, the common task cannot mount the PVCs, which are now located in different zones, so the PipelineRun is deadlocked.

There are a few technical solutions that offer parallel execution of Tasks even when sharing a PVC workspace:

- Using PVC access mode ReadWriteMany. But this access mode is not widely available, and is typically an NFS server or another not so "cloud native" solution.
- An alternative is to use storage that is tied to a specific node, e.g. a local volume, and then configure pods to be scheduled to this node. But this is not commonly available and it has drawbacks, e.g. the pod may need to consume and mount a whole disk of several hundred GB.

Consequently, it would be good to find a way to schedule TaskRun pods that share a workspace to the same Node, and thereby make it easy to use parallel tasks with a workspace, executing concurrently, on widely available Kubernetes cluster and storage configurations.

A few alternative solutions have been considered, as documented in #2586. However, they all have major drawbacks, e.g. major API and contract changes.

This commit introduces an "Affinity Assistant", a minimal placeholder pod, so that it is possible to use [Kubernetes inter-pod affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity) for TaskRun pods that need to be scheduled to the same Node.

This solution has several benefits: it does not introduce any API changes, it does not break or change any existing Tekton concepts, and it is implemented with very few changes. Additionally, it can be disabled with a feature flag.

**How it works:** When a PipelineRun is initiated, an "Affinity Assistant" is created for each PVC workspace volume.
TaskRun pods that share a workspace volume are configured with podAffinity to the "Affinity Assistant" pod that was created for the volume. The "Affinity Assistant" lives until the PipelineRun is completed or deleted. "Affinity Assistant" pods are configured with podAntiAffinity to repel other "Affinity Assistants", in a Best Effort fashion.

The Affinity Assistant is a _singleton_ workload, since it acts as a placeholder pod and TaskRun pods with affinity must be scheduled to the same Node. It is implemented with [QoS class Guaranteed](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed) but with minimal resource requests, since it does not perform any work other than being a placeholder.

Singleton workloads can be implemented in multiple ways, and they differ in behavior when the Node becomes unreachable:

- as a Pod: the Pod is not managed, so it will not be recreated.
- as a Deployment: the Pod will be recreated, and Availability is put before the singleton property.
- as a StatefulSet: the Pod will be recreated, but the singleton property is put before Availability.

Therefore the Affinity Assistant is implemented as a StatefulSet.

Essentially, this commit provides an effortless way to use functional task parallelism with any Kubernetes cluster that has any PVC-based storage.

Solves #2586

/kind feature
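
The scheduling relationship described in this message can be sketched with plain Kubernetes manifests. The following is only an illustrative sketch, not the manifests generated by this commit: the resource names (`affinity-assistant-example`, `taskrun-pod-example`, `example-pvc`), the label keys and values, the placeholder image, the resource figures, and the choice to mount the PVC in the placeholder are all assumptions made for illustration. It shows a StatefulSet-managed placeholder pod with equal requests and limits (which yields QoS class Guaranteed) and a preferred anti-affinity toward other placeholders, followed by a TaskRun-style pod with required affinity to that placeholder.

```yaml
# Hypothetical placeholder workload; every name, label and value is illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: affinity-assistant-example
spec:
  replicas: 1                                 # singleton placeholder
  serviceName: affinity-assistant-example     # hypothetical; no Service is defined here
  selector:
    matchLabels:
      app.kubernetes.io/instance: affinity-assistant-example
  template:
    metadata:
      labels:
        app.kubernetes.io/component: affinity-assistant
        app.kubernetes.io/instance: affinity-assistant-example
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" makes this Best Effort: placeholders repel each
          # other when possible, but scheduling does not fail otherwise.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: affinity-assistant
              topologyKey: kubernetes.io/hostname
      containers:
      - name: placeholder
        image: registry.k8s.io/pause:3.9      # assumption: any minimal idle image
        resources:                            # requests == limits -> QoS Guaranteed
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi
      volumes:
      - name: workspace                       # assumption: referencing the PVC keeps the
        persistentVolumeClaim:                # placeholder where the volume is usable
          claimName: example-pvc
---
# Hypothetical pod standing in for a TaskRun pod that shares the workspace.
apiVersion: v1
kind: Pod
metadata:
  name: taskrun-pod-example
spec:
  affinity:
    podAffinity:
      # "required" with a hostname topology key pins this pod to the
      # Node that runs the matching placeholder pod.
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/instance: affinity-assistant-example
        topologyKey: kubernetes.io/hostname
  containers:
  - name: step
    image: ubuntu
    command: ["sh", "-c", "ls /workspace"]
    volumeMounts:
    - name: workspace
      mountPath: /workspace
  volumes:
  - name: workspace
    persistentVolumeClaim:
      claimName: example-pvc
```

Because the TaskRun-style pod uses a required affinity with `kubernetes.io/hostname` as the topology key, the scheduler can only place it on the Node where the placeholder runs, so several such pods can mount the same ReadWriteOnce volume concurrently.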
1 parent 1fbac2a, commit 0104371

Showing 13 changed files with 817 additions and 40 deletions.
205 changes: 205 additions & 0 deletions
examples/v1beta1/pipelineruns/pipeline-run-with-parallel-tasks-using-pvc.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,205 @@ | ||
# This example shows how both sequential and parallel Tasks can share data | ||
# using a PersistentVolumeClaim as a workspace. The TaskRun pods that share | ||
# workspace will be scheduled to the same Node in your cluster with an | ||
# Affinity Assistant (unless it is disabled). The REPORTER task does not | ||
# use a workspace so it does not get affinity to the Affinity Assistant | ||
# and can be scheduled to any Node. If multiple concurrent PipelineRuns are | ||
# executed, their Affinity Assistant pods will repel each other to different | ||
# Nodes in a Best Effort fashion. | ||
# | ||
# A PipelineRun will pass a message parameter to the Pipeline in this example. | ||
# The STARTER task will write the message to a file in the workspace. The UPPER | ||
# and LOWER tasks will execute in parallel and process the message written by | ||
# the STARTER, and transform it to upper case and lower case. The REPORTER task | ||
# will use the Task Result from the UPPER task and print it - it is intended | ||
# to mimic a Task that sends data to an external service and shows a Task that | ||
# doesn't use a workspace. The VALIDATOR task will validate the result from | ||
# UPPER and LOWER. | ||
# | ||
# Use the runAfter property in a Pipeline to declare that a task depends on | ||
# another task. Output can be shared both via Task Result (as the REPORTER task does) | ||
# or via files in a workspace. | ||
# | ||
# -- (upper) -- (reporter) | ||
# / \ | ||
# (starter) (validator) | ||
# \ / | ||
# -- (lower) ------------ | ||
|
||
apiVersion: tekton.dev/v1beta1 | ||
kind: Pipeline | ||
metadata: | ||
name: parallel-pipeline | ||
spec: | ||
params: | ||
- name: message | ||
type: string | ||
|
||
workspaces: | ||
- name: ws | ||
|
||
tasks: | ||
- name: starter # Tasks that do not declare a runAfter property | ||
taskRef: # will start execution immediately | ||
name: persist-param | ||
params: | ||
- name: message | ||
value: $(params.message) | ||
workspaces: | ||
- name: task-ws | ||
workspace: ws | ||
subPath: init | ||
|
||
- name: upper | ||
runAfter: # Note the use of runAfter here to declare that this task | ||
- starter # depends on a previous task | ||
taskRef: | ||
name: to-upper | ||
params: | ||
- name: input-path | ||
value: init/message | ||
workspaces: | ||
- name: w | ||
workspace: ws | ||
|
||
- name: lower | ||
runAfter: | ||
- starter | ||
taskRef: | ||
name: to-lower | ||
params: | ||
- name: input-path | ||
value: init/message | ||
workspaces: | ||
- name: w | ||
workspace: ws | ||
|
||
- name: reporter # This task does not use a workspace and may be scheduled to | ||
runAfter: # any Node in the cluster. | ||
- upper | ||
taskRef: | ||
name: result-reporter | ||
params: | ||
- name: result-to-report | ||
value: $(tasks.upper.results.message) # A result from a previous task is used as param | ||
|
||
- name: validator # This task validates the output from the upper and lower Tasks | ||
runAfter: # It does not strictly depend on the reporter Task | ||
- reporter # But you may want to skip this task if the reporter Task fails | ||
- lower | ||
taskRef: | ||
name: validator | ||
workspaces: | ||
- name: files | ||
workspace: ws | ||
--- | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: Task | ||
metadata: | ||
name: persist-param | ||
spec: | ||
params: | ||
- name: message | ||
type: string | ||
results: | ||
- name: message | ||
description: A result message | ||
steps: | ||
- name: write | ||
image: ubuntu | ||
script: echo $(params.message) | tee $(workspaces.task-ws.path)/message $(results.message.path) | ||
workspaces: | ||
- name: task-ws | ||
--- | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: Task | ||
metadata: | ||
name: to-upper | ||
spec: | ||
description: | | ||
This task reads and processes a file from the workspace and writes the result | ||
both to a file in the workspace and as a Task Result. | ||
params: | ||
- name: input-path | ||
type: string | ||
results: | ||
- name: message | ||
description: Input message in upper case | ||
steps: | ||
- name: to-upper | ||
image: ubuntu | ||
script: cat $(workspaces.w.path)/$(params.input-path) | tr '[:lower:]' '[:upper:]' | tee $(workspaces.w.path)/upper $(results.message.path) | ||
workspaces: | ||
- name: w | ||
--- | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: Task | ||
metadata: | ||
name: to-lower | ||
spec: | ||
description: | | ||
This task reads and processes a file from the workspace and writes the result | ||
both to a file in the workspace and as a Task Result. | ||
params: | ||
- name: input-path | ||
type: string | ||
results: | ||
- name: message | ||
description: Input message in lower case | ||
steps: | ||
- name: to-lower | ||
image: ubuntu | ||
script: cat $(workspaces.w.path)/$(params.input-path) | tr '[:upper:]' '[:lower:]' | tee $(workspaces.w.path)/lower $(results.message.path) | ||
workspaces: | ||
- name: w | ||
--- | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: Task | ||
metadata: | ||
name: result-reporter | ||
spec: | ||
description: | | ||
This task is supposed to mimic a service that posts data from the Pipeline, | ||
e.g. to a remote HTTP service or a Slack notification. | ||
params: | ||
- name: result-to-report | ||
type: string | ||
steps: | ||
- name: report-result | ||
image: ubuntu | ||
script: echo $(params.result-to-report) | ||
--- | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: Task | ||
metadata: | ||
name: validator | ||
spec: | ||
steps: | ||
- name: validate-upper | ||
image: ubuntu | ||
script: cat $(workspaces.files.path)/upper | grep HELLO\ TEKTON | ||
- name: validate-lower | ||
image: ubuntu | ||
script: cat $(workspaces.files.path)/lower | grep hello\ tekton | ||
workspaces: | ||
- name: files | ||
--- | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: PipelineRun | ||
metadata: | ||
generateName: parallel-pipelinerun- | ||
spec: | ||
params: | ||
- name: message | ||
value: Hello Tekton | ||
pipelineRef: | ||
name: parallel-pipeline | ||
workspaces: | ||
- name: ws | ||
volumeClaimTemplate: | ||
spec: | ||
accessModes: | ||
- ReadWriteOnce | ||
resources: | ||
requests: | ||
storage: 1Gi |
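
The header comment of this example notes that the Affinity Assistant can be disabled, and the commit message mentions a feature flag for this. A minimal sketch of how that might look, assuming the flag is exposed as a `disable-affinity-assistant` key in a `feature-flags` ConfigMap in the `tekton-pipelines` namespace. None of these names are stated on this page, so treat them as assumptions and check the documentation for your Tekton version:

```yaml
# Assumed ConfigMap name, namespace and key; verify against your Tekton installation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  # "true" would turn the Affinity Assistant off, so TaskRun pods that share
  # a PVC workspace are no longer pinned to the same Node.
  disable-affinity-assistant: "true"
```

If such a ConfigMap already exists in the cluster, it is safer to merge this key into it than to replace the whole object, since other keys may be present. With the assistant disabled, the parallel `upper` and `lower` tasks in this example may run sequentially on a ReadWriteOnce volume.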