Start affinity-assistant lazily #3540

JPEWdev · 2020-11-18T20:26:22Z

Feature request

Only start the affinity assistant when parallel tasks are about to start executing in a Pipeline.

Use case

My understanding is that the purpose of the affinity assistant is to make tasks running in parallel execute on the same node so that they can share a PV. However, I've noticed that it can prevent optimal scheduling of pipeline runs, particularly when the number of running pipelines greatly exceeds the cluster capacity. The affinity assistant gets created and effectively pins all the pipelines to a specific cluster node, because its resource request is so small that it can run almost anywhere. This means that every pipeline/task has its node chosen before it executes. If that particular node is less capable than other nodes, this may result in non-optimal scheduling. If the affinity assistant were only created at the point where parallel tasks are detected in the pipeline DAG, A) it would never run for linear pipelines, and B) it would better allow tasks to be scheduled naturally onto free nodes during overload.
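The detection step this request relies on, deciding whether a pipeline DAG contains any tasks that can run in parallel, can be sketched as below. This is a minimal Go sketch, not Tekton's implementation: `hasParallelTasks` and its input shape (task name mapped to that task's `runAfter` list) are hypothetical, and it ignores other sources of ordering such as result references, as well as `finally` tasks. It uses Kahn's topological sort: a pipeline is a strict chain exactly when only one task is ready at every step, so seeing two or more ready tasks at once means those tasks do not depend on each other.

```go
package main

import "fmt"

// hasParallelTasks reports whether any two tasks could run at the same time.
// Input (hypothetical shape): task name -> names listed in that task's runAfter.
// Only runAfter ordering is considered; result references and finally tasks
// are ignored in this sketch.
func hasParallelTasks(runAfter map[string][]string) (bool, error) {
	inDegree := make(map[string]int, len(runAfter))        // unmet dependencies per task
	dependents := make(map[string][]string, len(runAfter)) // dep -> tasks waiting on it
	for task, deps := range runAfter {
		inDegree[task] += len(deps)
		for _, dep := range deps {
			if _, ok := runAfter[dep]; !ok {
				return false, fmt.Errorf("task %q runs after unknown task %q", task, dep)
			}
			dependents[dep] = append(dependents[dep], task)
		}
	}

	// Kahn's algorithm: ready holds every unprocessed task with no unmet dependencies.
	var ready []string
	for task, n := range inDegree {
		if n == 0 {
			ready = append(ready, task)
		}
	}

	parallel, processed := false, 0
	for len(ready) > 0 {
		// Two ready tasks cannot depend on each other, so they could run concurrently.
		if len(ready) > 1 {
			parallel = true
		}
		task := ready[0]
		ready = ready[1:]
		processed++
		for _, next := range dependents[task] {
			inDegree[next]--
			if inDegree[next] == 0 {
				ready = append(ready, next)
			}
		}
	}
	// Tasks left unprocessed are stuck behind a dependency cycle.
	if processed != len(runAfter) {
		return false, fmt.Errorf("pipeline graph contains a cycle")
	}
	return parallel, nil
}

func main() {
	linear := map[string][]string{"clone": nil, "build": {"clone"}, "test": {"build"}}
	fanOut := map[string][]string{"clone": nil, "lint": {"clone"}, "test": {"clone"}}

	l, _ := hasParallelTasks(linear)
	f, _ := hasParallelTasks(fanOut)
	fmt.Println("linear:", l)  // linear: false
	fmt.Println("fan-out:", f) // fan-out: true
}
```

Under this sketch, a linear pipeline would never trigger creation of an assistant. A lazier variant could track the same ready set while the PipelineRun progresses and create the assistant only the first time it holds more than one task.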

jlpettersson · 2020-11-20T23:32:01Z

> The affinity assistant gets created and effectively pins all the pipelines to a specific cluster node because it has such a small resource request that it can basically run anywhere.

This should not be the case. Each Affinity Assistant is configured with PodAntiAffinity against other Affinity Assistants, so that they repel each other on a best-effort basis.

> However, this means that every pipeline/task has its node chosen before it executes.

Yes, the point of the Affinity Assistant is to schedule tasks that use the same volume to the node where the volume is mounted.

> If that particular node is less capable than other nodes, this may result in non-optimal scheduling.

Yes. The scheduling can indeed be non-optimal with the Affinity Assistant, but it cannot be improved much with the current implementation. Implementing this feature as a custom scheduler instead of a Pod could hopefully improve on this, but scheduling is not an easy problem, especially for Tekton Tasks that share a volume.

> If the affinity assistant were only created at the point where parallel tasks were detected in the pipeline DAG, A) it would never run for linear pipelines

There are actually several benefits of the Affinity Assistant in a linear pipeline, as it is implemented today:

It makes the Pipeline much faster. Mounting a volume commonly takes 25-35 seconds; without the Affinity Assistant this adds roughly 30 seconds × the number of tasks, while with the Affinity Assistant the volume is mounted only once, on one node.
In a regional cluster, it prevents deadlocks, since the tasks must be scheduled to the same Availability Zone as the volume and volumes are typically zonal resources.
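As a rough worked example of the first point, taking 30 seconds as a representative mount time from the 25-35 second range above and an illustrative pipeline of n = 5 tasks (both are assumptions for illustration, not measurements):

```latex
T_{\text{without AA}} \approx n \times 30\,\text{s} = 5 \times 30\,\text{s} = 150\,\text{s},
\qquad
T_{\text{with AA}} \approx 30\,\text{s}
```

In this example the mount overhead drops by roughly 120 seconds, and the gap grows linearly with the number of tasks.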

tekton-robot · 2021-02-21T20:40:41Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot · 2021-03-23T21:18:44Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot · 2021-05-07T23:40:43Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot · 2021-05-07T23:40:45Z

@tekton-robot: Closing this issue.


JPEWdev added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 18, 2020

tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2021

tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 23, 2021

tekton-robot closed this as completed May 7, 2021

lbernick mentioned this issue Apr 14, 2023

TEP-0135: Per-PipelineRun (instead of per-workspace) affinity assistant #6543

Closed
