Start affinity-assistant lazily #3540

Closed
JPEWdev opened this issue Nov 18, 2020 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@JPEWdev

JPEWdev commented Nov 18, 2020

Feature request

Only start the affinity assistant when parallel tasks are about to start executing in a Pipeline.

Use case

My understanding is that the purpose of the affinity assistant is to make tasks running in parallel execute on the same node so that they can share a PV. However, I've noticed that it can prevent optimal scheduling of pipeline runs, particularly when the number of running pipelines greatly exceeds the cluster capacity. The affinity assistant gets created and effectively pins all the pipelines to a specific cluster node, because its resource request is so small that it can run basically anywhere. This means that every pipeline/task has its node chosen before it executes. If that particular node is less capable than other nodes, this may result in non-optimal scheduling. If the affinity assistant were only created at the point where parallel tasks are detected in the pipeline DAG, A) it would never run for linear pipelines, and B) it would better allow the natural scheduling of tasks onto free nodes during overload.
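To make "parallel tasks were detected in the pipeline DAG" concrete, here is a minimal sketch of one way such a check could work. It is not Tekton code: the function name `has_parallel_tasks` and the `run_after` mapping (task name to the tasks it waits for) are assumptions made for illustration. The idea is that a pipeline is linear exactly when no two tasks share the same depth in the dependency graph.

```python
from collections import Counter
from functools import lru_cache


def has_parallel_tasks(run_after):
    """Return True if at least two tasks in the DAG could run concurrently.

    run_after maps each task name to the list of tasks it must wait for.
    Assumptions: the graph is acyclic and every dependency is also a key.
    """

    @lru_cache(maxsize=None)
    def depth(task):
        # Depth is the length of the longest dependency chain ending at this task.
        deps = run_after[task]
        return 0 if not deps else 1 + max(depth(dep) for dep in deps)

    # Tasks at the same depth cannot depend on each other, so they may overlap.
    per_depth = Counter(depth(task) for task in run_after)
    return any(count > 1 for count in per_depth.values())


# Hypothetical pipelines used only to exercise the check.
linear = {"clone": [], "build": ["clone"], "deploy": ["build"]}
fan_out = {"clone": [], "lint": ["clone"], "test": ["clone"], "deploy": ["lint", "test"]}

print(has_parallel_tasks(linear))   # False
print(has_parallel_tasks(fan_out))  # True
```

Under this sketch, a lazily started assistant would be skipped for `linear` and created for `fan_out`. A real implementation would presumably also need to consider which tasks actually share a workspace.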

@JPEWdev JPEWdev added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 18, 2020
@jlpettersson
Member

jlpettersson commented Nov 20, 2020

> The affinity assistant gets created and effectively pins all the pipelines to a specific cluster node because it has such a small resource request that it can basically run anywhere.

This should not be the case. The Affinity Assistant is configured with PodAntiAffinity against other Affinity Assistants, so that they repel each other on a best-effort basis.
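As a rough illustration of what a best-effort anti-affinity rule looks like, the sketch below builds the relevant pod spec fragment as a plain Python dict. The field names are standard Kubernetes pod affinity fields, but the label, the weight, and the helper name `soft_anti_affinity` are assumptions for illustration and are not taken from the Tekton source.

```python
# Hypothetical label; the selector the Affinity Assistant really uses is not given in this thread.
ASSISTANT_LABELS = {"app.kubernetes.io/component": "affinity-assistant"}


def soft_anti_affinity(labels, weight=100):
    """Build a best-effort pod anti-affinity fragment as a plain dict."""
    return {
        "podAntiAffinity": {
            # The "preferred..." form is a soft rule: the scheduler tries to honour
            # it but still places the pod if it cannot. The "required..." form
            # would be a hard constraint instead.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": dict(labels)},
                        # Repel per node: prefer nodes without a matching pod.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


affinity = soft_anti_affinity(ASSISTANT_LABELS)
```

Because the rule is only a preference, several assistants can still end up on the same node when the cluster is full, which may be what the original report observed.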

> However, this means that every pipeline/task has its node chosen before it executes.

Yes, the point of the Affinity Assistant is to schedule tasks that use the same volume onto the node where the volume is mounted.

> If that particular node is less capable than other nodes, this may result in non-optimal scheduling.

Yes. The scheduling can indeed be non-optimal with the Affinity Assistant, but it cannot be improved much as it is implemented now. Implementing this feature as a Custom Scheduler instead of a Pod could hopefully improve on this, but scheduling is not an easy problem, especially for Tekton Tasks that share a volume.

> If the affinity assistant was only created at the point where parallel tasks were detected in the pipeline DAG, A) it would never run for linear pipelines

There are actually several benefits to the Affinity Assistant in a linear pipeline as it is:

  • It makes the Pipeline much faster, as it commonly takes 25-35 seconds to mount a volume. Without the Affinity Assistant this adds roughly (30 seconds x number of tasks), while with the Affinity Assistant the volume is mounted only once, on one node.
  • In a regional cluster, it prevents deadlocks, since the tasks must be scheduled to the same Availability Zone as the volume, and volumes are typically a zonal resource.
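The mount-time estimate in the first bullet can be worked through with a small sketch. The function name `mount_overhead_seconds` is hypothetical, and the 30-second default is simply the midpoint of the 25-35 second range quoted above. It assumes every task pod triggers a fresh mount when tasks are not kept on one node.

```python
def mount_overhead_seconds(num_tasks, mount_seconds=30, shared_node=True):
    """Estimate total time spent mounting the volume over a pipeline run.

    Assumption: without node pinning each task mounts the volume again,
    while with pinning the volume is mounted once for the whole run.
    """
    mounts = min(1, num_tasks) if shared_node else num_tasks
    return mounts * mount_seconds


# A hypothetical linear pipeline with five tasks.
without_assistant = mount_overhead_seconds(5, shared_node=False)
with_assistant = mount_overhead_seconds(5, shared_node=True)
print(without_assistant, with_assistant)  # 150 30
```

For five tasks this comes to about 150 seconds of mounting without the Affinity Assistant versus about 30 seconds with it, which is the saving the comment describes.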

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 23, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

