Multiple descheduler strategies of the same type #486

Closed
irmiller22 opened this issue Jan 22, 2021 · 33 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@irmiller22

irmiller22 commented Jan 22, 2021

Is your feature request related to a problem? Please describe.

k8s version: 1.18
descheduler version: 1.18

Currently, we have CronJob resources that are getting stuck due to this k8s issue: kubernetes/kubernetes#52172. The issue is apparently fixed in 1.19, but we are using managed k8s (EKS) where 1.19 is not available yet, so we're unable to upgrade.

Long story short, a CronJob pod spins up, immediately gets stuck due to the bug described in the linked issue above, and ultimately ends up in the Waiting state with the reason CreateContainerError.

State:          Waiting
  Reason:       CreateContainerError

The only way to address this currently is to manually delete the problematic pods and let k8s re-schedule the CronJob objects during the next scheduled run. In the meantime, these stuck pods block the CronJob resource from scheduling new pods.

Describe the solution you'd like
We'd like to leverage descheduler to handle this case by allowing multiple PodLifeTime strategies. As I understand it from the documentation, only one definition per strategy is currently allowed. We'd like to have a policy that applies only to our CronJob objects (which have a priority class called service-cron), and another policy that applies to all other pods.

For example, the below configuration would be desired:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
     enabled: true
     params:
       podLifeTime:
         maxPodLifeTimeSeconds: 86400
         podStatusPhases:
         - "Pending"
  "PodLifeTime":
     enabled: true
     params:
       podLifeTime:
        maxPodLifeTimeSeconds: 86400
        podStatusPhases:
         - "Running"
        thresholdPriorityClassName: "service-cron" # this is a custom priority class

If there are other ways to accomplish this, please let me know! I'm open to suggestions.

Describe alternatives you've considered

We've considered upgrading our self-managed clusters to 1.19, but our managed clusters are unable to upgrade since 1.19 isn't available yet. We'd prefer not to run into version drift across our clusters.

What version of descheduler are you using?

descheduler version: 1.18

Additional context

@irmiller22 irmiller22 added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 22, 2021
@damemi
Contributor

damemi commented Jan 22, 2021

Hi @irmiller22, thanks for opening this issue. We have discussed similar requests before (I thought there was an open issue, but I couldn't find it), so there is definitely some validity to what you describe.

One option we discussed was something similar to scheduler profiles (https://kubernetes.io/docs/reference/scheduling/config/#multiple-profiles). This would take some minor refactoring but is certainly doable.
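
For context, the linked scheduler feature lets a single kube-scheduler binary run several named profiles from one configuration. A rough sketch of that shape, based on the linked docs (the apiVersion and plugin names vary by Kubernetes release, so treat this as illustrative only):

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  # first profile behaves like the default scheduler
  - schedulerName: default-scheduler
  # second profile disables all scoring plugins
  - schedulerName: no-scoring-scheduler
    plugins:
      preScore:
        disabled:
        - name: '*'
      score:
        disabled:
        - name: '*'

The analogous idea for the descheduler would be a policy where each profile carries its own strategy list, so two PodLifeTime entries with different parameters could coexist.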

Since I couldn't find another issue for this, maybe we can officially open discussion on this here. @ingvagabund @seanmalloy wdyt?

@ingvagabund
Contributor

We'd like to have a policy that applies only to our CronJob objects (which have a priority class called service-cron)

There's currently no way to separate pods into two distinct groups and have multiple instances of the same strategy target different groups. Specifying thresholdPriorityClassName in the strategy will still evict any pod whose priority is that of the class or lower.

It's better to allow a strategy to target a specific group of pods based on a label selector; thresholdPriorityClassName is only used for deciding whether a pod gets evicted or not.

Nevertheless, +1 for allowing multiple instances of the same strategy wherever it's applicable. We might start with PodLifeTime, though it will require re-designing the policy file. E.g.

apiVersion: "descheduler/v1alpha2" <-- notice the version bump
kind: "DeschedulerPolicy"
strategies:
  - name: PodLifeTime
  - enabled: true
  ...
  - name: PodLifeTime
  - enabled: true
  ...

@damemi
Contributor

damemi commented Jan 25, 2021

There's currently no way to separate pods into two distinct groups and have multiple instances of the same strategy target different groups

This was another concern I had. I wonder if implementing an annotation like deschedulerName on the pods would work for running "multiple" deschedulers, or if that's overcomplicating the project.

@ingvagabund
Contributor

annotation like deschedulerName on the pods

As long as it's the applications that choose which descheduler is more suitable, rather than running the same descheduler just with a different name and a different set of strategies. Imho, it's orthogonal to this issue.

@bytetwin
Contributor

I would like to pick this up if it's still under consideration. Got some questions:

  • Should the label selector be the differentiator between different instances of the same strategy? There would be a default version of the strategy plus variants targeting specifically labeled pods: pods matching a label would be handled by the labeled variant, while all others would use the default. This would make the default mandatory.
  • re-design the policy file - does v1alpha2 co-exist with v1alpha1? I need some help here on how; apimachinery supports only v1 and v1beta.
  • What are the target strategies for this?

@seanmalloy
Member

I've given my perspective on your questions inline below. Prior to starting implementation we need to have some sort of high-level agreement on what exactly the v1alpha2 DeschedulerPolicy will look like. For starters, I recommend adding a well-thought-out comment on this issue with a detailed proposal for v1alpha2. Let's also see what @damemi and @ingvagabund think about this.

  • Should the label selector be the differentiator between different instances of the same strategy? There would be a default version of the strategy plus variants targeting specifically labeled pods: pods matching a label would be handled by the labeled variant, while all others would use the default. This would make the default mandatory.

The only other option I can think of is using an annotation, which was mentioned in #486 (comment).

  • re-design the policy file - does v1alpha2 co-exist with v1alpha1? I need some help here on how; apimachinery supports only v1 and v1beta.

I believe it makes sense to have v1alpha1 co-exist with v1alpha2.

  • What are the target strategies for this?

I think we would want to allow multiple descheduler profiles for all strategies.

@ingvagabund
Contributor

ingvagabund commented Apr 23, 2021

@hanumanthan thanks for picking this up.

  • Should the label selector be the differentiator between different instances of the same strategy? There would be a default version of the strategy plus variants targeting specifically labeled pods: pods matching a label would be handled by the labeled variant, while all others would use the default. This would make the default mandatory.

All strategies up to LowNodeUtilization can already specify a label selector: #510. As long as only admin needs to configure the descheduler, this is sufficient.
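
To illustrate, a minimal sketch of scoping a single v1alpha1 strategy with a label selector, assuming the labelSelector strategy parameter from #510; the workload-type label is a hypothetical example, not something the descheduler defines:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 86400
      # only pods matching this selector are considered by the strategy
      labelSelector:
        matchLabels:
          workload-type: cron # hypothetical label on the CronJob pods

This scopes which pods the strategy evaluates, but it still permits only one PodLifeTime entry per policy, which is why multiple instances of the same strategy remain the open request here.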

  • re-design the policy file - does v1alpha2 co-exist with v1alpha1? I need some help here on how; apimachinery supports only v1 and v1beta.

I believe it makes sense to have v1alpha1 co-exist with v1alpha2.

Providing multiple instances of the same strategy requires incompatible changes. So we will need to support both versions at the same time for at least 3 releases. Also, I'd like to keep the changes simple so we can simply convert v1alpha1 to v1alpha2.

With that, we might also get rid of enabled: true and replace it with disabled: false (which will not be specified by default, to reduce the number of params). I'm not sure what the historical reason was for creating the inverted param. @ravisantoshgudimetla do you recall?

  • What are the target strategies for this?

In theory, all of them, as long as a strategy allows setting a namespace alongside other parameters (e.g. thresholdPriority). Even LowNodeUtilization can be utilized here if we allow setting a node selector to create multiple (and non-overlapping) node pools. On the other hand, it will be up to the user to make sure multiple instances of the same strategy do not interfere with each other where they are not supposed to.

Back to

differentiator between different instances of the same strategy

We can't reasonably check within the descheduler that every configuration is conflict-free. Two instances of the same strategy can have non-overlapping groups of pods based on a label selector, but there's nothing forbidding a group from being targeted by both label selectors, especially since this can also change dynamically.

@damemi
Contributor

damemi commented Apr 23, 2021

As @seanmalloy said, this is a much higher-level discussion which will require a design doc for the new version. A new API version is an opportunity for us to address other possible improvements we've uncovered as the project has grown (one example is the change I suggested in #314).

  • re-design the policy file - does v1alpha2 co-exist with v1alpha1? I need some help here on how; apimachinery supports only v1 and v1beta.

I believe it makes sense to have v1alpha1 co-exist with v1alpha2.

Providing multiple instances of the same strategy requires incompatible changes. So we will need to support both versions at the same time for at least 3 releases. Also, I'd like to keep the changes simple so we can simply convert v1alpha1 to v1alpha2.

Technically, an alpha API can be removed at any time, especially since this is just a config API and Descheduler is the only consumer of it.

However, as good citizens, it would be preferable to support both APIs to allow users time to switch over. This will require us to generate conversion functions between the two (probably with generate-internal-groups.sh). @hanumanthan, as a side note here, this versioning is our own and is separate from the versions in apimachinery, which is a different API.

I will start a design document to gather some of the notes that have been mentioned here so far

@ravisantoshgudimetla
Contributor

With that, we might also get rid of enabled: true and replace it with disabled: false (which will not be specified by default, to reduce the number of params). I'm not sure what the historical reason was for creating the inverted param. @ravisantoshgudimetla do you recall?

I don't recall the exact reason, but I believe we did it to stay in sync with feature gate flags, which used to have Enabled: true, IIRC.

@bytetwin
Contributor

Thanks @damemi, I will wait for the design doc to start working on this feature.
Happy to collaborate on the design, though I may have only limited input for it.

@damemi
Contributor

damemi commented Apr 27, 2021

Here is a link to the doc I started: https://docs.google.com/document/d/1S1JCh-0F-QCJvBBG-kbmXiHAJFF8doArhDIAKbOj93I/edit#

Feel free to comment there or propose any alternatives. I think the options are fairly straightforward

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@ingvagabund
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 25, 2021
@damemi
Contributor

damemi commented Oct 25, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2022
@diranged

diranged commented Feb 3, 2022

Just adding a voice of support on this issue. The ability to define different TTLs for pods based on different labels is pretty important to us.

@seanmalloy
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ingvagabund ingvagabund reopened this Aug 24, 2022
@ingvagabund ingvagabund removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 24, 2022
@migueleliasweb

I'm also keen to see this moving. Just like @diranged, at the company where I work we really want to have specific strategies per namespace/label. Without this, it's hard to adopt the descheduler - although it does what we want, it doesn't do it the way we want.

I initially thought the descheduler would be a fully-fledged operator where we could provide CRDs in a distributed way and it would merge/compile the strategies/policies into one config and run it. If that were the case, it would be awesome.

(A question for the core team): Would you be willing to move towards the "operator with CRDs" target?

@jacobstr

jacobstr commented Nov 16, 2022

  • Is running multiple instances of kube-descheduler, each with a dedicated policy.yaml, a viable workaround in the short term? Any perceived gotchas with this approach? More requests on the API server than necessary seems like the immediate one, but I think I'm more worried about correctness. (A minimal sketch of this workaround follows below.)
  • What is the rationale for adding an array of profiles to a new "resource type" vs. simply allowing multiple policies to co-exist? Unlike scheduling, where you can configure custom schedulers, such a capability does not exist for descheduling, so I wonder if patterning the API design in a manner inspired by scheduler profiles is necessary.
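
On the first bullet, a minimal sketch of the short-term workaround: run two descheduler instances, each pointed at its own policy file, so the one-definition-per-strategy limit applies per instance rather than per cluster. The workload-type label below is hypothetical, and nothing prevents the two pod sets from overlapping, so keeping the selectors disjoint is left to the operator:

# policy-cron.yaml, mounted by the first descheduler instance
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 86400
        podStatusPhases:
        - "Running"
      labelSelector:
        matchLabels:
          workload-type: cron # hypothetical label on the CronJob pods
---
# policy-default.yaml, mounted by the second descheduler instance
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 86400
        podStatusPhases:
        - "Pending"
      labelSelector:
        matchExpressions:
        - key: workload-type # excludes the cron-labeled pods handled above
          operator: NotIn
          values:
          - cron

Each instance would be its own Deployment or CronJob mounting the corresponding ConfigMap; the duplicated pod and node listing is presumably where the extra API server traffic comes from.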

@jacobstr

Has this been superseded by #926? It appears the original design doc went stale in April 2021 and was replaced by a new document.

If so, might I suggest closing this in favor of #926? I found it a bit confusing to find the new design.

@ingvagabund
Contributor

Many of the issues were kept open so we can keep track of the original requests.

@damemi would you be ok with referencing the new design in https://docs.google.com/document/d/1S1JCh-0F-QCJvBBG-kbmXiHAJFF8doArhDIAKbOj93I/edit#?

@damemi
Contributor

damemi commented Nov 16, 2022

@damemi would you be ok with referencing the new design in https://docs.google.com/document/d/1S1JCh-0F-QCJvBBG-kbmXiHAJFF8doArhDIAKbOj93I/edit#?

Yup, done

Also, agree with @ingvagabund on keeping this open. This issue is a subset of the overall design of the v1alpha2 API in #926.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 16, 2023
@ingvagabund ingvagabund removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 21, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 19, 2023
@a7i
Contributor

a7i commented Oct 25, 2023

this is now possible using profiles in v1alpha2
/close
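
For anyone landing here later, a sketch of the original request expressed with v1alpha2 profiles. This follows the profile/pluginConfig layout from the descheduler README; field names (e.g. states) differ between releases, and the workload-type label is a hypothetical stand-in for however the CronJob pods are identified:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  # first profile: evict pods stuck in Pending for more than a day
  - name: pending-pods
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
        states:
        - "Pending"
    plugins:
      deschedule:
        enabled:
        - "PodLifeTime"
  # second profile: a separate PodLifeTime instance scoped to the cron workloads
  - name: cron-pods
    pluginConfig:
    - name: "PodLifeTime"
      args:
        maxPodLifeTimeSeconds: 86400
        states:
        - "Running"
        labelSelector:
          matchLabels:
            workload-type: cron # hypothetical label on the CronJob pods
    plugins:
      deschedule:
        enabled:
        - "PodLifeTime"

Each profile runs its own plugin set independently, which is the multiple-instances-of-one-strategy behaviour this issue asked for.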

@k8s-ci-robot
Contributor

@a7i: Closing this issue.

In response to this:

this is now possible using profiles in v1alpha2
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
