
Rescheduler #109

Closed · 23 tasks
davidopp opened this issue Oct 1, 2016 · 34 comments
Labels: help wanted · lifecycle/rotten · sig/scheduling

Comments

@davidopp (Member) commented Oct 1, 2016

Feature Description

  • One-line feature description (can be used as a release note):
  • Primary contact (assignee): @davidopp @aveshagarwal
  • Responsible SIGs: @kubernetes/sig-scheduling-feature-requests
  • Design proposal link (community repo):
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred:
  • Approver (likely from SIG/area to which feature belongs):
  • Feature target (which target equals to which milestone):
    • Alpha release target (x.y)
    • Beta release target (x.y)
    • Stable release target (x.y)
Description

A component that evicts pods (that are managed by a controller) to achieve some set of objectives.

This feature needs a detailed design doc; an initial design proposal is here.
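
For illustration only, the pod-eligibility side of such a component might look roughly like the Go sketch below; the types, field names, and rules are simplified assumptions for this issue, not actual rescheduler/descheduler code:

```go
// Simplified sketch of an eviction-eligibility check a rescheduler might
// apply. Pod is a stand-in type, not the real Kubernetes API object.
package main

import "fmt"

type Pod struct {
	Name            string
	Namespace       string
	OwnerKind       string // e.g. "ReplicaSet", "DaemonSet", or "" for a bare pod
	IsMirrorPod     bool   // static pods appear as mirror pods
	HasLocalStorage bool
}

// evictable reports whether a pod is a reasonable eviction candidate:
// it must be managed by a controller (so it gets recreated elsewhere)
// and must not be a DaemonSet pod, a mirror/static pod, or a pod using
// local storage.
func evictable(p Pod) bool {
	if p.OwnerKind == "" || p.OwnerKind == "DaemonSet" {
		return false
	}
	if p.IsMirrorPod || p.HasLocalStorage {
		return false
	}
	return true
}

func main() {
	pods := []Pod{
		{Name: "web-1", Namespace: "default", OwnerKind: "ReplicaSet"},
		{Name: "node-exporter-x", Namespace: "kube-system", OwnerKind: "DaemonSet"},
		{Name: "etcd-node1", Namespace: "kube-system", IsMirrorPod: true},
	}
	for _, p := range pods {
		fmt.Printf("%s/%s evictable=%v\n", p.Namespace, p.Name, evictable(p))
	}
}
```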

Progress Tracker

  • Before Alpha
    • Write and maintain draft quality doc
      • During development keep a doc up-to-date about the desired experience of the feature and how someone can try the feature in its current state. Think of it as the README of your new feature and a skeleton for the docs to be written before the Kubernetes release. Paste link to Google Doc: DOC-LINK
    • Design Approval
      • Design Proposal. This goes under docs/proposals. Doing a proposal as a PR allows line-by-line commenting from community, and creates the basis for later design documentation. Paste link to merged design proposal here: PROPOSAL-NUMBER
      • Decide which repo this feature's code will be checked into. Not everything needs to land in the core kubernetes repo. REPO-NAME
      • Initial API review (if API). Maybe same PR as design doc. PR-NUMBER
        • Any code that changes an API (/pkg/apis/...)
        • cc @kubernetes/api
      • Identify shepherd (your SIG lead and/or kubernetes-pm@googlegroups.com will be able to help you). My Shepherd is: replace.me@replaceme.com (and/or GH Handle)
        • A shepherd is an individual who will help acquaint you with the process of getting your feature into the repo, identify reviewers and provide feedback on the feature. They are not (necessarily) the code reviewer of the feature, or tech lead for the area.
        • The shepherd is not responsible for showing up to Kubernetes-PM meetings and/or communicating if the feature is on-track to make the release goals. That is still your responsibility.
      • Identify secondary/backup contact point. My Secondary Contact Point is: replace.me@replaceme.com (and/or GH Handle)
    • Write (code + tests + docs) then get them merged. ALL-PR-NUMBERS
      • Code needs to be disabled by default. Verified by code OWNERS
      • Minimal testing
      • Minimal docs
        • cc @kubernetes/docs on docs PR
        • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
        • New apis: Glossary Section Item in the docs repo: kubernetes/kubernetes.github.io
      • Update release notes
  • Before Beta
    • Testing is sufficient for beta
    • User docs with tutorials
      • Updated walkthrough / tutorial in the docs repo: kubernetes/kubernetes.github.io
      • cc @kubernetes/docs on docs PR
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Thorough API review
      • cc @kubernetes/api
  • Before Stable
    • docs/proposals/foo.md moved to docs/design/foo.md
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Soak, load testing
    • detailed user docs and examples
      • cc @kubernetes/docs
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

FEATURE_STATUS is used for feature tracking and to be updated by @kubernetes/feature-reviewers.
FEATURE_STATUS: IN_DEVELOPMENT

More advice:

Design

  • Once you get LGTM from a @kubernetes/feature-reviewers member, you can check this checkbox, and the reviewer will apply the "design-complete" label.

Coding

  • Use as many PRs as you need. Write tests in the same or different PRs, as is convenient for you.
  • As each PR is merged, add a comment to this issue referencing the PRs. Code goes in the http://github.com/kubernetes/kubernetes repository,
    and sometimes http://github.com/kubernetes/contrib, or other repos.
  • When you are done with the code, apply the "code-complete" label.
  • When the feature has user docs, please add a comment mentioning @kubernetes/feature-reviewers and they will
    check that the code matches the proposed feature and design, and that everything is done, and that there is adequate
    testing. They won't do detailed code review: that already happened when your PRs were reviewed.
    When that is done, you can check this box and the reviewer will apply the "code-complete" label.

Docs

  • Write user docs and get them merged in.
  • User docs go into http://github.com/kubernetes/kubernetes.github.io.
  • When the feature has user docs, please add a comment mentioning @kubernetes/docs.
  • When you get LGTM, you can check this checkbox, and the reviewer will apply the "docs-complete" label.
@davidopp davidopp added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 1, 2016
@idvoretskyi idvoretskyi modified the milestone: v1.5 Oct 11, 2016
@davidopp (Member Author)

Removing from 1.5 milestone.

@davidopp davidopp modified the milestones: next-milestone, v1.5 Oct 18, 2016
@aveshagarwal (Member)

milestone 1.7?

@davidopp (Member Author)

@aveshagarwal Will you be working on it for 1.7? If so, then yes we should set 1.7 milestone.

@aveshagarwal (Member)

@davidopp yes.

@davidopp davidopp modified the milestones: v1.7, next-milestone Apr 25, 2017
@davidopp (Member Author)

done

@idvoretskyi idvoretskyi added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2017
@idvoretskyi (Member)

@davidopp @aveshagarwal I've updated the feature description to fit the new template. Please fill in the empty fields in the new template (their actual state was unclear).

@davidopp (Member Author)

@aveshagarwal I assume we won't have any code for this in 1.7, probably just a design at most, so we should move it to next-milestone?

@aveshagarwal (Member)

@davidopp I am prototyping the utilization-based use case from the existing design doc in the current rescheduler code in contrib, and I am planning to have that ready by 1.7. But since it will live in the contrib repo outside the kube repo, I am not sure it would impact kube 1.7.

@aveshagarwal (Member) commented May 11, 2017

@davidopp The one thing I am looking into is a new priority function based on node utilization that might be needed in the existing scheduler, so that when the rescheduler moves a pod off an over-utilized node, the existing scheduler can place that pod on a less/under-utilized node, in alignment with the rescheduler's decision. As far as I currently understand, that is the one change that might be needed in kube 1.7 for the first version of the rescheduler.
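
For illustration, a minimal sketch of such a utilization-based priority function, assuming node utilization is already available as a fraction of capacity; the 0-10 score range and names are illustrative, not an agreed design:

```go
// Illustrative sketch: score nodes so that less-utilized nodes rank higher,
// keeping the scheduler's placement aligned with the rescheduler's evictions.
package main

import "fmt"

// scoreNode maps node utilization (0.0-1.0) onto a 0-10 priority score:
// the lower the utilization, the higher the score.
func scoreNode(utilization float64) int {
	if utilization < 0 {
		utilization = 0
	}
	if utilization > 1 {
		utilization = 1
	}
	return int((1 - utilization) * 10)
}

func main() {
	for _, u := range []float64{0.15, 0.55, 0.92} {
		fmt.Printf("utilization %.2f -> score %d\n", u, scoreNode(u))
	}
}
```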

@gyliu513

@aveshagarwal After your work, will the rescheduler still only work for pods in the kube-system namespace?

@davidopp (Member Author)

This issue is referring to a different rescheduler than the one we currently have. The naming is unfortunate. The current rescheduler will go away once #268 is implemented.

@gyliu513

Good to know, thanks @davidopp

@fgrzadkowski commented May 12, 2017 via email

@aveshagarwal (Member)

Yes, I am focusing on the spreading use case based on node resource utilization.

@vishh (Contributor) commented May 12, 2017 via email

@aveshagarwal (Member)

I think a benefit of the latter is having a balanced cluster after events such as:

  1. a node coming back from maintenance
  2. autoscaling
  3. a pod's initial scheduling decision turning out to be sub-optimal over time.

Rescheduling a pod that is experiencing performance issues or poor service is also a use case we would like to handle eventually, but not as a first step. Moreover, if we act proactively, a pod may never experience poor service in the first place.

So there are various triggers that can cause a rescheduler to act: poor service, as you mentioned, node utilization, and many others. But as per the discussion, spreading based on node utilization seems to be the first step most users would be interested in.

@vishh (Contributor) commented May 12, 2017 via email

@aveshagarwal (Member)

I'd say the goal is to optimize (specifically, minimize) the number of over-utilized nodes (x) in a cluster, such that 0 <= x <= N, where N <= the number of nodes in the cluster. The utilization threshold and N are configurable. The rescheduler makes a best effort to optimize this but cannot provide a guarantee. Also, if it results in improved bin packing (as you mentioned) as a side effect, that's good, but it is not a direct optimization goal, at least for the first step.
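
For illustration, the quantity being minimized could be sketched like this, with a hypothetical threshold and utilization figures:

```go
// Illustrative sketch: count the nodes whose utilization exceeds a
// configurable threshold; a rescheduler would try, best effort, to drive
// this count down by evicting pods from those nodes.
package main

import "fmt"

func overUtilized(utilizations []float64, threshold float64) int {
	count := 0
	for _, u := range utilizations {
		if u > threshold {
			count++
		}
	}
	return count
}

func main() {
	nodes := []float64{0.35, 0.82, 0.91, 0.40}
	fmt.Println("over-utilized nodes:", overUtilized(nodes, 0.80)) // prints 2
}
```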

@davidopp (Member Author)

The features repo should not be used for technical discussions. Please move the discussion to kubernetes/kubernetes#12140.

BTW @aveshagarwal it would probably be good if you were to write a short design doc for what you're doing.

@aveshagarwal (Member)

@davidopp Yeah sure, planning to have something by next week.

@idvoretskyi (Member)

@davidopp @aveshagarwal have you agreed to target this feature for 1.7? If yes, please update the features template to reflect its actual status.

@davidopp (Member Author)

@aveshagarwal mentioned just one change that he might want in 1.7: #109 (comment)

But Avesh, what you described sounds like the current default scheduling policy (try to spread based on resources). So maybe you don't need a new priority function in 1.7?

@aveshagarwal (Member)

@davidopp Yeah, that sounds good. In that case there don't seem to be any kube changes needed for the initial version, so it should not impact kube 1.7.

Though I was thinking of a priority function based on actual resource utilization (e.g. by obtaining metrics from something like Heapster), which is different from how the existing spreading function works.

@davidopp (Member Author) commented May 16, 2017

Though I was thinking of a priority function based on actual resource utilization (e.g. by obtaining metrics from something like Heapster), which is different from how the existing spreading function works.

We've talked about doing usage-based scheduling for best-effort pods (kubernetes/kubernetes#18438), but don't have it yet.

@idvoretskyi (Member)

@davidopp @aveshagarwal so, any update on the status, gentlemen?

@idvoretskyi (Member)

@davidopp @aveshagarwal is this feature going to land in 1.7? If not, I'll remove the 1.7 association.

@aveshagarwal (Member)

@idvoretskyi No.

@luxas luxas removed this from the v1.7 milestone Jun 15, 2017
@idvoretskyi (Member)

@davidopp @aveshagarwal @kubernetes/sig-scheduling-feature-requests any plans to continue the feature development for 1.9?

@idvoretskyi idvoretskyi added this to the next-milestone milestone Oct 2, 2017
@davidopp (Member Author) commented Oct 3, 2017

There will be development in the future, but I'm not sure about 1.9. @aveshagarwal are you planning to do more work on this for 1.9?

@aveshagarwal (Member)

@davidopp @idvoretskyi Yes, there will be ongoing development adding new features and functionality, with regular releases. Here is the repo: https://github.com/kubernetes-incubator/descheduler. After every Kubernetes release, it will be rebased onto the latest kube release.

@idvoretskyi (Member)

@aveshagarwal @davidopp cool, I'll add this item to the 1.9 features track.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 10, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

howardjohn pushed a commit to howardjohn/enhancements that referenced this issue Oct 21, 2022