kubelet evicts resources when observing disk pressure #39

Closed
22 tasks
derekwaynecarr opened this issue Jul 21, 2016 · 18 comments
Labels
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@derekwaynecarr
Member

derekwaynecarr commented Jul 21, 2016

Description

As a cluster operator, I want the kubelet to monitor local disk usage and respond to keep the node stable. If the kubelet observes that available disk space and/or inodes (on rootfs or imagefs) are under pressure, the kubelet should pro-actively reclaim related resource to maintain node stability by deleting images, logs, and evicting pods.
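
The behavior described above can be illustrated with a minimal sketch. This is not the kubelet's implementation: the signal names, the `Threshold` type, the threshold values, and the order of reclaim actions are all assumptions made for illustration. It only shows the idea of comparing free disk space and free inodes against a minimum fraction and, when a signal is under pressure, reclaiming resources.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold: pressure when free/capacity drops below min_fraction."""
    signal: str          # illustrative name, e.g. "rootfs.available"
    min_fraction: float  # minimum free fraction before pressure is declared


def signals_under_pressure(
    free: Dict[str, float],
    capacity: Dict[str, float],
    thresholds: List[Threshold],
) -> List[str]:
    """Return the signals whose free fraction is below the configured minimum."""
    pressured = []
    for t in thresholds:
        cap = capacity.get(t.signal, 0.0)
        if cap <= 0:
            continue  # no usable stats for this signal, so skip it
        if free.get(t.signal, 0.0) / cap < t.min_fraction:
            pressured.append(t.signal)
    return pressured


def reclaim_plan(pressured: List[str]) -> List[str]:
    """List reclaim actions named in the description (the ordering is an assumption)."""
    if not pressured:
        return []
    return ["delete unused images", "delete logs", "evict pods"]


# Example with made-up numbers: 5% of rootfs space free, 90% of inodes free.
thresholds = [
    Threshold("rootfs.available", 0.10),
    Threshold("rootfs.inodesFree", 0.05),
]
free = {"rootfs.available": 5.0, "rootfs.inodesFree": 900.0}
capacity = {"rootfs.available": 100.0, "rootfs.inodesFree": 1000.0}
print(reclaim_plan(signals_under_pressure(free, capacity, thresholds)))
```

Keeping the check as a pure function over supplied statistics makes the decision logic easy to test separately from whatever collects the disk and inode numbers.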

Progress Tracker

  • Before Alpha
    • Write and maintain draft quality doc
      • During development keep a doc up-to-date about the desired experience of the feature and how someone can try the feature in its current state. Think of it as the README of your new feature and a skeleton for the docs to be written before the Kubernetes release. Paste link to Google Doc: DOC-LINK
    • Design Approval
      • Design Proposal. This goes under docs/proposals. Doing a proposal as a PR allows line-by-line commenting from community, and creates the basis for later design documentation. Paste link to merged design proposal here: PROPOSAL-NUMBER
      • Initial API review (if API). Maybe same PR as design doc. PR-NUMBER
        • Any code that changes an API (/pkg/apis/...)
        • cc @kubernetes/api
      • Identify shepherd (your SIG lead and/or kubernetes-pm@googlegroups.com will be able to help you). My Shepherd is: replace.me@replaceme.com (and/or GH Handle)
        • A shepherd is an individual who will help acquaint you with the process of getting your feature into the repo, identify reviewers and provide feedback on the feature. They are not (necessarily) the code reviewer of the feature, or tech lead for the area.
        • The shepherd is not responsible for showing up to Kubernetes-PM meetings and/or communicating if the feature is on-track to make the release goals. That is still your responsibility.
      • Identify secondary/backup contact point. My Secondary Contact Point is: replace.me@replaceme.com (and/or GH Handle)
    • Write (code + tests + docs) then get them merged. ALL-PR-NUMBERS
      • Code needs to be disabled by default. Verified by code OWNERS
      • Minimal testing
      • Minimal docs
        • cc @kubernetes/docs on docs PR
        • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
        • New APIs: Glossary Section Item in the docs repo: kubernetes/kubernetes.github.io
      • Update release notes
  • Before Beta
    • Testing is sufficient for beta
    • User docs with tutorials
      • Updated walkthrough / tutorial in the docs repo: kubernetes/kubernetes.github.io
      • cc @kubernetes/docs on docs PR
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Thorough API review
      • cc @kubernetes/api
  • Before Stable
    • docs/proposals/foo.md moved to docs/design/foo.md
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Soak, load testing
    • Detailed user docs and examples
      • cc @kubernetes/docs
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

FEATURE_STATUS is used for feature tracking and to be updated by @kubernetes/feature-reviewers.
FEATURE_STATUS: IN_DEVELOPMENT

More advice:

Design

  • Once you get LGTM from a @kubernetes/feature-reviewers member, you can check this checkbox, and the reviewer will apply the "design-complete" label.

Coding

  • Use as many PRs as you need. Write tests in the same or different PRs, as is convenient for you.
  • As each PR is merged, add a comment to this issue referencing the PRs. Code goes in the http://github.com/kubernetes/kubernetes repository,
    and sometimes http://github.com/kubernetes/contrib, or other repos.
  • When you are done with the code, apply the "code-complete" label.
  • When the feature has user docs, please add a comment mentioning @kubernetes/feature-reviewers and they will
    check that the code matches the proposed feature and design, and that everything is done, and that there is adequate
    testing. They won't do detailed code review: that already happened when your PRs were reviewed.
    When that is done, you can check this box and the reviewer will apply the "code-complete" label.

Docs

  • Write user docs and get them merged in.
  • User docs go into http://github.com/kubernetes/kubernetes.github.io.
  • When the feature has user docs, please add a comment mentioning @kubernetes/docs.
  • When you get LGTM, you can check this checkbox, and the reviewer will apply the "docs-complete" label.
@derekwaynecarr derekwaynecarr changed the title from "kubelet evicts resources when observing disk pressure to maintain node stability" to "kubelet evicts resources when observing disk pressure" Jul 21, 2016
@derekwaynecarr
Member Author

/cc @kubernetes/sig-node

@vishh
Contributor

vishh commented Jul 23, 2016

cc @ronnielai

@timothysc
Member

Given today's conversation, perhaps we could have a default policy of: "should pro-actively reclaim related resource to maintain node stability by deleting images, logs, and evicting pods." with the potential of firing an administrator-controlled script which could also apply to other resource dimensions.

/cc @nqn

@derekwaynecarr
Member Author

I think that is feature creep. The goal is disk. Other resource
dimensions can be monitored outside of the Kubelet. We can discuss in
sig-node.

On Monday, July 25, 2016, Timothy St. Clair notifications@github.com
wrote:

Given today's conversation, perhaps we could have a default policy of:
"should pro-actively reclaim related resource to maintain node stability by
deleting images, logs, and evicting pods." with the potential of firing an
administrator-controlled script which could also apply to other resource
dimensions.

/cc @nqn https://github.com/nqn



@idvoretskyi idvoretskyi added the sig/node label (Categorizes an issue or PR as relevant to SIG Node.) Aug 4, 2016
@derekwaynecarr
Member Author

All PRs planned for 1.4 have merged in time for feature freeze. I will update the checklist next week.

@janetkuo
Member

janetkuo commented Sep 2, 2016

@derekwaynecarr Are the docs ready? Please update the docs in https://github.com/kubernetes/kubernetes.github.io, and then add PR numbers and check the docs box in the issue description

@derekwaynecarr
Member Author

Docs are coming next week unless @ronnielai has anything yet? I think we
can update the existing eviction doc pretty quickly from the design doc.
They are pretty close for a reason :-)

On Friday, September 2, 2016, Janet Kuo notifications@github.com wrote:

@derekwaynecarr https://github.com/derekwaynecarr Are the docs ready?
Please update the docs in https://github.com/kubernetes/kubernetes.github.io,
and then add PR numbers and check the docs box in the issue description



@jaredbhatti

@derekwaynecarr Can you add your docs PR here when you have it ready?

@vishh
Contributor

vishh commented Sep 7, 2016

@derekwaynecarr My assumption is that this feature is alpha or beta in
v1.4. I hope the docs will reflect that!

On Wed, Sep 7, 2016 at 2:02 PM, Jared notifications@github.com wrote:

@derekwaynecarr https://github.com/derekwaynecarr Can you add your docs
PR here when you have it ready?



@derekwaynecarr
Member Author

@kubernetes/docs -- added feature doc pr kubernetes/website#1196

@jaredbhatti

@derekwaynecarr Is this feature Stable or Beta?

@idvoretskyi
Member

@derekwaynecarr can you provide us with the actual feature status?
Thanks.

@derekwaynecarr
Member Author

It's not alpha. I would say beta when dealing with inodes, but stable for
disk capacity.

On Wednesday, September 21, 2016, Ihor Dvoretskyi notifications@github.com
wrote:

@derekwaynecarr https://github.com/derekwaynecarr can you provide us
with the actual feature status?
Thanks.



@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Jan 2, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Feb 7, 2018
@ezware

ezware commented Feb 12, 2018

How do I disable disk pressure observation?

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

ingvagabund pushed a commit to ingvagabund/enhancements that referenced this issue Apr 2, 2020
make  accept generic .status.relatedResources
10 participants