kubelet image GC after a maximum age #4210

Open
8 tasks done
haircommander opened this issue Sep 14, 2023 · 40 comments
Assignees
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. stage/beta Denotes an issue tracking an enhancement targeted for Beta status
Milestone

Comments

@haircommander
Contributor

haircommander commented Sep 14, 2023

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 14, 2023
@haircommander
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 14, 2023
@SergeyKanzhelev
Member

/stage alpha
/milestone v1.29
/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label Sep 15, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Sep 15, 2023
@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Sep 15, 2023
@dchen1107
Member

cc @bsdnet to help with reviewing.

@npolshakova

Hello @haircommander 👋, 1.29 Enhancements team here!

Just checking in as we approach enhancements freeze on 01:00 UTC, Friday, 6th October, 2023.

This enhancement is targeting stage alpha for 1.29 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.29. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • KEP readme has up-to-date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here).

It looks like https://github.com/kubernetes/enhancements/pull/4211/files will address most of these issues when it merges in!

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@npolshakova npolshakova moved this to At Risk for Enhancements Freeze in 1.29 Enhancements Tracking Sep 22, 2023
@kannon92
Contributor

/assign @haircommander

@npolshakova

Hi @haircommander, just checking in once more as we approach the 1.29 enhancement freeze deadline this week on 01:00 UTC, Friday, 6th October, 2023. The status of this enhancement is marked as at risk for enhancement freeze.

It looks like #4211 will address most of the requirements when it merges. Let me know if I missed anything. Thanks!

@npolshakova

With KEP PR #4211 approved, the enhancement is ready for the enhancements freeze. The status is now marked as tracked for enhancement freeze for 1.29. 🚀 Thank you!

@npolshakova npolshakova moved this from At Risk for Enhancements Freeze to Tracked for Enhancements Freeze in 1.29 Enhancements Tracking Oct 6, 2023
@katcosgrove
Contributor

Hey there @haircommander! 👋, v1.29 Docs Lead here.
Does this enhancement work planned for v1.29 require any new docs or modification to existing docs?
If so, please follow the steps here to open a PR against the dev-1.29 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, 19 October 2023.
Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
Thank you!

@pacoxu
Member

pacoxu commented Oct 17, 2023

containerd/containerd#9022 adds support for image expiration in GC, which is a similar concept. But this is a containerd v2.0 feature.

@katcosgrove
Contributor

Hi again @haircommander! The deadline to open a placeholder PR against k/website for required documentation is this Thursday, 19 October. Could you please update me on the status of docs for this enhancement? Thank you!

@haircommander
Contributor Author

Thanks for the reminder, @katcosgrove. I've opened the draft here: kubernetes/website#43544

@npolshakova

Hey again @haircommander 👋 1.29 Enhancements team here,

Just checking in as we approach code freeze at 01:00 UTC Wednesday 1st November 2023.

Here's where this enhancement currently stands:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PR/s are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

This KEP is currently marked as at risk for code freeze.

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.
As always, we are here to help if any questions come up. Thanks!

@npolshakova npolshakova moved this from Tracked for Enhancements Freeze to At Risk for Code Freeze in 1.29 Enhancements Tracking Oct 23, 2023
@kannon92
Contributor

@haircommander can you update this issue with the relevant PRs for alpha?

kubernetes/kubernetes#121275 should be added to this description.

#4211 is another that should be added.

@a-mccarthy

Hi @haircommander, 👋 from the v1.29 Release Team-Communications! We would like to check if you have any plans to publish a blog for this KEP regarding new features, removals, and deprecations for this release.

If so, you need to open a PR placeholder in the website repository.
The deadline will be on Tuesday 14th November 2023 (after the Docs deadline PR ready for review)

Here is the 1.29 calendar

@npolshakova

Thanks for updating the issue description. This is now tracked for code freeze for 1.29! 🚀

@npolshakova npolshakova moved this from At Risk for Code Freeze to Tracked for Code Freeze in 1.29 Enhancements Tracking Oct 30, 2023
@Checksumz

Hi @haircommander ,

👋 from the v1.30 Communications Team! We'd love for you to opt in to write a feature blog about your enhancement!

We encourage blogs for features including, but not limited to: breaking changes, features and changes important to our users, and features that have been in progress for a long time and are graduating.

To opt in, you need to open a Feature Blog placeholder PR against the website repository.
The placeholder PR deadline is 27th February, 2024.
Here's the 1.30 Release Calendar

@pnbrown

pnbrown commented Feb 29, 2024

Hey again @haircommander 👋 Enhancements team here,

Just checking in as we approach code freeze at 02:00 UTC Wednesday 6th March 2024.

Here's where this enhancement currently stands:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PR/s are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.
As always, we are here to help if any questions come up. Thanks!

@pnbrown pnbrown moved this from Tracked for Enhancements Freeze to Tracked for Code Freeze in 1.30 Enhancements Tracking Feb 29, 2024
@AnishShah

AnishShah commented Mar 13, 2024

I was initially confused by the MinAge and ImageGCMaxAge config flags. I had to read the docs and API comments to understand them better. Here's a clearer breakdown to help others:

  1. MinAge: This truly is the minimum age (time since it was first pulled) before the kubelet considers garbage collecting an image.

  2. ImageGCMaxAge: This is better understood as the maximum time an image can remain unused before it becomes eligible for garbage collection.

It might be better to rename the flag to ImageUnusedThreshold or MaxUnusedTime or similar to be more accurate.
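
The distinction drawn above can be sketched in a few lines of Python. This is only a minimal model of the two checks as this comment describes them, not the kubelet's actual implementation. `ImageRecord`, `past_min_age`, and `unused_past_max_age` are hypothetical names, and treating a zero threshold as "disabled" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ImageRecord:
    """Hypothetical per-image bookkeeping, for illustration only."""
    first_pulled: datetime  # when the image first appeared on the node
    last_used: datetime     # last time any container used the image
    in_use: bool = False    # True while a container references the image


def past_min_age(image: ImageRecord, now: datetime, min_age: timedelta) -> bool:
    """MinAge: the image must have existed at least this long before GC considers it."""
    return now - image.first_pulled >= min_age


def unused_past_max_age(image: ImageRecord, now: datetime, max_unused: timedelta) -> bool:
    """ImageGCMaxAge: an image unused for at least this long becomes eligible for GC."""
    if max_unused <= timedelta(0):  # assumption: zero means the check is disabled
        return False
    return not image.in_use and now - image.last_used >= max_unused
```

Seen this way, the first check measures time since the first pull while the second measures time since the last use, which is why a name like MaxUnusedTime may describe the second one more accurately.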

@drewhagen drewhagen moved this from Tracked for Code Freeze to At Risk for Docs Freeze in 1.30 Enhancements Tracking Mar 24, 2024
@drewhagen drewhagen moved this from At Risk for Docs Freeze to Tracked for Code Freeze in 1.30 Enhancements Tracking Apr 1, 2024
@drewhagen drewhagen moved this from Tracked for Code Freeze to Tracked for Doc Freeze in 1.30 Enhancements Tracking Apr 1, 2024
@sreeram-venkitesh
Member

Hi @haircommander 👋, 1.31 Enhancements Lead here.

If you wish to progress this enhancement in v1.31, please have the SIG lead opt-in your enhancement by adding the lead-opted-in label and set the milestone to v1.31 before the Production Readiness Review Freeze.

/remove-label lead-opted-in

@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label May 15, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2024
@haircommander
Contributor Author

I think we should have more uses before we push for GA. I'm bumping this from the SIG node KEP planning for this release

@SergeyKanzhelev
Member

> I think we should have more uses before we push for GA. I'm bumping this from the SIG node KEP planning for this release

As discussed at the meeting today, maybe the lack of adoption indicates a design issue. Perhaps this needs to be revisited to collect concerns about why this is not being enabled. Perhaps more knobs or smartness are needed here, and one setting for all images is not working very well.

@haircommander
Contributor Author

I think at this point it's mostly a testing/data gap (at least speaking for OpenShift)

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 11, 2024
@SergeyKanzhelev
Member

/remove-lifecycle rotten

> testing/data gap

What testing do you mean? Something we should do upstream, or production testing?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 11, 2024
@haircommander
Contributor Author

I think the difficulty with testing this feature is that for it to materially be useful, we'd need to set the age longer than most CI clusters run (so images that are intermittently used don't thrash). Finding that threshold is difficult without testing, and testing is difficult without knowing the threshold and putting it in production, where the clusters live longer.

@SergeyKanzhelev
Member

> it to materially be useful, we'd need to set the age longer than most CI clusters run (so images that are intermittently used don't thrash)

This is what I meant earlier: maybe this means that we do not have enough config options to make it "safe" to enable. At least this is the case in our discussions on how we can offer this feature on GKE. The autoscaler may decide to re-use a node after it was not used over the weekend, so even 48 hours is not a "safe" assumption for a high enough threshold.

One direction to go is to allow more granular configuration of what falls under this policy and what does not. Another direction is to avoid re-deleting an image that was re-pulled at least once after the first age-based deletion. Maybe there are more ideas.
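
To make the second direction concrete, here is a rough Python sketch of the bookkeeping it would need. It illustrates the idea only. `AgeGCTracker` and its methods are hypothetical and do not correspond to any kubelet API, and keeping the state in memory (so it is lost on restart) is a simplifying assumption.

```python
class AgeGCTracker:
    """Hypothetical bookkeeping for the 'do not re-delete' idea; not a kubelet API."""

    def __init__(self):
        self._age_deleted = set()  # images removed at least once by age-based GC
        self._exempt = set()       # images re-pulled after such a removal

    def record_age_deletion(self, image_id):
        """Remember that age-based GC removed this image."""
        self._age_deleted.add(image_id)

    def record_pull(self, image_id):
        """A pull after an age-based deletion suggests the image is still wanted."""
        if image_id in self._age_deleted:
            self._exempt.add(image_id)

    def may_age_collect(self, image_id):
        """Exempt images are skipped by age-based GC from then on."""
        return image_id not in self._exempt
```

Under this sketch an image pays the re-pull cost at most once. It would still be subject to any other garbage collection, since only the age-based path consults the tracker.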

If this is used in some prod environments, I would be interested to know what setting was used and whether cluster uses autoscaler, etc.

@haircommander
Contributor Author

FWIW my initial thought on a default value is 2 weeks, but I'm not sure if cronjobs that run that infrequently and repull would be a problem.
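
The concern about infrequent cronjobs can be illustrated with a toy model. This sketch assumes an image is used exactly once per run and is removed as soon as its idle gap reaches the threshold; real garbage collection runs periodically, so actual timing would differ. `count_repulls` is a hypothetical helper, not part of any Kubernetes tooling.

```python
from datetime import timedelta


def count_repulls(run_interval: timedelta, max_unused: timedelta, runs: int) -> int:
    """Count re-pulls for an image that is used once every run_interval.

    Simplified model: the image is removed whenever the idle gap between two
    runs reaches max_unused, so the next run has to pull it again. The very
    first pull is not counted.
    """
    if runs <= 1:
        return 0
    removed_between_runs = run_interval >= max_unused
    return runs - 1 if removed_between_runs else 0
```

In this model a weekly job under a 2-week threshold never re-pulls, while a monthly job re-pulls on every run after the first. The same arithmetic applies to the weekend case mentioned earlier in the thread: a node left idle for roughly 60 hours exceeds a 48-hour threshold.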

@SergeyKanzhelev
Member

With 2 weeks, another question is whether we will hit eviction before this age-based GC even kicks in.
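
A back-of-the-envelope way to reason about this question, as a sketch: estimate how long the disk takes to reach a usage threshold and compare that with the age threshold. The linear growth model, the function names, and every number used with them are illustrative assumptions, not measurements or kubelet defaults.

```python
def days_until_threshold(used_pct: float, threshold_pct: float,
                         growth_pct_per_day: float) -> float:
    """Days until disk usage reaches threshold_pct under constant linear growth."""
    if used_pct >= threshold_pct:
        return 0.0
    if growth_pct_per_day <= 0:
        return float("inf")  # usage never reaches the threshold
    return (threshold_pct - used_pct) / growth_pct_per_day


def age_gc_fires_first(max_unused_days: float, used_pct: float,
                       threshold_pct: float, growth_pct_per_day: float) -> bool:
    """True if the age threshold elapses before the disk threshold is reached."""
    return max_unused_days < days_until_threshold(
        used_pct, threshold_pct, growth_pct_per_day)
```

For example, with a disk at 40% growing 5 points per day and a threshold of 85%, the disk threshold is reached in 9 days, so a 14-day age limit would never get the chance to act. At 1 point per day it would.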

@haircommander
Contributor Author

That's fine. I think this is really a fail-safe for all the other methods. I think we want this to be as non-aggressive as it can be while remaining useful.

Projects
Status: Tracked for Code Freeze
Status: Tracked for Doc Freeze
Status: Not for release