Allow DaemonSet pods to opt in/out from eviction #4172

x13n · 2021-06-29T08:16:38Z

Changes to documentation will be done in a follow-up PR.

MaciekPytel · 2021-06-29T08:25:40Z

cluster-autoscaler/core/scale_down.go

-func podsToEvict(pods []*apiv1.Pod, shouldEvict bool) []*apiv1.Pod {
-	if shouldEvict {
-		return pods
+func podsToEvict(pods []*apiv1.Pod, evictByDefault bool) (evictable []*apiv1.Pod) {


Can you rename this to make it clear it's related to ds and not 'regular' pods? Also maybe consider moving it to cluster-autoscaler/utils/daemonset?

I think moving this to cluster-autoscaler/utils/daemonset will suffice as a naming change, since it will then become daemonset.PodsToEvict. Will move it there.
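For readers following the review, here is a minimal sketch of what a per-pod opt in/out filter with the proposed `PodsToEvict(pods, evictByDefault)` shape could look like. It is not the code merged in this PR: the `Pod` struct is a simplified stand-in for `apiv1.Pod`, and the annotation key below is an assumption, since the thread never spells it out.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for apiv1.Pod so that the
// sketch compiles without Kubernetes dependencies.
type Pod struct {
	Name        string
	Annotations map[string]string
}

// enableDsEvictionKey is an ASSUMED annotation name; the PR thread does not
// state the key that is actually used.
const enableDsEvictionKey = "cluster-autoscaler.kubernetes.io/enable-ds-eviction"

// PodsToEvict returns the DaemonSet pods that should be evicted.
// An explicit annotation on a pod wins over the evictByDefault setting.
func PodsToEvict(pods []*Pod, evictByDefault bool) (evictable []*Pod) {
	for _, pod := range pods {
		if v, ok := pod.Annotations[enableDsEvictionKey]; ok {
			// Pod opted in ("true") or out (any other value).
			if v == "true" {
				evictable = append(evictable, pod)
			}
			continue
		}
		// No annotation: fall back to the cluster-wide default.
		if evictByDefault {
			evictable = append(evictable, pod)
		}
	}
	return evictable
}

func main() {
	pods := []*Pod{
		{Name: "opt-in", Annotations: map[string]string{enableDsEvictionKey: "true"}},
		{Name: "opt-out", Annotations: map[string]string{enableDsEvictionKey: "false"}},
		{Name: "default"},
	}
	for _, byDefault := range []bool{false, true} {
		fmt.Printf("evictByDefault=%v:", byDefault)
		for _, p := range PodsToEvict(pods, byDefault) {
			fmt.Printf(" %s", p.Name)
		}
		fmt.Println()
	}
}
```

In this sketch the default only applies to pods without the annotation, so an opted-out pod is never returned and an opted-in pod is always returned, regardless of `evictByDefault`.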

MaciekPytel · 2021-06-29T08:33:13Z

This PR only covers the empty node deletion path. To respect the annotation consistently, I think you need to add filtering logic in drainNode() as well.

x13n · 2021-06-29T08:41:41Z

It covers both paths: podsToEvict was already used before the call to drainNode.

MaciekPytel · 2021-06-29T12:36:54Z

/lgtm
/approve

k8s-ci-robot · 2021-06-29T12:37:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, x13n

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [MaciekPytel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…21-daemonset-eviction-for-empty-nodes-and-occupied-nodes Backport #4162 and #4172 [cluster-autoscaler] "Add a flag to control DaemonSet eviction on non-empty nodes and Allow DaemonSet pods to opt in/out from eviction" into 1.21

* Set maxAsgNamesPerDescribe to the new maximum value

  While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports fetching 100 ASG per calls on all regions, matching what's documented: https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html

  ```
  AutoScalingGroupNames.member.N
    The names of the Auto Scaling groups. By default, you can only specify up to 50 names. You can optionally increase this limit using the MaxRecords parameter.
  MaxRecords
    The maximum number of items to return with this call. The default value is 50 and the maximum value is 100.
  ```

  Doubling this halves API calls on large clusters, which should help to prevent throttling.

* Break out unmarshal from GenerateEC2InstanceTypes

  Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

  The pricing json for us-east-1 is currently 129MB. Currently fetching this into memory and parsing results in a large memory footprint on startup, and can lead to the autoscaler being OOMKilled. Change the ReadAll/Unmarshal logic to a stream decoder to significantly reduce the memory use.

* use aws sdk to find region

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

  Cloud provider[Packet] fixes

* Fix templated nodeinfo names collisions in BinpackingNodeEstimator

  Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now uses the same shared DeepCopyTemplateNode function and inherits its naming pattern, which is great as that fixes a long standing bug. Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with generated nodeinfos and nodes having predictable names (using template name + an incremental ordinal starting at 0) for upcoming nodes.

  Later, when it looks for fitting nodes for unschedulable pods (when upcoming nodes don't satisfy those (FitsAnyNodeMatching failing due to nodes capacity, or pods antiaffinity, ...), the binpacking estimator will also build virtual nodes and place them in a snapshot fork to evaluate scheduler predicates. Those temporary virtual nodes are built using the same pattern (template name and an index ordinal also starting at 0) as the one previously used by `getUpcomingNodeInfos`, which means it will generate the same nodeinfos/nodes names for nodegroups having upcoming nodes. But adding nodes by the same name in an existing cluster snapshot isn't allowed, and the evaluation attempt will fail. Practically this blocks re-upscales for nodegroups having upcoming nodes, which can cause a significant delay.

* Improve misleading log

  Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>

* dont proactively decrement azure cache for unregistered nodes

* annotate fakeNodes so that cloudprovider implementations can identify them if needed

* move annotations to cloudprovider package

* Cluster Autoscaler 1.21.1

* CA - AWS - Instance List Update 03-10-21 - 1.21 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.21 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* CA - AWS Instance List Update - 13/12/21 - 1.21

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

  add more azure instance types

* Cluster Autoscaler 1.21.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

  Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* [Cherry pick 1.21] Remove TestDeleteBlob UT

  Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>

* cherry-pick kubernetes#4022 [cluster-autoscaler] Publish node group min/max metrics

* Skipping metrics tests added in kubernetes#4022

  Each test works in isolation, but they cause panic when the entire suite is run (ex. make test-in-docker), because the underlying metrics library panics when the same metric is registered twice.

  (cherry picked from commit 52392b3)

* cherry-pick kubernetes#4162 and kubernetes#4172 [cluster-autoscaler]Add flag to control DaemonSet eviction on non-empty nodes & Allow DaemonSet pods to opt in/out from eviction.

* CA - AWS Cloud Provider - 1.21 Static Instance List Update 02-06-2022

* fix instance type fallback

  Instead of logging a fatal error, log a standard error and fall back to loading instance types from the static list.

* Cluster Autoscaler - 1.21.3 release

* FAQ updated

* Sync_changes file updated

Co-authored-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
Co-authored-by: Adrian Lai <aidy@loathe.me.uk>
Co-authored-by: darkpssngr <shreyas300691@gmail.com>
Co-authored-by: Kubernetes Prow Robot <k8s-ci-robot@users.noreply.github.com>
Co-authored-by: Sylvain Rabot <sylvain@abstraction.fr>
Co-authored-by: Marwan Ahmed <marwanad@microsoft.com>
Co-authored-by: Jakub Tużnik <jtuznik@google.com>
Co-authored-by: GuyTempleton <guy.templeton@skyscanner.net>
Co-authored-by: sturman <4456572+sturman@users.noreply.github.com>
Co-authored-by: Maciek Pytel <maciekpytel@google.com>
Co-authored-by: ialidzhikov <i.alidjikov@gmail.com>
Co-authored-by: Zhecheng Li <zhechengli@microsoft.com>
Co-authored-by: Shubham Kuchhal <shubham.kuchhal@india.nec.com>
Co-authored-by: Todd Neal <tnealt@amazon.com>

k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels Jun 29, 2021

k8s-ci-robot requested review from aleksandra-malinowska and feiskyer June 29, 2021 08:16

x13n force-pushed the master branch from dd1748e to fa6cd74 Compare June 29, 2021 08:21

MaciekPytel reviewed Jun 29, 2021

x13n force-pushed the master branch 2 times, most recently from b6d7a56 to ecb3764 Compare June 29, 2021 09:10

x13n mentioned this pull request Jun 29, 2021

Document DaemonSet eviction opt in/out behavior #4173

Merged

Allow DaemonSet pods to opt in/out from eviction

44b8d67

x13n force-pushed the master branch from ecb3764 to 44b8d67 Compare June 29, 2021 09:58

k8s-ci-robot assigned MaciekPytel Jun 29, 2021

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jun 29, 2021

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jun 29, 2021

k8s-ci-robot merged commit a839343 into kubernetes:master Jun 29, 2021

jmenan mentioned this pull request Apr 27, 2022

backport "daemonset-eviction-for-empty-nodes" and "daemonset-eviction-for-occupied-nodes" from 1.22 to 1.21 #4830

Closed

Shubham82 added a commit to Shubham82/autoscaler that referenced this pull request May 25, 2022

cherry-pick kubernetes#4162 and kubernetes#4172 [cluster-autoscaler]Add flag to control DaemonSet eviction on non-empty nodes & Allow DaemonSet pods to opt in/out from eviction.

fcd0433

Shubham82 mentioned this pull request May 25, 2022

Backport #4162 and #4172 [cluster-autoscaler] "Add a flag to control DaemonSet eviction on non-empty nodes and Allow DaemonSet pods to opt in/out from eviction" into 1.21 #4916

Merged
