Add a flag to control DaemonSet eviction on non-empty nodes #4162

Merged 1 commit on Jun 25, 2021

x13n (Member) commented Jun 24, 2021

No description provided.

k8s-ci-robot added the "cncf-cla: yes" label (the PR's author has signed the CNCF CLA) on Jun 24, 2021
k8s-ci-robot added the "size/S" label (a PR that changes 10-29 lines, ignoring generated files) on Jun 24, 2021

towca (Collaborator) left a comment

/lgtm
/approve
/hold
Feel free to unhold if you don't agree with the nit.

cluster-autoscaler/core/scale_down.go (review thread: outdated, resolved)
k8s-ci-robot added the "do-not-merge/hold" label (a PR should not merge because someone has issued a /hold command) on Jun 24, 2021

k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: towca, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the "lgtm" ("Looks good to me", a PR is ready to be merged) and "approved" (a PR has been approved by an approver from all required OWNERS files) labels on Jun 24, 2021
k8s-ci-robot removed the "lgtm" label on Jun 25, 2021

towca (Collaborator) commented Jun 25, 2021

/lgtm
/unhold

k8s-ci-robot added the "lgtm" label and removed the "do-not-merge/hold" label on Jun 25, 2021
k8s-ci-robot merged commit 509b3e3 into kubernetes:master on Jun 25, 2021
Shubham82 added a commit to Shubham82/autoscaler that referenced this pull request May 25, 2022
…dd flag to control DaemonSet eviction on non-empty nodes & Allow DaemonSet pods to opt in/out from eviction.
k8s-ci-robot added a commit that referenced this pull request Jun 6, 2022
…21-daemonset-eviction-for-empty-nodes-and-occupied-nodes

Backport #4162 and #4172 [cluster-autoscaler] "Add a flag to control DaemonSet eviction on non-empty nodes and Allow DaemonSet pods to opt in/out from eviction" into 1.21
himanshu-kun added a commit to gardener/autoscaler that referenced this pull request Jun 25, 2022
* Set maxAsgNamesPerDescribe to the new maximum value

While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASGs per call in all regions, matching what's documented:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
     AutoScalingGroupNames.member.N
       The names of the Auto Scaling groups.
       By default, you can only specify up to 50 names.
       You can optionally increase this limit using the MaxRecords parameter.
     MaxRecords
       The maximum number of items to return with this call.
       The default value is 50 and the maximum value is 100.
```

Doubling this halves API calls on large clusters, which should help to prevent throttling.
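
A rough sketch of why the larger batch size reduces the call count. Only `maxAsgNamesPerDescribe` is taken from the commit title; `chunkNames` and the ASG names are hypothetical, and no real AWS SDK call is made here.

```go
package main

import "fmt"

// maxAsgNamesPerDescribe mirrors the constant named in the commit title,
// set to the documented API maximum of 100.
const maxAsgNamesPerDescribe = 100

// chunkNames is a hypothetical helper that splits ASG names into batches
// of at most size names, one batch per DescribeAutoScalingGroups call.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		batches = append(batches, names[start:end])
	}
	return batches
}

func main() {
	// 250 made-up ASG names standing in for a large cluster.
	names := make([]string, 250)
	for i := range names {
		names[i] = fmt.Sprintf("asg-%d", i)
	}
	fmt.Println(len(chunkNames(names, 50)), "calls at 50 names per call")
	fmt.Println(len(chunkNames(names, maxAsgNamesPerDescribe)), "calls at 100 names per call")
}
```

The call count is the ceiling of the name count divided by the batch size, so the saving is exactly half only when the number of ASGs is a multiple of 100.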

* Break out unmarshal from GenerateEC2InstanceTypes

Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

The pricing JSON for us-east-1 is currently 129MB. Fetching it into memory
in full and then parsing it results in a large memory footprint on
startup, and can lead to the autoscaler being OOMKilled.

Change the ReadAll/Unmarshal logic to a stream decoder to significantly
reduce the memory use.
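
A minimal sketch of the stream-decoding idea using the standard library's `json.Decoder`. The document shape (`products` keyed by SKU, each with `attributes.instanceType`) and all function names are assumptions for illustration, not the code from the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// product holds only the fields this sketch cares about (assumed shape).
type product struct {
	Attributes struct {
		InstanceType string `json:"instanceType"`
	} `json:"attributes"`
}

// streamInstanceTypes walks {"products": {"<sku>": {...}, ...}} token by
// token, decoding one product at a time instead of the whole document.
func streamInstanceTypes(r io.Reader) ([]string, error) {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // opening '{' of the document
		return nil, err
	}
	var types []string
	for dec.More() {
		keyTok, err := dec.Token() // top-level key
		if err != nil {
			return nil, err
		}
		if key, _ := keyTok.(string); key != "products" {
			// Skip other values; RawMessage buffers only this one value.
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return nil, err
			}
			continue
		}
		if _, err := dec.Token(); err != nil { // opening '{' of products
			return nil, err
		}
		for dec.More() {
			if _, err := dec.Token(); err != nil { // SKU key, unused here
				return nil, err
			}
			var p product
			if err := dec.Decode(&p); err != nil {
				return nil, err
			}
			if p.Attributes.InstanceType != "" {
				types = append(types, p.Attributes.InstanceType)
			}
		}
		if _, err := dec.Token(); err != nil { // closing '}' of products
			return nil, err
		}
	}
	return types, nil
}

// instanceTypesFromString is a convenience wrapper for small inputs.
func instanceTypesFromString(s string) ([]string, error) {
	return streamInstanceTypes(strings.NewReader(s))
}

func main() {
	doc := `{"formatVersion":"v1.0","products":{"A":{"attributes":{"instanceType":"m5.large"}}}}`
	types, err := instanceTypesFromString(doc)
	fmt.Println(types, err)
}
```

With this approach peak memory is bounded by the largest single value decoded rather than by the size of the whole file, which is the property the commit relies on.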

* use aws sdk to find region

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

Cloud provider[Packet] fixes

* Fix templated nodeinfo names collisions in BinpackingNodeEstimator

Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now use
the same shared DeepCopyTemplateNode function and inherit its naming
pattern, which is great as that fixes a long-standing bug.

Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with
generated nodeinfos and nodes having predictable names (using template name
+ an incremental ordinal starting at 0) for upcoming nodes.

Later, when it looks for fitting nodes for unschedulable pods (when upcoming
nodes don't satisfy those, e.g. FitsAnyNodeMatching failing due to node capacity
or pod anti-affinity), the binpacking estimator will also build virtual
nodes and place them in a snapshot fork to evaluate scheduler predicates.

Those temporary virtual nodes are built using the same pattern (template name
and an index ordinal also starting at 0) as the one previously used by
`getUpcomingNodeInfos`, which means it will generate the same nodeinfos/nodes
names for nodegroups having upcoming nodes.

But adding nodes by the same name in an existing cluster snapshot isn't
allowed, and the evaluation attempt will fail.

Practically this blocks re-upscales for nodegroups having upcoming nodes,
which can cause a significant delay.
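
The collision can be sketched with a toy snapshot that rejects duplicate node names. Everything here is hypothetical: the `snapshot` type, the `-` separator in generated names, and the distinct prefix used in the second attempt, which is only one possible way to avoid the clash and is not claimed to be the fix applied in this commit.

```go
package main

import "fmt"

// snapshot is a toy stand-in for a cluster snapshot keyed by node name.
type snapshot struct{ nodes map[string]bool }

func newSnapshot() *snapshot { return &snapshot{nodes: map[string]bool{}} }

// addNode refuses a name that is already present, like the real snapshot
// behaviour described above.
func (s *snapshot) addNode(name string) error {
	if s.nodes[name] {
		return fmt.Errorf("node %q already in snapshot", name)
	}
	s.nodes[name] = true
	return nil
}

// addTemplateNodes adds count nodes named "<prefix>-<ordinal>", with the
// ordinal starting at 0 (the separator is an assumption).
func addTemplateNodes(s *snapshot, prefix string, count int) error {
	for i := 0; i < count; i++ {
		if err := s.addNode(fmt.Sprintf("%s-%d", prefix, i)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	s := newSnapshot()
	// Upcoming nodes are added first, as getUpcomingNodeInfos would do.
	_ = addTemplateNodes(s, "template-node-for-group-a", 2)
	// The estimator reuses the same pattern and ordinal 0, so it collides.
	fmt.Println("same pattern:", addTemplateNodes(s, "template-node-for-group-a", 1))
	// A distinct prefix (hypothetical) avoids the duplicate name.
	fmt.Println("distinct prefix:", addTemplateNodes(s, "template-node-for-group-a-estimator", 1))
}
```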

* Improve misleading log

Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>

* don't proactively decrement azure cache for unregistered nodes

* annotate fakeNodes so that cloudprovider implementations can identify them if needed

* move annotations to cloudprovider package

* Cluster Autoscaler 1.21.1

* CA - AWS - Instance List Update 03-10-21 - 1.21 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.21 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* CA - AWS Instance List Update - 13/12/21 - 1.21

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

add more azure instance types

* Cluster Autoscaler 1.21.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* [Cherry pick 1.21] Remove TestDeleteBlob UT

Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>

* cherry-pick kubernetes#4022 [cluster-autoscaler] Publish node group min/max metrics

* Skipping metrics tests added in kubernetes#4022

Each test works in isolation, but they cause panic when the entire
suite is run (ex. make test-in-docker), because the underlying
metrics library panics when the same metric is registered twice.

(cherry picked from commit 52392b3)

* cherry-pick kubernetes#4162 and kubernetes#4172 [cluster-autoscaler] Add flag to control DaemonSet eviction on non-empty nodes & Allow DaemonSet pods to opt in/out from eviction.

* CA - AWS Cloud Provider - 1.21 Static Instance List Update 02-06-2022

* fix instance type fallback

Instead of logging a fatal error, log a standard error and fall back to
loading instance types from the static list.
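
The fallback behaviour described above might look roughly like this. The types, the `fetch` callback, and the contents of the static list are illustrative assumptions; the commit's actual function names are not shown in this message.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// instanceType is a simplified stand-in for an instance type record.
type instanceType struct {
	VCPU     int
	MemoryMb int
}

// staticInstanceTypes plays the role of the static list bundled with the
// binary (a single entry, for illustration only).
var staticInstanceTypes = map[string]instanceType{
	"m5.large": {VCPU: 2, MemoryMb: 8192},
}

// failingFetch simulates a dynamic lookup that cannot reach its source.
func failingFetch() (map[string]instanceType, error) {
	return nil, errors.New("pricing API unavailable")
}

// loadInstanceTypes tries the dynamic source first. On failure it logs a
// regular error (not log.Fatalf, which would exit the process) and
// returns the static list instead.
func loadInstanceTypes(fetch func() (map[string]instanceType, error)) map[string]instanceType {
	types, err := fetch()
	if err != nil {
		log.Printf("failed to generate instance types, falling back to static list: %v", err)
		return staticInstanceTypes
	}
	return types
}

func main() {
	types := loadInstanceTypes(failingFetch)
	fmt.Println(len(types), "instance types loaded from the static list")
}
```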

* Cluster Autoscaler - 1.21.3 release

* FAQ updated

* Sync_changes file updated

Co-authored-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
Co-authored-by: Adrian Lai <aidy@loathe.me.uk>
Co-authored-by: darkpssngr <shreyas300691@gmail.com>
Co-authored-by: Kubernetes Prow Robot <k8s-ci-robot@users.noreply.github.com>
Co-authored-by: Sylvain Rabot <sylvain@abstraction.fr>
Co-authored-by: Marwan Ahmed <marwanad@microsoft.com>
Co-authored-by: Jakub Tużnik <jtuznik@google.com>
Co-authored-by: GuyTempleton <guy.templeton@skyscanner.net>
Co-authored-by: sturman <4456572+sturman@users.noreply.github.com>
Co-authored-by: Maciek Pytel <maciekpytel@google.com>
Co-authored-by: ialidzhikov <i.alidjikov@gmail.com>
Co-authored-by: Zhecheng Li <zhechengli@microsoft.com>
Co-authored-by: Shubham Kuchhal <shubham.kuchhal@india.nec.com>
Co-authored-by: Todd Neal <tnealt@amazon.com>