Automated cherry pick of #3924: Fix bug where a node that becomes ready after 2 mins can be #4319
Merged
k8s-ci-robot merged 1 commit into kubernetes:cluster-autoscaler-release-1.20 from matthias50:automated-cherry-pick-of-#3924-upstream-cluster-autoscaler-release-1.20 on Sep 30, 2021
…s unready. Deprecated LongNotStarted

In cases where node n1 would:
1) Be created at t=0min
2) Ready condition is true at t=2.5min
3) Not ready taint is removed at t=3min
the ready node is counted as unready

Tested cases after fix:
1) Case described above
2) Nodes not starting even after 15mins still treated as unready
3) Nodes created long ago that suddenly become unready are counted as unready.
k8s-ci-robot added labels on Sep 9, 2021:
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
/lgtm
k8s-ci-robot added the lgtm label on Sep 30, 2021 ("Looks good to me", indicates that a PR is ready to be merged.)
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel, matthias50, towca
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
k8s-ci-robot added the approved label on Sep 30, 2021 (Indicates a PR has been approved by an approver from all required OWNERS files.)
towca added a commit to towca/autoscaler that referenced this pull request on Sep 30, 2021
…ed to 1.20 in kubernetes#4319

The backport included unit tests using a function that changed signature after 1.20. This was not detected before merging because CI is not running correctly on 1.20.
towca pushed a commit to towca/autoscaler that referenced this pull request on Sep 30, 2021
…ick-of-#3924-upstream-cluster-autoscaler-release-1.20

Automated cherry pick of kubernetes#3924: Fix bug where a node that becomes ready after 2 mins can be
himanshu-kun added a commit to gardener/autoscaler that referenced this pull request on Jun 25, 2022
* Fix cluster-autoscaler clusterapi sample manifest
  This commit fixes sample manifest of cluster-autoscaler clusterapi provider.
  (cherry picked from commit a5fee21)
* Adding functionality to cordon the node before destroying it. This helps load balancer to remove the node from healthy hosts (ALB does have this support).
  This won't fix the issue of 502 completely as there is some time node has to live even after cordoning as to serve In-Flight request but load balancer can be configured to remove Cordon nodes from healthy host list.
  This feature is enabled by cordon-node-before-terminating flag with default value as false to retain existing behavior.
* Set maxAsgNamesPerDescribe to the new maximum value
  While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports fetching 100 ASG per calls on all regions, matching what's documented:
  https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
  ```
  AutoScalingGroupNames.member.N
     The names of the Auto Scaling groups. By default, you can only specify up to 50 names. You can optionally increase this limit using the MaxRecords parameter.
  MaxRecords
     The maximum number of items to return with this call. The default value is 50 and the maximum value is 100.
  ```
  Doubling this halves API calls on large clusters, which should help to prevent throttling.
* Break out unmarshal from GenerateEC2InstanceTypes
  Refactor to allow for optimisation
* Optimise GenerateEC2InstanceTypes unmarshal memory usage
  The pricing json for us-east-1 is currently 129MB. Currently fetching this into memory and parsing results in a large memory footprint on startup, and can lead to the autoscaler being OOMKilled.
  Change the ReadAll/Unmarshal logic to a stream decoder to significantly reduce the memory use.
* use aws sdk to find region
* update readme
* Update cluster-autoscaler/cloudprovider/aws/README.md
  Co-authored-by: Guy Templeton <guyjtempleton@googlemail.com>
* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix
  Cloud provider[Packet] fixes
* Fix bug where a node that becomes ready after 2 mins can be treated as unready. Deprecated LongNotStarted
  In cases where node n1 would:
  1) Be created at t=0min
  2) Ready condition is true at t=2.5min
  3) Not ready taint is removed at t=3min
  the ready node is counted as unready
  Tested cases after fix:
  1) Case described above
  2) Nodes not starting even after 15mins still treated as unready
  3) Nodes created long ago that suddenly become unready are counted as unready.
* Improve misleading log
  Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
* dont proactively decrement azure cache for unregistered nodes
* Cluster Autoscaler: fix unit tests after kubernetes#3924 was backported to 1.20 in kubernetes#4319
  The backport included unit tests using a function that changed signature after 1.20. This was not detected before merging because CI is not running correctly on 1.20.
* Cluster Autoscaler: backport Github Actions CI to 1.20 (kubernetes#4366)
* annotate fakeNodes so that cloudprovider implementations can identify them if needed
* move annotations to cloudprovider package
* fix 1.19 test
* remove flaky test that's removed in master
* Cluster Autoscaler 1.20.1
* Make arch-specific releases use separate images instead of tags on the same image
  This seems to be the current convention in k8s.
* Cluster Autoscaler: add arch-specific build targets to .gitignore
* CA - AWS - Instance List Update 03-10-21 - 1.20 release branch
* CA - AWS - Instance List Update 29-10-21 - 1.20 release branch
* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6
* CA - AWS Instance List Update - 13/12/21 - 1.20
* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types
  add more azure instance types
* Cluster Autoscaler 1.20.2
* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)
  Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
* CA - AWS Cloud Provider - 1.20 Static Instance List Update 02-06-2022
* Cluster Autoscaler - 1.20.3 release
* sync_file updates & other changes
* Updating vendor against git@github.com:kubernetes/kubernetes.git:e3de62298a730415c5d2ab72607ef6adadd6304d (e3de622)
* fixed some declaration errors

Co-authored-by: Kubernetes Prow Robot <k8s-ci-robot@users.noreply.github.com>
Co-authored-by: Hidekazu Nakamura <hidekazuna@gmail.com>
Co-authored-by: atul <atul.aggarwal@cleartax.in>
Co-authored-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
Co-authored-by: Adrian Lai <aidy@loathe.me.uk>
Co-authored-by: darkpssngr <shreyas300691@gmail.com>
Co-authored-by: Guy Templeton <guyjtempleton@googlemail.com>
Co-authored-by: Vivek Bagade <vivek.bagade92@gmail.com>
Co-authored-by: Sylvain Rabot <sylvain@abstraction.fr>
Co-authored-by: Marwan Ahmed <marwanad@microsoft.com>
Co-authored-by: Jakub Tużnik <jtuznik@google.com>
Co-authored-by: GuyTempleton <guy.templeton@skyscanner.net>
Co-authored-by: sturman <4456572+sturman@users.noreply.github.com>
Co-authored-by: Maciek Pytel <maciekpytel@google.com>
Co-authored-by: ialidzhikov <i.alidjikov@gmail.com>
Labels
approved
Indicates a PR has been approved by an approver from all required OWNERS files.
area/cluster-autoscaler
cncf-cla: yes
Indicates the PR's author has signed the CNCF CLA.
lgtm
"Looks good to me", indicates that a PR is ready to be merged.
size/L
Denotes a PR that changes 100-499 lines, ignoring generated files.
Cherry pick of #3924 on cluster-autoscaler-release-1.20.
#3924: Fix bug where a node that becomes ready after 2 mins can be
For details on the cherry pick process, see the cherry pick requests page.