Activate Early backoff functionality (#253)
* created cloudprovider.instance with proper state and status

* disabled retry of RunOnce on failure to remove createErrorNodes

* added basic test case for Nodes() method

* lint changes

* updated log level for scale down status log

* copied function as not exported in MCM vendored version

* added vendored files

* docs for early backoff added

* Apply direct suggestions from code review

Co-authored-by: Rishabh Patel <66425093+rishabh-11@users.noreply.github.com>

* other comments addressed

* fixing spelling checks, excluding balancer component from lint tests

* fix excluding balancer component change

* ignored files correctly from gofmt and golint checks

* fixed gofmt failure

* reset gofmt changes

* reset golint changes

* Revert "disabled retry of RunOnce on failure to remove createErrorNodes"

This reverts commit 7947c71.

* added corner case in docs

* Update cluster-autoscaler/FAQ.md

Co-authored-by: Rishabh Patel <66425093+rishabh-11@users.noreply.github.com>

* update recreate advice in docs

* addressed review comments

* changes made to unit tests

* removed findCodeAndMessage

* updated doc

---------

Co-authored-by: Rishabh Patel <66425093+rishabh-11@users.noreply.github.com>
himanshu-kun and rishabh-11 authored Oct 3, 2023
1 parent 053c0d5 commit da973f4
Showing 21 changed files with 569 additions and 100 deletions.
52 changes: 50 additions & 2 deletions cluster-autoscaler/FAQ.md
@@ -67,8 +67,11 @@ this document:
* [How can I update CA dependencies (particularly k8s.io/kubernetes)?](#how-can-i-update-ca-dependencies-particularly-k8siokubernetes)

* [In the context of Gardener](#in-the-context-of-gardener)
* [How do I sync gardener autoscaler with an upstream autoscaler minor release?](#how-do-i-sync-gardener-autoscaler-with-an-upstream-autoscaler-minor-release)
* [How do I revendor a different version of MCM in autoscaler?](#how-do-i-revendor-a-different-version-of-mcm-in-autoscaler)
* [For User](#for-user)
* [When does autoscaler back off early from a node group?](#when-does-autoscaler-back-off-early-from-a-node-group)
* [For Developer](#for-developer)
* [How do I sync gardener autoscaler with an upstream autoscaler minor release?](#how-do-i-sync-gardener-autoscaler-with-an-upstream-autoscaler-minor-release)
* [How do I revendor a different version of MCM in autoscaler?](#how-do-i-revendor-a-different-version-of-mcm-in-autoscaler)
<!--- TOC END -->

# Basics
@@ -1087,6 +1090,51 @@ Caveats:

# In the context of Gardener:

## For User
### When does autoscaler back off early from a node group?

Autoscaler backs off from a node group if the requested scale-up doesn't succeed. Autoscaler decides to back off based on:
- Timeout
  - if the node doesn't join within `max-node-provision-time`
- `ResourceExhausted` error
  - if the node doesn't join due to an error on the cloud-provider side, and the error is classified as `ResourceExhausted`
- Scale-up operation fails for a node group

As the name suggests, early backoff doesn't wait till the `timeout` but backs off as soon as one of these conditions is satisfied. This helps in trying other node groups quickly.

Currently, early backoff is enabled only for `ResourceExhausted` errors. Errors classified as `ResourceExhausted` include (but are not limited to):
- `out of quota` errors, where the customer quota is exhausted and the quota is configurable per zone (not per region). Generally quotas for VMs, CPUs, GPUs and disks are configurable per zone, but please confirm this for your cloud provider
- `out of stock` errors, where the cloud provider doesn't have enough resources in the particular zone, but the resource is available in other zones
- `not-supported` errors, where the instance type or disk type is not supported in the particular zone

Errors not classified as `ResourceExhausted` include (but are not limited to):
- `invalid credentials`
- `rate limiting`
- `policy constraints defined by customer`
- `service-unavailable` on the cloud-provider side

Backoff after `timeout` will still happen for errors other than `ResourceExhausted`.

*NOTE:* The identifier for the error might differ for each cloud provider. The names listed above are general names.
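To make this concrete, below is a minimal Go sketch of how a machine whose VM creation failed with a `ResourceExhausted` error surfaces to the core autoscaler as a placeholder instance carrying an `OutOfResourcesErrorClass` status, which is what lets the autoscaler back off before `max-node-provision-time`. The helper mirrors the `checkAndGetResourceExhaustedInstanceStatus` logic added by this commit in `mcm_manager.go`; the standalone `main` wrapper, the machine name, and the sample quota message are illustrative assumptions only.

```go
package main

import (
	"fmt"

	"github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
	machinecodes "github.com/gardener/machine-controller-manager/pkg/util/provider/machinecodes/codes"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
)

// instanceForFailedCreate mirrors the helper added in this commit: only a failed
// Create operation with error code ResourceExhausted yields an error status on
// the placeholder instance; all other cases report just the placeholder ID.
func instanceForFailedCreate(machine *v1alpha1.Machine) cloudprovider.Instance {
	instance := cloudprovider.Instance{Id: fmt.Sprintf("requested://%s", machine.Name)}
	lastOp := machine.Status.LastOperation
	if lastOp.Type == v1alpha1.MachineOperationCreate &&
		lastOp.State == v1alpha1.MachineStateFailed &&
		lastOp.ErrorCode == machinecodes.ResourceExhausted.String() {
		instance.Status = &cloudprovider.InstanceStatus{
			State: cloudprovider.InstanceCreating,
			ErrorInfo: &cloudprovider.InstanceErrorInfo{
				ErrorClass:   cloudprovider.OutOfResourcesErrorClass,
				ErrorCode:    machinecodes.ResourceExhausted.String(),
				ErrorMessage: lastOp.Description,
			},
		}
	}
	return instance
}

func main() {
	// Illustrative machine object with a failed, quota-related create operation.
	m := &v1alpha1.Machine{}
	m.Name = "machine-out-of-quota"
	m.Status.LastOperation = v1alpha1.LastOperation{
		Type:        v1alpha1.MachineOperationCreate,
		State:       v1alpha1.MachineStateFailed,
		ErrorCode:   machinecodes.ResourceExhausted.String(),
		Description: "QUOTA_EXCEEDED: Quota 'N2_CPUS' exceeded in the zone",
	}
	inst := instanceForFailedCreate(m)
	fmt.Println(inst.Id, inst.Status.ErrorInfo.ErrorClass, inst.Status.ErrorInfo.ErrorCode)
}
```
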
**--Caveat during rolling update--**

Case:
- node group `ng-A` is in rolling update, AND
- the scale-up happens for `ng-A` due to an unschedulable pod `podA` (or a set of pods), AND
- the node (say `node1`) couldn't join due to `ResourceExhausted`

Then autoscaler will back off early and try to remove the node, but the node removal won't succeed, because CA is currently not allowed to perform any scale-down/delete-node operation for a node group under rolling update.
In the above scenario, CA won't try to scale up any other node group for `podA`, as it still considers `node1` a possible candidate to join (`ResourceExhausted` errors are recoverable errors).
Scale-up would still work for any new pods that can't fit on the upcoming `node1` but can fit on some other node group.
The scale-up would stay blocked for such pod(s) for at most `max-node-provision-time`, because after that the node won't be considered an upcoming node.

Refer to issue https://github.com/gardener/autoscaler/issues/154 to track the changes made for early-backoff enablement.

## For Developer
### How do I sync gardener autoscaler with an upstream autoscaler minor release?

This is helpful in order to offer Gardener CA with support for the latest or a recent K8s version. Note that this may also require upgrading the K8s version used by Machine Controller Manager.
17 changes: 6 additions & 11 deletions cluster-autoscaler/cloudprovider/mcm/mcm_cloud_provider.go
@@ -24,13 +24,14 @@ package mcm
import (
"context"
"fmt"
"strings"

apiv1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/types"
"k8s.io/klog/v2"
schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
"strings"

"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
@@ -237,7 +238,7 @@ func ReferenceFromProviderID(m *McmManager, id string) (*Ref, error) {
for _, machine := range machines {
machineID := strings.Split(machine.Spec.ProviderID, "/")
nodeID := strings.Split(id, "/")
// If registered, the ID will match the AWS instance ID.
// If registered, the ID will match the cloudprovider instance ID.
// If unregistered, the ID will match the machine name.
if machineID[len(machineID)-1] == nodeID[len(nodeID)-1] ||
nodeID[len(nodeID)-1] == machine.Name {
@@ -371,7 +372,7 @@ func (machinedeployment *MachineDeployment) Belongs(node *apiv1.Node) (bool, err
}

// DeleteNodes deletes the nodes from the group. It is expected that this method will not be called
// for nodes not part of ANY machine deployment.
// for nodes which are not part of ANY machine deployment.
func (machinedeployment *MachineDeployment) DeleteNodes(nodes []*apiv1.Node) error {
size, err := machinedeployment.mcmManager.GetMachineDeploymentSize(machinedeployment)
if err != nil {
@@ -409,17 +410,11 @@ func (machinedeployment *MachineDeployment) Debug() string {

// Nodes returns a list of all nodes that belong to this node group.
func (machinedeployment *MachineDeployment) Nodes() ([]cloudprovider.Instance, error) {
nodeProviderIDs, err := machinedeployment.mcmManager.GetMachineDeploymentNodes(machinedeployment)
instances, err := machinedeployment.mcmManager.GetInstancesForMachineDeployment(machinedeployment)
if err != nil {
return nil, fmt.Errorf("failed to get the nodes backed by the machinedeployment %q, error: %v", machinedeployment.Name, err)
return nil, fmt.Errorf("failed to get the cloudprovider.Instance for machines backed by the machinedeployment %q, error: %v", machinedeployment.Name, err)
}

instances := make([]cloudprovider.Instance, len(nodeProviderIDs))
for i := range nodeProviderIDs {
instances[i] = cloudprovider.Instance{
Id: nodeProviderIDs[i],
}
}
return instances, nil
}

135 changes: 131 additions & 4 deletions cluster-autoscaler/cloudprovider/mcm/mcm_cloud_provider_test.go
@@ -20,14 +20,19 @@ import (
"context"
"errors"
"fmt"
"math"
"strings"
"testing"

machinecodes "github.com/gardener/machine-controller-manager/pkg/util/provider/machinecodes/codes"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
customfake "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/mcm/fakeclient"

"github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
. "github.com/onsi/gomega"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
customfake "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/mcm/fakeclient"
"math"
"testing"
)

const (
@@ -375,7 +380,7 @@ func TestRefresh(t *testing.T) {
nodeGroups: []string{nodeGroup2},
},
expect{
machines: []*v1alpha1.Machine{newMachine("machine-1", "fakeID", nil, "machinedeployment-1", "machineset-1", "1", false)},
machines: []*v1alpha1.Machine{newMachine("machine-1", "fakeID-1", nil, "machinedeployment-1", "machineset-1", "1", false, true)},
err: errors.Join(fmt.Errorf("could not reset priority annotation on machine machine-1, Error: %v", mcUpdateErrorMsg)),
},
},
@@ -414,3 +419,125 @@
})
}
}

// Different kinds of cases possible and expected cloudprovider.Instance returned for them
// (mobj, mobjPid, nodeobj) -> instance(nodeobj.pid,_)
// (mobj, mobjPid, _) -> instance("requested://<machine-name>",_)
// (mobj, _,_) -> instance("requested://<machine-name>",_)
// (mobj, _,_) with quota error -> instance("requested://<machine-name>",status{'creating',{'outofResourcesClass','ResourceExhausted','<message>'}})
// (mobj, _,_) with invalid credentials error -> instance("requested://<machine-name>",_)

// Example machine.status.lastOperation for a `ResourceExhausted` error
//
// lastOperation: {
// type: Creating
// state: Failed
// errorCode: ResourceExhausted
// description: "Cloud provider message - machine codes error: code = [ResourceExhausted] message = [Create machine "shoot--ddci--cbc-sys-tests03-pool-c32m256-3b-z1-575b9-hlvj6" failed: The following errors occurred: [{QUOTA_EXCEEDED Quota 'N2_CPUS' exceeded. Limit: 6000.0 in region europe-west3. [] []}]]."
// }
// }
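// For the lastOperation above, the cloudprovider.Instance reported by Nodes()
// would look roughly as follows (illustrative only; the machine name is taken
// from the description, and the mapping mirrors
// checkAndGetResourceExhaustedInstanceStatus in mcm_manager.go):
//
//	Id: "requested://shoot--ddci--cbc-sys-tests03-pool-c32m256-3b-z1-575b9-hlvj6"
//	Status: &cloudprovider.InstanceStatus{
//		State: cloudprovider.InstanceCreating,
//		ErrorInfo: &cloudprovider.InstanceErrorInfo{
//			ErrorClass:   cloudprovider.OutOfResourcesErrorClass,
//			ErrorCode:    machinecodes.ResourceExhausted.String(),
//			ErrorMessage: "<the description above>",
//		},
//	}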
func TestNodes(t *testing.T) {
const (
outOfQuotaMachineStatusErrorDescription = "Cloud provider message - machine codes error: code = [ResourceExhausted] message = [Create machine \"machine-with-vm-create-error-out-of-quota\" failed: The following errors occurred: [{QUOTA_EXCEEDED Quota 'N2_CPUS' exceeded. Limit: 6000.0 in region europe-west3. [] []}]]"
invalidCredentialsMachineStatusErrorDescription = "Cloud provider message - machine codes error: code = [Internal] message = [user is not authorized to perform this action]"
)
type expectationPerInstance struct {
providerID string
instanceState cloudprovider.InstanceState
instanceErrorClass cloudprovider.InstanceErrorClass
instanceErrorCode string
instanceErrorMessage string
}
type expect struct {
expectationPerInstanceList []expectationPerInstance
}
type data struct {
name string
setup setup
expect expect
}
table := []data{
{
"Correct instances should be returned for machine objects under the machinedeployment",
setup{
nodes: []*corev1.Node{newNode("node-1", "fakeID-1", false)},
machines: func() []*v1alpha1.Machine {
allMachines := make([]*v1alpha1.Machine, 0, 5)
allMachines = append(allMachines, newMachine("machine-with-registered-node", "fakeID-1", nil, "machinedeployment-1", "", "", false, true))
allMachines = append(allMachines, newMachine("machine-with-vm-but-no-node", "fakeID-2", nil, "machinedeployment-1", "", "", false, false))
allMachines = append(allMachines, newMachine("machine-with-vm-creating", "", nil, "machinedeployment-1", "", "", false, false))
allMachines = append(allMachines, newMachine("machine-with-vm-create-error-out-of-quota", "", &v1alpha1.MachineStatus{LastOperation: v1alpha1.LastOperation{Type: v1alpha1.MachineOperationCreate, State: v1alpha1.MachineStateFailed, ErrorCode: machinecodes.ResourceExhausted.String(), Description: outOfQuotaMachineStatusErrorDescription}}, "machinedeployment-1", "", "", false, false))
allMachines = append(allMachines, newMachine("machine-with-vm-create-error-invalid-credentials", "", &v1alpha1.MachineStatus{LastOperation: v1alpha1.LastOperation{Type: v1alpha1.MachineOperationCreate, State: v1alpha1.MachineStateFailed, ErrorCode: machinecodes.Internal.String(), Description: invalidCredentialsMachineStatusErrorDescription}}, "machinedeployment-1", "", "", false, false))
return allMachines
}(),
machineDeployments: newMachineDeployments(1, 2, nil, nil, nil),
nodeGroups: []string{nodeGroup1},
},
expect{
expectationPerInstanceList: []expectationPerInstance{
{"fakeID-1", cloudprovider.InstanceState(-1), cloudprovider.InstanceErrorClass(-1), "", ""},
{placeholderInstanceIDForMachineObj("machine-with-vm-but-no-node"), cloudprovider.InstanceState(-1), cloudprovider.InstanceErrorClass(-1), "", ""},
{placeholderInstanceIDForMachineObj("machine-with-vm-creating"), cloudprovider.InstanceState(-1), cloudprovider.InstanceErrorClass(-1), "", ""},
{placeholderInstanceIDForMachineObj("machine-with-vm-create-error-out-of-quota"), cloudprovider.InstanceCreating, cloudprovider.OutOfResourcesErrorClass, machinecodes.ResourceExhausted.String(), outOfQuotaMachineStatusErrorDescription},
// invalid credentials error is mapped to Internal code as it can't be fixed by trying another zone
{placeholderInstanceIDForMachineObj("machine-with-vm-create-error-invalid-credentials"), cloudprovider.InstanceState(-1), cloudprovider.InstanceErrorClass(-1), "", ""},
},
},
},
}

for _, entry := range table {
entry := entry // have a shallow copy of the entry for parallelization of tests
t.Run(entry.name, func(t *testing.T) {
t.Parallel()
g := NewWithT(t)
stop := make(chan struct{})
defer close(stop)
controlMachineObjects, targetCoreObjects := setupEnv(&entry.setup)
m, trackers, hasSyncedCacheFns := createMcmManager(t, stop, testNamespace, nil, controlMachineObjects, targetCoreObjects)
defer trackers.Stop()
waitForCacheSync(t, stop, hasSyncedCacheFns)

if entry.setup.targetCoreFakeResourceActions != nil {
trackers.TargetCore.SetFailAtFakeResourceActions(entry.setup.targetCoreFakeResourceActions)
}
if entry.setup.controlMachineFakeResourceActions != nil {
trackers.ControlMachine.SetFailAtFakeResourceActions(entry.setup.controlMachineFakeResourceActions)
}

md, err := buildMachineDeploymentFromSpec(entry.setup.nodeGroups[0], m)
g.Expect(err).To(BeNil())

returnedInstances, err := md.Nodes()
g.Expect(err).To(BeNil())
g.Expect(len(returnedInstances)).To(BeNumerically("==", len(entry.expect.expectationPerInstanceList)))

for _, expectedInstance := range entry.expect.expectationPerInstanceList {
found := false
for _, gotInstance := range returnedInstances {
g.Expect(gotInstance.Id).ToNot(BeEmpty())
if expectedInstance.providerID == gotInstance.Id {
if !strings.Contains(gotInstance.Id, "requested://") {
// must be a machine obj whose node is registered (ready or notReady)
g.Expect(gotInstance.Status).To(BeNil())
} else {
if int(expectedInstance.instanceState) != -1 {
g.Expect(gotInstance.Status).ToNot(BeNil())
g.Expect(gotInstance.Status.State).To(Equal(expectedInstance.instanceState))
}
if int(expectedInstance.instanceErrorClass) != -1 || expectedInstance.instanceErrorCode != "" || expectedInstance.instanceErrorMessage != "" {
g.Expect(gotInstance.Status.ErrorInfo).ToNot(BeNil())
g.Expect(gotInstance.Status.ErrorInfo.ErrorClass).To(Equal(expectedInstance.instanceErrorClass))
g.Expect(gotInstance.Status.ErrorInfo.ErrorCode).To(Equal(expectedInstance.instanceErrorCode))
g.Expect(gotInstance.Status.ErrorInfo.ErrorMessage).To(Equal(expectedInstance.instanceErrorMessage))
}
}
found = true
break
}
}
g.Expect(found).To(BeTrue())
}
})
}
}
80 changes: 54 additions & 26 deletions cluster-autoscaler/cloudprovider/mcm/mcm_manager.go
@@ -27,7 +27,6 @@ import (
"errors"
"flag"
"fmt"
"k8s.io/autoscaler/cluster-autoscaler/utils/taints"
"math/rand"
"net/http"
"os"
@@ -41,6 +40,7 @@ import (
machineapi "github.com/gardener/machine-controller-manager/pkg/client/clientset/versioned/typed/machine/v1alpha1"
machineinformers "github.com/gardener/machine-controller-manager/pkg/client/informers/externalversions"
machinelisters "github.com/gardener/machine-controller-manager/pkg/client/listers/machine/v1alpha1"
machinecodes "github.com/gardener/machine-controller-manager/pkg/util/provider/machinecodes/codes"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
kube_errors "k8s.io/apimachinery/pkg/api/errors"
@@ -55,6 +55,7 @@ import (
"k8s.io/autoscaler/cluster-autoscaler/config/dynamic"
"k8s.io/autoscaler/cluster-autoscaler/processors/nodegroupset"
"k8s.io/autoscaler/cluster-autoscaler/utils/gpu"
"k8s.io/autoscaler/cluster-autoscaler/utils/taints"
"k8s.io/client-go/discovery"
coreinformers "k8s.io/client-go/informers"
corelisters "k8s.io/client-go/listers/core/v1"
@@ -602,43 +603,70 @@ func (m *McmManager) retry(fn func(ctx context.Context) (bool, error), resourceT
}
}

// GetMachineDeploymentNodes returns the set of Nodes which belongs to the MachineDeployment.
func (m *McmManager) GetMachineDeploymentNodes(machinedeployment *MachineDeployment) ([]string, error) {
md, err := m.machineDeploymentLister.MachineDeployments(m.namespace).Get(machinedeployment.Name)
if err != nil {
return nil, fmt.Errorf("Unable to fetch MachineDeployment object %s, Error: %v", machinedeployment.Name, err)
}
// GetInstancesForMachineDeployment returns list of cloudprovider.Instance for machines which belongs to the MachineDeployment.
func (m *McmManager) GetInstancesForMachineDeployment(machinedeployment *MachineDeployment) ([]cloudprovider.Instance, error) {
var (
list = []string{machinedeployment.Name}
selector = labels.NewSelector()
req, _ = labels.NewRequirement("name", selection.Equals, list)
)

machineList, err := m.machineLister.Machines(m.namespace).List(labels.Everything())
selector = selector.Add(*req)
machineList, err := m.machineLister.Machines(m.namespace).List(selector)
if err != nil {
return nil, fmt.Errorf("Unable to fetch list of Machine objects %v", err)
return nil, fmt.Errorf("unable to fetch list of Machine objects %v for machinedeployment %q", err, machinedeployment.Name)
}

nodeList, err := m.nodeLister.List(labels.Everything())
if err != nil {
return nil, fmt.Errorf("Unable to fetch list of Nodes %v", err)
return nil, fmt.Errorf("unable to fetch list of Nodes %v", err)
}

var nodes []string
instances := make([]cloudprovider.Instance, 0, len(machineList))
// Bearing O(n^2) complexity, assuming we will not have a lot of nodes/machines; open for optimisations.
for _, machine := range machineList {
if strings.Contains(machine.Name, md.Name) {
var found bool
for _, node := range nodeList {
if machine.Labels["node"] == node.Name {
nodes = append(nodes, node.Spec.ProviderID)
found = true
break
}
}
if !found {
// No node found - either the machine has not registered yet or AWS is unable to fulfill the request.
// Report a special ID so that the autoscaler can track it as an unregistered node.
nodes = append(nodes, fmt.Sprintf("requested://%s", machine.Name))
}
instance := findMatchingInstance(nodeList, machine)
instances = append(instances, instance)
}
return instances, nil
}

func findMatchingInstance(nodes []*v1.Node, machine *v1alpha1.Machine) cloudprovider.Instance {
for _, node := range nodes {
if machine.Labels["node"] == node.Name {
return cloudprovider.Instance{Id: node.Spec.ProviderID}
}
}
// No k8s node found; one of the following cases is possible:
// - MCM is unable to fulfill the request to create VM.
// - VM is being created
// - the VM is up but has not registered yet

// Report instance with a special placeholder ID so that the autoscaler can track it as an unregistered node.
// Report InstanceStatus only for `ResourceExhausted` errors
return cloudprovider.Instance{
Id: placeholderInstanceIDForMachineObj(machine.Name),
Status: checkAndGetResourceExhaustedInstanceStatus(machine),
}
}

func placeholderInstanceIDForMachineObj(name string) string {
return fmt.Sprintf("requested://%s", name)
}

// checkAndGetResourceExhaustedInstanceStatus returns cloudprovider.InstanceStatus for the machine obj
func checkAndGetResourceExhaustedInstanceStatus(machine *v1alpha1.Machine) *cloudprovider.InstanceStatus {
if machine.Status.LastOperation.Type == v1alpha1.MachineOperationCreate && machine.Status.LastOperation.State == v1alpha1.MachineStateFailed && machine.Status.LastOperation.ErrorCode == machinecodes.ResourceExhausted.String() {
return &cloudprovider.InstanceStatus{
State: cloudprovider.InstanceCreating,
ErrorInfo: &cloudprovider.InstanceErrorInfo{
ErrorClass: cloudprovider.OutOfResourcesErrorClass,
ErrorCode: machinecodes.ResourceExhausted.String(),
ErrorMessage: machine.Status.LastOperation.Description,
},
}
}
return nodes, nil
return nil
}

// validateNodeTemplate function validates the NodeTemplate object of the MachineClass
