Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotate machine-deployment if machine-types are not available in cloud #454

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions pkg/controller/constants.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
package controller

// TODO: Split and move this constants to provider-specific machine-controllers in future.
const (
// MachineTypeNotAvailableAzure is a log message depicting certain machine-types are not available in the azure-cloud.
MachineTypeNotAvailableAzure = "The requested size for resource '.*' is currently not available in location '.*' zones '.*' for subscription '.*'"

// MachineTypeNotAvailableAWS is a log message depicting certain machine-types are not available in the azure-cloud.
MachineTypeNotAvailableAWS = "Unsupported: Your requested instance type (.*) is not supported in your requested Availability Zone (.*). Please retry your request by not specifying an Availability Zone or choosing .*"

// MachineTypeNotAvailableAnnotation annotation is put on machine and node obejcts when cloud-provider is out of certain machine-types.
MachineTypeNotAvailableAnnotation = "machine.sapcloud.io/machine-type-not-available"
)

// Original error messages:
// AWS:
// Cloud provider message - Unsupported: Your requested instance type (m5.4xlarge) is not supported in your requested Availability Zone (us-east-1e). Please retry your request by not specifying an Availability Zone or choosing us-east-1b, us-east-1d, us-east-1a, us-east-1f, us-east-1c.

// Azure:
// The requested size for resource '/subscriptions/123c2-4-1234/resourceGroups/shoot--12-12/providers/Microsoft.Compute/virtualMac2321321312/1234-nz7vw' is currently not available in location 'westeurope' zones '1' for subscription '2c3d1231-s21321321-8c04-2711234'.
2 changes: 1 addition & 1 deletion pkg/controller/deployment.go
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ func (dc *controller) getMachineDeploymentForMachine(machine *v1alpha1.Machine)
// No controller owns this Machine.
return nil
}
if controllerRef.Kind != "MachineDeployment" { //TODO: Remove hardcoded string
if controllerRef.Kind != "MachineSet" { //TODO: Remove hardcoded string
hardikdr marked this conversation as resolved.
Show resolved Hide resolved
// Not a Machine owned by a machine set.
return nil
}
Expand Down
37 changes: 37 additions & 0 deletions pkg/controller/machine.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"errors"
"fmt"
"math"
"regexp"
"strings"
"time"

Expand Down Expand Up @@ -428,6 +429,27 @@ func (c *controller) machineCreate(machine *v1alpha1.Machine, driver driver.Driv
klog.Warning(err)
}

// Check if the creation-error logs contain the specific pattern, and annotate the machine-deployment if found.
contains, _ := c.containsSpecialPattern(err.Error(), MachineTypeNotAvailableAzure, MachineTypeNotAvailableAWS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this logic into the machine deployment controller. We shouldn't be updating machine deployments in machine reconciliation. The machine deployment controller can look up for these patterns in failedMachines field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a valid point.
I had actually started implementing this into the deployment controller. But later moved it here. First of all the log-patterns are extremely cloud-provider specific, may keep changing, and providers may want to add more such patterns in the future. I felt the best place for this logic is to be at provider-specific machine-controllers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern being that, the machine controller on OOT will not have access to machineDeployments. It will not be able to update this. One alternate approach could be this - gardener/autoscaler#37 (comment)?


if contains {
md := c.getMachineDeploymentForMachine(machine)
if md != nil {
mdCopy := md.DeepCopy()
annotations := mdCopy.Spec.Template.Spec.NodeTemplateSpec.Annotations
if annotations == nil {
annotations = make(map[string]string)
}
annotations[MachineTypeNotAvailableAnnotation] = "True"
mdCopy.Spec.Template.Spec.NodeTemplateSpec.Annotations = annotations

_, errUpdate := c.controlMachineClient.MachineDeployments(machine.Namespace).Update(mdCopy)
if errUpdate != nil {
klog.Errorf("Error updating the machine-deployment %+v", errUpdate)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning the error here, to keep the error the intact from the cloud-provider, or should we return the errUpdate ?

}
}
}

return err
}
klog.V(2).Infof("Created/Adopted machine: %q, MachineID: %s", machine.Name, actualProviderID)
Expand Down Expand Up @@ -1025,3 +1047,18 @@ func decodeMachineID(id string) string {
splitProviderID := strings.Split(id, "/")
return splitProviderID[len(splitProviderID)-1]
}

// containsSpecialPattern checks if the errorString contains any of the patterns
func (c *controller) containsSpecialPattern(errorString string, patterns ...string) (bool, error) {
for _, pattern := range patterns {
matched, err := regexp.MatchString(pattern, errorString)
if err != nil {
klog.Errorf("Error matching the string %+v", err)
return false, err
}
if matched {
return true, nil
}
}
return false, nil
}