Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotate machine-deployment if machine-types are not available in cloud #454

Closed
wants to merge 1 commit into from

Conversation

hardikdr
Copy link
Member

@hardikdr hardikdr commented Apr 29, 2020

What this PR does / why we need it:
This PR adds the annotation at the nodeSpec of the MachineDeployment, whenever a specific pattern is found in the creation-logs.
It is also mainly motivated due to the resource-scarcity at Azure, where certain machine-types are not available. MCM adds the annotation on such MachineDeployment, which becomes the hint for the autoscaler to keep trying to scale-up it up.

Related to this: gardener/autoscaler#35

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

MCM adds the annotation `machine.sapcloud.io/machine-type-not-available` whenever certain machine-types are not available in Azure and AWS.

@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 29, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Apr 29, 2020
@hardikdr hardikdr force-pushed the feature/oor-annotation branch from 40a10a1 to 50d3104 Compare May 2, 2020 12:06
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 2, 2020
@hardikdr hardikdr changed the title WIP: Annotate machine-deployment if machine-types are not available in cloud Annotate machine-deployment if machine-types are not available in cloud May 2, 2020
@hardikdr hardikdr force-pushed the feature/oor-annotation branch from 50d3104 to 5574a53 Compare May 2, 2020 12:19
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 2, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 2, 2020
@hardikdr hardikdr marked this pull request as ready for review May 2, 2020 12:21
@hardikdr hardikdr requested review from ggaurav10 and a team as code owners May 2, 2020 12:21
@hardikdr
Copy link
Member Author

hardikdr commented May 2, 2020

/test

@ghost ghost added the kind/test Test label May 2, 2020
@hardikdr hardikdr added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed kind/test Test labels May 2, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 3, 2020

_, errUpdate := c.controlMachineClient.MachineDeployments(machine.Namespace).Update(mdCopy)
if errUpdate != nil {
klog.Errorf("Error updating the machine-deployment %+v", errUpdate)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning the error here, to keep the error the intact from the cloud-provider, or should we return the errUpdate ?

pkg/controller/deployment.go Show resolved Hide resolved
@@ -428,6 +429,27 @@ func (c *controller) machineCreate(machine *v1alpha1.Machine, driver driver.Driv
klog.Warning(err)
}

// Check if the creation-error logs contain the specific pattern, and annotate the machine-deployment if found.
contains, _ := c.containsSpecialPattern(err.Error(), MachineTypeNotAvailableAzure, MachineTypeNotAvailableAWS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this logic into the machine deployment controller. We shouldn't be updating machine deployments in machine reconciliation. The machine deployment controller can look up for these patterns in failedMachines field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a valid point.
I had actually started implementing this into the deployment controller. But later moved it here. First of all the log-patterns are extremely cloud-provider specific, may keep changing, and providers may want to add more such patterns in the future. I felt the best place for this logic is to be at provider-specific machine-controllers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern being that, the machine controller on OOT will not have access to machineDeployments. It will not be able to update this. One alternate approach could be this - gardener/autoscaler#37 (comment)?

@hardikdr
Copy link
Member Author

Update: We hold the development here unless we see the need for improvement again.

Let's re-open later if required.
/close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants