🐛 Power off nodes upon deletion #1176
/test-centos-integration-main
/lgtm
Heads up: the CentOS failure is unrelated to these changes; we are facing issues with CI. xref: https://kubernetes.slack.com/archives/CHD49TLE7/p1666273231196429
What is the progress here?
72efc2a to 8b41131
/test-centos-integration-main
/lgtm
/lgtm cancel
One issue inline, otherwise looking good.
8b41131 to 2f4593e
2f4593e to 277a166
Refamiliarising myself with how all of this works 😅
277a166 to 77fdf84
	if err != nil {
		if info.host.Status.ErrorCount < maxPowerOffRetryCount {
			return actionError{errors.Wrap(err, "failed to power off")}
I'm not seeing anything addressing my earlier comment about an infinite error loop in cases where the node is missing from ironic and we don't have credentials to re-register it, which we handle in deprovisioning here and also now need to handle in deleting.
I wonder if this would all be made simpler by putting the new code into a separate actionPowerOffBeforeDeleting() method, so that the state machine code can easily distinguish between errors coming from the power off vs. the delete.
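To illustrate the suggestion, here is a minimal sketch of why a separate power-off action helps. Every type and name in it (`step`, `result`, `host`, `nextState`) is a hypothetical stand-in rather than the operator's real code; the only point is that a result tagged with its originating step lets the state machine break the retry loop when the node is missing and there are no credentials to re-register it.

```go
package main

import (
	"errors"
	"fmt"
)

// All names below are hypothetical stand-ins, not the operator's real types.

// step identifies which action produced a result.
type step int

const (
	stepPowerOff step = iota
	stepDelete
)

// result is a simplified action result that records its originating step.
type result struct {
	from step
	err  error
}

// host holds only the fields this sketch needs.
type host struct {
	registeredInIronic bool
	hasCredentials     bool
}

var errNodeMissing = errors.New("node missing from ironic")

// powerOffBeforeDeleting attempts the power off as its own action, so its
// failures are never mixed up with failures of the delete itself.
func powerOffBeforeDeleting(h host) result {
	if !h.registeredInIronic {
		return result{from: stepPowerOff, err: errNodeMissing}
	}
	return result{from: stepPowerOff}
}

// nextState decides what the state machine does with a power-off result.
func nextState(h host, r result) string {
	if r.err == nil {
		return "deleting"
	}
	// Without credentials the node cannot be re-registered, so retrying
	// would loop forever; skip straight to the delete instead.
	if r.from == stepPowerOff && !h.hasCredentials {
		return "deleting"
	}
	return "retry power off"
}

func main() {
	h := host{registeredInIronic: false, hasCredentials: false}
	fmt.Println(nextState(h, powerOffBeforeDeleting(h)))
}
```

With the power off folded into the delete action, the caller would see only a generic error and could not tell which of the two branches applies.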
		}
	}

	info.host.Status.ErrorCount = 0
As mentioned in the last part of https://github.com/metal3-io/baremetal-operator/pull/1176/files#r1053444986, iff the error count wasn't already 0 we'll want to return actionUpdate wrapping actionContinue on line 533.
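A rough sketch of that conditional return follows. The type names echo the ones used in this review, but their definitions here are simplified assumptions (the real ones are not shown in this thread), and `clearErrorCount` is a hypothetical helper.

```go
package main

import "fmt"

// Simplified, assumed versions of the action result types named in the
// review; the operator's real definitions are not shown in this thread.
type actionResult interface{ Dirty() bool }

type actionContinue struct{}

func (actionContinue) Dirty() bool { return false }

// actionUpdate wraps actionContinue and marks the host status as changed,
// so the caller knows it has to be written back.
type actionUpdate struct{ actionContinue }

func (actionUpdate) Dirty() bool { return true }

type hostStatus struct{ ErrorCount int }

// clearErrorCount (hypothetical) resets the counter and reports a status
// update only when the counter actually changed.
func clearErrorCount(s *hostStatus) actionResult {
	if s.ErrorCount == 0 {
		return actionContinue{}
	}
	s.ErrorCount = 0
	return actionUpdate{}
}

func main() {
	s := &hostStatus{ErrorCount: 2}
	fmt.Println(clearErrorCount(s).Dirty()) // counter was reset
	fmt.Println(clearErrorCount(s).Dirty()) // nothing left to reset
}
```

Returning the update wrapper only when something changed avoids a needless status write on the common path where the counter is already zero.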
		}
		return result
	}
}
We should probably also set Status.PoweredOn to false so that we are reporting what we've actually done.
77fdf84 to b61cb84
This might be overkill, but the suggested changes led me to create a new step in the state machine. When a delete is requested, instead of When we clear
b61cb84 to 1630c13
/test-centos-integration-main
/test-centos-e2e-integration-main
1630c13 to 2f38157
@@ -561,6 +566,37 @@ func (hsm *hostStateMachine) handleDeprovisioning(info *reconcileInfo) actionRes
	return actResult
}

func (hsm *hostStateMachine) handlePoweringOffBeforeDelete(info *reconcileInfo) actionResult {
	actResult := hsm.Reconciler.actionPowerOffBeforeDeleting(hsm.Provisioner, info)
	skipToDelete := func() actionResult {
nit: most of this function is repeating handleDeprovisioning. It would be great to refactor them.
/test-centos-e2e-integration-main
/assign @zaneb
We introduce a new step in the state machine where the node goes through a power-off stage before it is deleted. We attempt to power it off 3 times before giving up and proceeding to the delete.
2f38157 to 6f65d8e
/lgtm
/test-centos-e2e-integration-main
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dtantsur
This is a continuation of #816 which in turn tries to fix #410.
Co-authored-by: Sandhya Dasu <sadasu@redhat.com> @sadasu