Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Prevent hosts from getting into infinite power-off loops #1356

Merged
merged 3 commits into from
Oct 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion controllers/metal3.io/host_state_machine.go
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,9 @@ func (hsm *hostStateMachine) checkInitiateDelete(log logr.Logger) bool {
switch hsm.NextState {
default:
hsm.NextState = metal3api.StatePoweringOffBeforeDelete
case metal3api.StateRegistering, metal3api.StateUnmanaged, metal3api.StateNone:
// Skip the power off before delete
hsm.NextState = metal3api.StateDeleting
case metal3api.StateProvisioning, metal3api.StateProvisioned:
if hsm.Host.OperationalStatus() == metal3api.OperationalStatusDetached {
if delayDeleteForDetachedHost(hsm.Host) {
Expand All @@ -238,12 +241,20 @@ func (hsm *hostStateMachine) checkInitiateDelete(log logr.Logger) bool {
hsm.NextState = metal3api.StateDeprovisioning
}
case metal3api.StateDeprovisioning:
if hsm.Host.Status.ErrorType == metal3api.RegistrationError && hsm.Host.Status.ErrorCount > 3 {
hsm.NextState = metal3api.StateDeleting
return true
}
// Allow state machine to run to continue deprovisioning.
return false
case metal3api.StateDeleting:
// Already in deleting state. Allow state machine to run.
return false
case metal3api.StatePoweringOffBeforeDelete:
if hsm.Host.Status.ErrorType == metal3api.RegistrationError && hsm.Host.Status.ErrorCount > 3 {
hsm.NextState = metal3api.StateDeleting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't return true here, then we'll continue to process the state machine as if the current state were already Deleting, but without updating it in the Status unless handleDeleting() happens to return actionUpdate or actionComplete. Functionally this is OK - ensureRegistered() will look at the next state and not bother trying to re-register. But for the sake of consistency with all the other cases of changing hsm.NextState in this function, I think we should return true when we set it so that the user can explicitly see what is happening at each step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return statement added here, and also at the deprovisioning state

return true
}
// Already in powering off state. Allow state machine to run.
return false
}
Expand Down Expand Up @@ -327,7 +338,7 @@ func (hsm *hostStateMachine) ensureRegistered(info *reconcileInfo) (result actio
case metal3api.StateMatchProfile:
// Backward compatibility, remove eventually
return
case metal3api.StateDeleting, metal3api.StatePoweringOffBeforeDelete:
case metal3api.StateDeleting:
// In the deleting state the whole idea is to de-register the host
return
case metal3api.StateRegistering:
Expand Down
4 changes: 2 additions & 2 deletions docs/BaremetalHost_ProvisioningState.dot
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,14 @@ digraph BaremetalHost {

Unmanaged [shape=doublecircle]
Unmanaged -> Registering [label="BMC.* != \"\""]
Unmanaged -> PoweringOffBeforeDelete [label="!DeletionTimestamp.IsZero()"]
Unmanaged -> Deleting [label="!DeletionTimestamp.IsZero()"]

ExternallyProvisioned [label="Externally\nProvisioned"]

Registering -> Inspecting [label="!externallyProvisioned &&\nNeedsHardwareInspection()"]
Registering -> Preparing [label="!externallyProvisioned &&\ninspectionDisabled()"]
Registering -> ExternallyProvisioned [label="externallyProvisioned"]
Registering -> PoweringOffBeforeDelete [label="!DeletionTimestamp.IsZero()"]
Registering -> Deleting [label="!DeletionTimestamp.IsZero()"]

ExternallyProvisioned -> Inspecting [label="!externallyProvisioned &&\nNeedsHardwareInspection()"]
ExternallyProvisioned -> Preparing [label="!externallyProvisioned &&\n!NeedsHardwareInspection()"]
Expand Down
Binary file modified docs/BaremetalHost_ProvisioningState.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.