
Unable to destroy VM during first terraform destroy run: unexpected state 'RUNNING', wanted target 'DONE' #132

Closed
meise opened this issue Jul 15, 2021 · 5 comments · Fixed by #302



meise commented Jul 15, 2021

Error Description

In my Terraform/OpenNebula combination it is not possible to destroy an opennebula_virtual_machine during the first terraform destroy run. Terraform always fails with the error message: Error: Error waiting for virtual machine (546) to be in state DONE: unexpected state 'RUNNING', wanted target 'DONE'. last error: %!s(<nil>) (state: ACTIVE, lcmState: RUNNING). I have to run terraform destroy multiple times to delete all VM resources.

Code snippets

As far as I can understand the code, after the 10s destroy wait delay is reached, waitForVMState is expected to retry.

But in my case, vmState == vm.Active && vmLcmState == vm.EpilogFailure is always false.

if vmState == vm.Active && vmLcmState == vm.EpilogFailure {

Versions

OpenNebula: 5.12.0.3
opennebula-module: 0.3.0
terraform: 1.0.1

Resource

resource "opennebula_virtual_machine" "primary" {
  count       = var.primary_nodes
  name        = "primary-${random_string.primary_node_name[count.index].result}.${var.cluster_fqdn}"
  cpu         = 1
  vcpu        = 2
  memory      = 2048

  context = {
    NETWORK = "YES"
    SET_HOSTNAME = "$NAME"
  }

  graphics {
    type   = "VNC"
    listen = "0.0.0.0"
    keymap = "de"
  }

  os {
    arch = "x86_64"
    boot = "disk0"
  }

  disk {
    image_id = var.image_id
    size     = 10000
    target   = "vda"
    driver   = "raw"
  }

  nic {
    network_id      = var.network_id
    security_groups = [opennebula_security_group.k8s-primary.id]
  }

  tags = {
    role         = "primary_node"
    node_type_id = count.index
    environment  = "dev"
  }
}

Error message

Error: Error waiting for virtual machine (546) to be in state DONE: unexpected state 'RUNNING', wanted target 'DONE'. last error: %!s(<nil>) (state: ACTIVE, lcmState: RUNNING)

Log

destroying_vm.log

meise (Author) commented Jul 15, 2021

But in my case, vmState == vm.Active && vmLcmState == vm.EpilogFailure is always false.

I added some more debug output to the module. The following values are evaluated:
vmState: '3' == vm.Active: '3' && vmLcmState: '3' == vm.EpilogFailure: '40'
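
For reference, here is a minimal, standalone sketch that reproduces this comparison (assuming the goca schemas/vm import path and the constant names used by the provider; this is not the provider's actual debug code):

package main

import (
	"fmt"

	vm "github.com/OpenNebula/one/src/oca/go/src/goca/schemas/vm"
)

func main() {
	// Values observed in the debug output above.
	vmState := vm.Active     // State 3 (ACTIVE)
	vmLcmState := vm.Running // LCMState 3 (RUNNING)

	// The check quoted from the provider: it is only true when the VM failed
	// during EPILOG, never while the VM is still RUNNING.
	if vmState == vm.Active && vmLcmState == vm.EpilogFailure {
		fmt.Println("EPILOG_FAILURE branch taken")
	} else {
		fmt.Printf("branch skipped: state=%d, lcmState=%d (EpilogFailure=%d)\n",
			vmState, vmLcmState, vm.EpilogFailure)
	}
}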

jaypif (Collaborator) commented Jul 15, 2021

Hi @meise ,

Thank you for your issue.

In the attached log I cannot see all the retry attempts, even though the default timeout is 3 minutes.
Is the destroy failing after 3 minutes, or directly after the first 10 seconds?

It looks like the VM is still active after the terminate hard command.

Thanks

meise (Author) commented Jul 15, 2021

Thank you @jaypif for your quick response.

The destroy fails after 10 seconds. Maybe our cluster is just slower than the setups of other users of the provider.
Is it possible to increase the wait delay in any way?

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

frousselet added this to the v0.5.1 milestone Jun 15, 2022
treywelsh (Collaborator) commented Jun 15, 2022

I reproduced the problem by delaying the terminate action (I put it in a goroutine that starts with a big sleep of at least 10 seconds).
The goal was to keep the VM in the RUNNING state to simulate a slow OpenNebula setup, and to see what happens at the first waitForVMState check.
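
Roughly sketched, the delayed terminate looked like this (an illustrative reconstruction, not the exact code used; it assumes the goca VMController and its TerminateHard method, and the helper name is hypothetical):

import (
	"log"
	"time"

	"github.com/OpenNebula/one/src/oca/go/src/goca"
)

// delayedTerminate fires the hard terminate only after a sleep longer than the
// 10s Delay of the destroy wait, so the VM is still RUNNING when waitForVMState
// runs its first check.
func delayedTerminate(vmc *goca.VMController) {
	go func() {
		time.Sleep(15 * time.Second)
		if err := vmc.TerminateHard(); err != nil {
			log.Printf("[WARN] delayed terminate failed: %s", err)
		}
	}()
}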

At first glance, there are two interesting fields in the implementation of waitForVMState:

func waitForVMState(vmc *goca.VMController, timeout int, states ...string) (interface{}, error) {

	stateConf := &resource.StateChangeConf{
		Pending: []string{"anythingelse"},
		...
		Delay:      10 * time.Second,
		...
	}

	return stateConf.WaitForState()
}

From the doc:
Delay: Wait this time before starting checks
Pending: States that are "allowed" and will continue trying

From here, I tried adding RUNNING to the Pending field, and this seems to work.

However, it's a quick fix, so I still need to investigate more to propose something better.
There are a lot of VM states and transitions, and I want to find a compromise: something that fixes this problem in a simple way, i.e. without enumerating a bunch of states (see the states reference for ONE 6.4: https://docs.opennebula.io/6.4/integration_and_development/references/vm_states.html#vm-states).
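
For reference, a minimal sketch of that quick fix, assuming the terraform-plugin-sdk v2 resource helper; the function name and simplified signature here are hypothetical, and the Refresh function (the one already used by waitForVMState) is passed in rather than shown:

import (
	"time"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
)

// waitForVMStateSketch tolerates RUNNING while waiting for the target state,
// so a VM that is still being terminated keeps the wait loop retrying until
// Timeout instead of failing at the first check after the 10s Delay.
func waitForVMStateSketch(refresh resource.StateRefreshFunc, timeout time.Duration, states ...string) (interface{}, error) {
	stateConf := &resource.StateChangeConf{
		// "RUNNING" added to Pending: it is now an allowed intermediate state.
		Pending:    []string{"anythingelse", "RUNNING"},
		Target:     states,
		Refresh:    refresh,
		Timeout:    timeout,
		Delay:      10 * time.Second,
		MinTimeout: 3 * time.Second,
	}

	return stateConf.WaitForState()
}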

treywelsh mentioned this issue Jun 24, 2022
treywelsh added a commit to treywelsh/terraform-provider-opennebula that referenced this issue Jun 24, 2022
treywelsh added a commit that referenced this issue Jun 29, 2022
frousselet pushed a commit that referenced this issue Jun 30, 2022