aws_volume_attachment workflow with skip_destroy #1017

Closed
gtmtech opened this issue Jun 30, 2017 · 7 comments · Fixed by #21144
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/ec2 Issues and PRs that pertain to the ec2 service. upstream-terraform Addresses functionality related to the Terraform core binary.

Comments


gtmtech commented Jun 30, 2017

Terraform 0.9.8

Firstly, I feel that aws_volume_attachment should default to skip_destroy = true. aws_volume_attachments are notoriously tricky in Terraform, because they often prevent destruction of resources.

For example, terraform up an aws_instance, an aws_ebs_volume, and an aws_volume_attachment that connects them (without skip_destroy), then try to plan -destroy all 3 (and apply).

Terraform will simply do this:

Error applying plan:
1 error(s) occurred:
* aws_volume_attachment.foo (destroy): 1 error(s) occurred:
* aws_volume_attachment.foo: Error waiting for Volume (vol-0398c9b5a8017xxxx) to detach from Instance: i-023644c6c4c02xxxx

Worse still, the skip_destroy flag must be successfully APPLIED to the resource IN THE STATEFILE before you have any hope of Terraform destroying it. If the skip_destroy = "true" flag is merely present in the aws_volume_attachment resource's .tf file and you try to destroy the resource, you still get the above timeout error. This means the docs are technically wrong.
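
For reference, a minimal sketch of what the flag looks like (resource names are placeholders); the point is that it has to have been terraform apply'd into the statefile before the destroy:

resource "aws_volume_attachment" "foo" {
  device_name = "/dev/sdh"
  volume_id   = "${aws_ebs_volume.foo.id}"
  instance_id = "${aws_instance.bar.id}"

  # Must already be applied into the statefile for terraform destroy to
  # skip the detach; setting it only in the .tf file is not enough.
  skip_destroy = true
}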

Sometimes you destroy resources simply by deleting the resource declarations from the Terraform code and running plan/apply. Because this fails for the same reason (skip_destroy has not been applied to the statefile), you end up in an impossible situation and have to revert the codebase, add the flag, run terraform apply, and only then terraform destroy.

One of three things should happen (in my view):

  1. skip_destroy defaults to true
  2. skip_destroy changes so that it skips the destroy only if there's a timeout, and it does not require being applied to the statefile first, but just works by its presence.
  3. the destruction of the aws_volume_attachment resource in conjunction with the destruction of its aws_instance or its aws_ebs_volume should delete the aws_instance or aws_ebs_volume first -- as this will then allow the destruction of the aws_volume_attachment resource without a force flag.

Unfortunately there is no lifecycle event to swap the order of destruction of dependencies for (3).

Any comments appreciated - it's a bit of a thorny workflow at the moment.

@radeksimko radeksimko added enhancement Requests to existing resources that expand the functionality or scope. upstream-terraform Addresses functionality related to the Terraform core binary. labels Oct 23, 2017

njam commented Oct 28, 2017

For example, terraform up an aws_instance, an aws_ebs_volume, and an aws_volume_attachment that connects them (without skip_destroy), and then try and plan -destroy all 3 (and apply) [...] Terraform will simply do this: Error applying plan

I can't confirm this behaviour with terraform 0.10.8 and terraform-aws 1.1.0.
All 3 resources are successfully destroyed.
Was this maybe fixed?

Example code: https://gist.github.com/njam/cf572606f23625b941aa7ab61e2569b3

@nemosupremo

@njam

I'm currently trying to figure out how to deal with this. I think the problem with your plan, and the step the OP missed in their workflow, is that the timeout commonly occurs when you try to destroy an attachment while the volume is still mounted. If you unmount the volume first and then run terraform apply, the destroy succeeds, as you noticed. Likewise, you can probably reproduce the OP's issue by mounting the drive first.

I just started with Terraform and I "solved" the problem like so:

resource "aws_volume_attachment" "pritunl_att_data" {
  device_name = "/dev/sdd"
  instance_id = "${aws_instance.pritunl.id}"
  volume_id   = "${aws_ebs_volume.pritunl_data.id}"
  provisioner "remote-exec" {
      inline = [
        "if [ x`lsblk -ln -o FSTYPE /dev/xvdd` != 'xext4' ] ; then sudo mkfs.ext4 -L datanode /dev/xvdd ; fi",
        "sudo mount -a",
        "sudo mkdir -p /mnt/data/mongodb",
        "sudo chown -R mongodb:mongodb /mnt/data/mongodb",
        "sudo service mongod restart",
        "sudo sh -c \"echo 'yes' > /mnt/data/init\"",
      ]
      connection {
        user = "ubuntu"
        host = "${aws_eip.pritunl_ip.public_ip}"
        private_key = "${file("~/.ssh/master_rsa")}"
      }
    }
    provisioner "remote-exec" {
      when = "destroy"
      inline = [
        "sudo service mongod stop",
        "sudo umount /mnt/data"
      ]
      connection {
        user = "ubuntu"
        host = "${aws_eip.pritunl_ip.public_ip}"
        private_key = "${file("~/.ssh/master_rsa")}"
      }
    }
}

The problem I'm having, however, is that I think I'm running into hashicorp/terraform#16237, where the destroy provisioner causes a cycle.

I can work around the cycle by either (1) hard-coding the instance's IP or (2) adding a data "aws_instances" source with a filter. Option 1 obviously isn't desirable (I might not know the IP), and option 2 has its own problems when I apply the configuration from scratch: data "aws_instances" returns 0 instances, which causes an error.

I thought there might be prior work on this, but it seems everyone is using skip_destroy. The issue I have with skip_destroy is that if I'm moving an EBS volume, or changing the instance type, then when I try to attach the volume again I get a timeout (because AWS thinks the EBS volume is already attached).
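
For context, a minimal sketch of option 2 (the tag name is made up); on a fresh apply the data source resolves before the instance exists, which is where it falls over:

data "aws_instances" "pritunl" {
  filter {
    name   = "tag:Name"
    values = ["pritunl"]
  }
}

# and in the destroy provisioner's connection block, instead of the
# aws_eip reference that creates the cycle:
#   host = "${data.aws_instances.pritunl.public_ips[0]}"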

@radeksimko radeksimko added the service/ec2 Issues and PRs that pertain to the ec2 service. label Jan 28, 2018

robax commented May 23, 2018

Also running into this issue. In our case, we're trying to use the remote provisioner to stop services and unmount an EBS volume "cleanly"; this failed due to the same cycle issues @nemosupremo described.

Our workaround is to do a dirty detachment using force_detach = true (sketched below), but this required a manual edit of the TF state file, as described by OP @gtmtech. If others can confirm this I'll open a new issue.
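
A minimal sketch of that workaround (resource names are placeholders):

resource "aws_volume_attachment" "data" {
  device_name = "/dev/sdh"
  volume_id   = "${aws_ebs_volume.data.id}"
  instance_id = "${aws_instance.app.id}"

  # Forces the detach on destroy even while the volume is still mounted;
  # risky for data integrity, so stop anything writing to it first.
  force_detach = true
}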

@hartzell

terraform destroy with an aws_instance, aws_ebs_volume, and aws_volume_attachment isn't working when I'm using a FreeBSD 12 ZFS AMI, even when the instance is not explicitly using the volume. It works as expected if I use a CentOS AMI, whether or not I've made and mounted a filesystem on the device.

I'm using:

(alice)[11:19:28]attachment-example>>terraform version
Terraform v0.11.11
+ provider.aws v1.60.0

Your version of Terraform is out of date! The latest version
is 0.11.13. You can update by downloading from www.terraform.io/downloads.html

I believe that the problem is that AWS can't/won't detach the volume from the running FreeBSD instance. If I power down the instance first, via the console or shutdown -p now, then terraform destroy succeeds.
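
For example, as a manual pre-step (a sketch, using the instance ID from the run below):

# Stop the instance so the OS releases the device, then destroy.
aws ec2 stop-instances --instance-ids i-0a10b26a1f0aa583c
aws ec2 wait instance-stopped --instance-ids i-0a10b26a1f0aa583c
terraform destroy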

I see a PR (#8602) that adds a stop_instance_before_detaching flag to the attachment.

If I modify the configuration, setting skip_destroy = true in the aws_volume_attachment then terraform destroy succeeds.

I don't see any mention of this issue in the docs; skip_destroy seems to be intended for externally managed volumes.

The particular error that I'm seeing when destroying is:

(alice)[10:16:37]attachment-example>>terraform destroy
aws_ebs_volume.foo: Refreshing state... (ID: vol-079b66b5a574b8d1f)
aws_instance.bar: Refreshing state... (ID: i-0a10b26a1f0aa583c)
aws_volume_attachment.foo: Refreshing state... (ID: vai-2327136213)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  - aws_ebs_volume.foo

  - aws_instance.bar

  - aws_volume_attachment.foo


Plan: 0 to add, 0 to change, 3 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

aws_volume_attachment.foo: Destroying... (ID: vai-2327136213)
aws_volume_attachment.foo: Still destroying... (ID: vai-2327136213, 10s elapsed)
[... "Still destroying" repeated every 10s ...]
aws_volume_attachment.foo: Still destroying... (ID: vai-2327136213, 5m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* aws_volume_attachment.foo (destroy): 1 error(s) occurred:

* aws_volume_attachment.foo: Error waiting for Volume (vol-079b66b5a574b8d1f) to detach from Instance: i-0a10b26a1f0aa583c

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

For completeness' sake, here's a test case (you'll need to subscribe to the CentOS ami; adjust the security group as necessary):

# region and etc... from environment, probably via aws-okta...
provider "aws" {
  region = "us-west-2"
}

resource "aws_ebs_volume" "foo" {
  size              = 1
  type              = "gp2"
  iops              = 750
  availability_zone = "us-west-2a"
}

resource "aws_volume_attachment" "foo" {
  device_name = "/dev/sdh"
  volume_id   = "${aws_ebs_volume.foo.id}"
  instance_id = "${aws_instance.bar.id}"

  #  skip_destroy = true
}

resource "aws_instance" "bar" {
  # ami               = "ami-07489cd9448bfa3d0" # FreeBSD 12 ZFS
  ami               = "ami-b63ae0ce" # CentOS
  instance_type     = "t3.small"
  availability_zone = "us-west-2a"
  key_name          = "alice-aws"

  vpc_security_group_ids = ["sg-USE_YOUR_OWN_SG"]

  root_block_device {
    volume_size           = "10"
    volume_type           = "gp2"
    delete_on_termination = true
  }
}

output "bar ip" {
  value = "${aws_instance.bar.public_ip}"
}


RulerOf commented Sep 20, 2019

I overcame the issue here and the issues identified in #13549 and #16237 by attaching a provisioner to a null_resource, which removes the need for skip_destroy, although you can still use it if desired.

Collect all of the dependent attributes into a null_resource and execute a provisioner there:

# Use your imagination for the missing code
resource "aws_instance" "my_server" {}
resource "aws_volume_attachment" "my_ebs_volume" {}

resource "null_resource" "ebs_volume_cleanup" {
  triggers = {
    private_ip = aws_instance.my_server.private_ip
    instance_id = aws_instance.my_server.id # Trigger on this in addition to the private_ip in case you use a static IP
    data_volume_attachment = aws_volume_attachment.my_ebs_volume.volume_id
  }
  
  provisioner "remote-exec" {
    when = "destroy"
    inline = [
      "sudo service my-service stop",
      "sudo sync",
      "if  mount | grep '/path/to/volume'; then sudo umount /path/to/volume; fi",
    ]
    connection {
      user = "ubuntu"
      host = self.triggers.private_ip
      private_key = "${file("/path/to/local/keyfile")}"
    }
  }
}

Passing the aws_instance.my_server.private_ip and the aws_volume_attachment.my_ebs_volume.volume_id into the triggers of the null_resource puts the destroy provisioner into the right spot in the graph, and frees up the volume for successful detachment.

This gives you a workflow that can cleanly execute like this to replace a server without destroying/recreating the EBS volume:

terraform taint aws_instance.my_server
terraform apply -auto-approve

@YakDriver YakDriver self-assigned this Oct 4, 2021
YakDriver pushed a commit that referenced this issue Oct 4, 2021
…lume

By stopping the instance, the volume is unmounted in the instance
and the detaching of the volume doesn't run into a timeout

fixes #6673
fixes #2084
fixes #2957
fixes #4770
fixes #288
fixes #1017
@github-actions github-actions bot added this to the v3.62.0 milestone Oct 5, 2021

github-actions bot commented Oct 8, 2021

This functionality has been released in v3.62.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
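
For anyone landing here: the change referenced above adds a stop_instance_before_detaching argument to aws_volume_attachment. A minimal sketch of its use (resource names are placeholders, provider >= 3.62.0 assumed):

resource "aws_volume_attachment" "data" {
  device_name = "/dev/sdh"
  volume_id   = aws_ebs_volume.data.id
  instance_id = aws_instance.app.id

  # Stops the instance before detaching the volume on destroy, so the
  # detach no longer times out while the device is still in use.
  stop_instance_before_detaching = true
}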


github-actions bot commented Jun 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2022