Destroy is ignoring region from state file #15052

Closed
Alexhha opened this issue Sep 7, 2020 · 6 comments
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service. stale Old or inactive issues managed by automation, if no further action taken these will get closed.

Comments

Alexhha commented Sep 7, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform version
Terraform v0.12.29
+ provider.aws v3.5.0
+ provider.random v2.3.0
+ provider.template v2.1.2

Affected Resource(s)

  • aws_instance

Terraform Configuration Files

main.tf

provider "aws" {
    access_key = "ACCESS_KEY"
    secret_key = "SECRET_KEY"
    region     = var.region
}

data "aws_ami" "linux_ami" {
    most_recent = true
    owners      = ["099720109477"]

    filter {
        name   = "name"
        values = ["ubuntu/images/hvm-ssd/ubuntu-*-20.04-amd64-server-*"]
    }

    filter {
        name   = "ena-support"
        values = ["true"]
    }

    filter {
        name   = "virtualization-type"
        values = ["hvm"]
    }
}


resource "random_id" "aws-prefix" {
    byte_length = 4
}


resource "aws_instance" "test" {
    ami           = data.aws_ami.linux_ami.id
    instance_type = "t3.micro"

    tags = {
        Name        = "test-${random_id.aws-prefix.hex}"
        environment = "test-${random_id.aws-prefix.hex}-${terraform.workspace}"
        ami         = data.aws_ami.linux_ami.id
    }

    root_block_device {
        delete_on_termination = true
        volume_size = var.disk_size
    }
}

variables.tf

variable "region" {
    type    = string
    default = "eu-central-1"
}

variable "disk_size" {
    type    = number
    default = 10
}

Debug Output

Panic Output

Expected Behavior

All resources in the state file should be deleted, or at least a warning message should be printed.

Actual Behavior

aws_instance resource is not deleted

$ terraform apply -var 'region=us-west-1'
...
Plan: 2 to add, 0 to change, 0 to destroy.
random_id.aws-prefix: Creating...
random_id.aws-prefix: Creation complete after 0s [id=heyWcA]
aws_instance.test: Creating...
...
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Check state

$ terraform state list
data.aws_ami.linux_ami
aws_instance.test
random_id.aws-prefix

Destroy the env with default region from variables.tf

$ terraform destroy
...
data.template_file.bootstrap: Refreshing state...
random_id.aws-prefix: Refreshing state... [id=heyWcA]
data.aws_ami.linux_ami: Refreshing state...
aws_instance.test: Refreshing state... [id=i-1234567890]
...
Plan: 0 to add, 0 to change, 1 to destroy.

Check state one more time

$ terraform state list

Oops: aws_instance is no longer in the state, but it wasn't deleted. It is still running. We can't delete it even when passing the correct region:

$ terraform destroy -var 'region=us-west-1'
data.aws_ami.linux_ami: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:

Terraform will perform the following actions:

Plan: 0 to add, 0 to change, 0 to destroy.
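
One possible way to bring the orphaned instance back under management is to import it again. This is only a sketch: the import block below needs Terraform v1.5 or later, which is newer than the v0.12.29 used in this report, and the instance ID is the placeholder from the output above. The provider must also resolve to the region the instance actually lives in (us-west-1 here), for example by passing -var 'region=us-west-1'.

```hcl
# Sketch only: declarative import, available from Terraform v1.5 onward.
# On older versions the equivalent is the `terraform import` CLI command.
import {
  to = aws_instance.test  # resource address from main.tf
  id = "i-1234567890"     # placeholder instance ID taken from the output above
}
```

Once the instance is back in the state, a destroy run against the same region should be able to see it again.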

Steps to Reproduce

  1. $ terraform apply -var 'region=us-west-1'
  2. $ terraform destroy
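
The reproduction relies on region having a default that differs from the value used at apply time. A minimal sketch of one mitigation, assuming it is acceptable to supply the region explicitly on every run: drop the default so that a bare terraform destroy cannot silently fall back to eu-central-1. The description text is illustrative.

```hcl
variable "region" {
  type        = string
  description = "AWS region for this configuration; intentionally has no default."

  # No default: if the value is not supplied via -var, a .tfvars file, or the
  # TF_VAR_region environment variable, Terraform asks for it interactively
  # (or fails when run with -input=false) instead of picking a region itself.
}
```

This does not stop someone from typing the wrong region, but it removes the silent fallback shown in the steps above.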

Important Factoids

References

  • #0000
@ghost ghost added the service/ec2 Issues and PRs that pertain to the ec2 service. label Sep 7, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Sep 7, 2020
@bflad bflad added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Sep 9, 2020
Contributor

bflad commented Sep 9, 2020

Hi @Alexhha 👋 Thank you for raising this and sorry you ran into trouble here.

I'm presumptively going to mark this as a bug, as we have seen similar reports before, although the behavior is likely expected given the current design of the Terraform AWS Provider. It likely will not be easily solvable for this and all other resources in the near future. The region and AWS service endpoint configuration currently occurs during initialization of each provider instance, once per Terraform run. For starters, we would need to store the associated region in the Terraform state of each resource, which is not the case today, and the AWS Go SDK service initialization would need to be delayed until each individual resource invocation in Terraform's operation graph, which could cause logistical and performance issues even with some caching. It is also unclear upfront whether other provider configuration would need to be stored in the state as well, since endpoints and authentication behaviors can be customized.

For now, it is probably best to assume this will not be fixed any time soon, given its implementation complexity and since passing in the region directly during each Terraform invocation tends to be a less common use case than including it as part of a long-lived configuration (e.g. Terraform Cloud workspace variable, hardcoded Terraform configuration value, etc). One option to help ensure that region configuration remains correct across future Terraform invocations is to create a root Terraform configuration that initializes the provider configurations in expected regions and passes those provider instances to modules (like the configuration outlined, without the region variable), e.g.

provider "aws" {
  alias  = "euc1"
  region = "eu-central-1"
}

provider "aws" {
  alias  = "usw1"
  region = "us-west-1"
}

module "eu-central-1-instance" {
  source = "./path/to/code/above"

  disk_size = 15 # optional

  # providers is a map argument, so it takes "=" before the brace
  providers = {
    aws = aws.euc1
  }
}

module "us-west-1-instance" {
  source = "./path/to/code/above"

  disk_size = 15 # optional

  providers = {
    aws = aws.usw1
  }
}

See the Terraform documentation section passing providers to modules for more information. Hope this helps.
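
Another way to keep the region in long-lived configuration, sketched here under the assumption that each region is managed in its own Terraform workspace (the original configuration already references terraform.workspace), is to derive the region from the workspace name. The workspace names and the local value name are hypothetical.

```hcl
locals {
  # Hypothetical workspace names; adjust to the workspaces actually in use.
  region_by_workspace = {
    default = "eu-central-1"
    usw1    = "us-west-1"
  }
}

provider "aws" {
  # Indexing a key that is not in the map is an error, so a run in an
  # unlisted workspace fails early instead of using an unexpected region.
  region = local.region_by_workspace[terraform.workspace]
}
```

Because state is stored per workspace, the region then follows the state being operated on rather than a command-line flag.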

Author

Alexhha commented Sep 9, 2020

Good day,

If it's really hard to implement, is it possible to at least add a warning to the terraform destroy command? As far as I can see, we have the region for each resource in the state file. WDYT?

Contributor

bflad commented Sep 9, 2020

@Alexhha most resources should include a WARN log when the resource is "not found" and therefore being removed from the Terraform state (generally used to trigger resource recreation), although it looks like the aws_instance resource is missing that:

https://github.com/terraform-providers/terraform-provider-aws/blob/bc480ffb51e2056dd2eaec0dc45af172adc50065/aws/resource_aws_instance.go#L732-L735

That being said, those logs are generally not visible unless you have logging enabled and something looking for those types of issues during a Terraform run. We recently upgraded to the new Terraform Plugin SDK version 2, which will allow us to switch those logs to user interface warnings, but that will also require updating the resource implementations to new function signatures. We do not have a timeline to switch these over yet. I've created #15090 for tracking that effort.

danielbrauer commented Mar 15, 2021

I think I understand the issue described above as affecting terraform's identification of state-referenced resources when switching regions. This can result in orphaned resources.

Could this same issue cause terraform to terminate a resource that is not in its state? We had an incident recently and as best I can tell this is what happened:

  1. Two similar terraform-managed deployments, in separate AWS regions defined by setting a variable differently on a module. Let's call them A (us-east-1) and B (us-east-2). Note that all the terraform commands were run against deployment B.
  2. I ran destroy on deployment B, which didn't complete due to an identically named database snapshot.
  3. I manually removed the conflicting snapshot in east-2, and ran destroy again
  4. According to AWS event logs, the database from A (us-east-1) was destroyed at this moment. I did not notice at the time.
  5. Shortly thereafter, I noticed that the database in east-2 was still running.

The only way I can explain the above is if:

  1. I accidentally changed the region of deployment B before running destroy in step 3. This is possible.
  2. This issue not only causes Terraform to lose resources when switching regions, but allows it to misidentify resources if the region has changed since the last operation.

If this is what happened, then this issue seems much more dangerous than originally reported: it can cause incorrect destruction of resources rather than just losing track of them. Is this even possible, though? I would expect resource identifiers to be unique, and prevent terraform from ever thinking it owned something which it didn't.

terraform version
Terraform v0.12.30
+ provider.aws v2.70.0
+ provider.random v2.3.1
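
On the question of identifier uniqueness above: EC2 instance IDs are generated by AWS, but some resources are addressed by a name the user chooses. As far as I know, an RDS DB instance identifier only has to be unique within one account and region, so deployments like A and B can both hold a database with the same identifier. A minimal sketch of that situation follows; every name and setting in it is hypothetical and none of it is taken from the incident described above.

```hcl
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "use2"
  region = "us-east-2"
}

variable "db_password" { type = string }

# Same user-chosen identifier in two regions; AWS treats these as two
# unrelated databases because the name is scoped to account and region.
resource "aws_db_instance" "a" {
  provider            = aws.use1
  identifier          = "app-db" # hypothetical name
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = true
}

resource "aws_db_instance" "b" {
  provider            = aws.use2
  identifier          = "app-db" # same name, different region
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = true
}
```

If the provider tracks such a database by that identifier (an assumption I have not verified against the provider source), then running deployment B's state against us-east-1 could make a refresh find A's database under the same name instead of reporting it missing, which would be consistent with explanation 2.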

github-actions bot commented Mar 5, 2023

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

@github-actions github-actions bot added the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Mar 5, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 4, 2023
github-actions bot commented May 5, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 5, 2023

3 participants