
Can't tear down cluster after provisioning. #58

Closed
Firaenix opened this issue Apr 21, 2020 · 17 comments · Fixed by #119
Labels
bug 🐛 An issue with the system

Comments

@Firaenix

Describe the Bug

After provisioning an EKS Cluster with this module, you cannot then tear it down by commenting out the module.

Expected Behavior

Commenting out the module and then planning and applying should cleanly destroy the EKS cluster and its associated resources.

Steps to Reproduce

  1. Create module
  2. Plan and Apply
  3. Comment out the module
  4. Plan
  5. Error: Provider configuration not present
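
For reference, the kind of module block created in step 1 and commented out in step 3 looks roughly like this (a minimal sketch; the inputs and names are illustrative, not an exact configuration):

module "eks_cluster" {
  source = "cloudposse/eks-cluster/aws"

  # Illustrative inputs only.
  region     = var.region
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.subnets.private_subnet_ids

  name = "example"
}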

Screenshots

(screenshot showing the "Error: Provider configuration not present" output)

Environment (please complete the following information):

Terraform Cloud

@Firaenix added the bug 🐛 An issue with the system label on Apr 21, 2020
@tthayer

tthayer commented Apr 24, 2020

Is there a reason why you aren't using terraform destroy for this?

@Firaenix
Author

Because I have other things in my terraform workspace that I do not want to destroy.

Also, when I try to run terraform destroy against just this module, I can't: Terraform Cloud is connected to VCS and says it cannot destroy because the changes must be committed to VCS before they can be applied.

@tthayer

tthayer commented Apr 27, 2020

Instead of commenting out the module, have you tried setting the enabled value to false?
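
That is, keep the module block in place and flip the flag instead of commenting it out (a sketch; driving it from a variable is optional):

module "eks_cluster" {
  source  = "cloudposse/eks-cluster/aws"
  enabled = false              # or enabled = var.eks_enabled
  # ...rest of the configuration left unchanged...
}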

@Firaenix
Author

Experimenting with it now...

@Firaenix
Author

OK, so I got a cluster up and running with the enabled flag set, but when I set it back to false I get this:

Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp 127.0.0.1:80: connect: connection refused

I assume this is because the provider is being deleted as well and has no reference to active resources anymore.

@Firaenix
Author

It looks like the problem lies on this line.

data "aws_eks_cluster_auth" "eks" {

Because enabled is set to false, it's no longer referencing the created cluster.
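
A simplified sketch of the pattern (not the module's exact code) showing why the provider ends up pointing at localhost once enabled = false:

data "aws_eks_cluster" "eks" {
  count = var.enabled ? 1 : 0
  name  = join("", aws_eks_cluster.default.*.id)
}

data "aws_eks_cluster_auth" "eks" {
  count = var.enabled ? 1 : 0
  name  = join("", aws_eks_cluster.default.*.id)
}

provider "kubernetes" {
  # With enabled = false the data sources have count = 0, these joins return
  # empty strings, and the provider falls back to its default host, which is
  # why deleting the aws-auth configmap fails with
  # "dial tcp 127.0.0.1:80: connect: connection refused".
  host  = join("", data.aws_eks_cluster.eks.*.endpoint)
  token = join("", data.aws_eks_cluster_auth.eks.*.token)
}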

@Firaenix
Author

Firaenix commented May 1, 2020

Currently, I can't tear down this cluster without doing it manually.
It's sitting in my AWS environment eating away at my money.

@Firaenix
Author

Any progress on this? It's really a pain having to destroy the cluster manually, and it leaves my Terraform in a broken state.

@vkhatri
Contributor

vkhatri commented May 26, 2020

Deleting the resource kubernetes_config_map from the tfstate should fix the error for you.

$ [terraform|terragrunt] state list | grep kubernetes_config_map
$ [terraform|terragrunt] state rm [resource name output from previous command]
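
For example, with this module the config map address usually looks like the one below (the exact address depends on your module label, so take it from the state list output):

$ terraform state list | grep kubernetes_config_map
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]
$ terraform state rm 'module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]'

Quoting the address keeps the shell from interpreting the [0] index.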

@osterman
Member

osterman commented Jul 3, 2020

FWIW, our tests are working and tearing down the cluster. It's likely this is related to upgrading between module versions. @vkhatri's fix looks like it should work by eliminating the offending resource.

There's a related issue that is currently outside our control: #67

@osterman
Member

osterman commented Jul 3, 2020

@Firaenix
Author

Thanks, I'll check it out. Much appreciated.

@brianmalachiarts

I ran into this as well, but for a slightly different reason. We set the EKS cluster endpoint to be private only and used allowed_cidr_blocks to allow reaching the K8s API from our private networks. During a terraform destroy, the security group rule allowing this traffic is deleted before the ConfigMap resource, so by that point Terraform can no longer connect to the K8s API and errors out.
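
A sketch of the kind of configuration described above (allowed_cidr_blocks is the module input named here; the endpoint_* input names and the values are illustrative and may differ by module version):

module "eks_cluster" {
  source = "cloudposse/eks-cluster/aws"

  # Private-only API endpoint, reachable only from internal networks.
  endpoint_public_access  = false
  endpoint_private_access = true

  # Security group rule allowing the internal CIDRs to reach the K8s API.
  # During destroy this rule goes away before the aws-auth configmap,
  # cutting Terraform off from the API mid-run.
  allowed_cidr_blocks = ["10.0.0.0/8"]

  # ...other inputs unchanged...
}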

During a terraform destroy (while on our VPN, with access to the 10.x networks):

module.eks_cluster.aws_iam_openid_connect_provider.default[0]: Destroying... [id=arn:aws:iam::1234567890:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/4D53A93F13B38E0D7F67CABE233B7BFE]
module.eks_cluster.aws_security_group_rule.ingress_cidr_blocks[0]: Destroying... [id=sgrule-3728649431]
module.eks_cluster.aws_security_group_rule.egress[0]: Destroying... [id=sgrule-4014147656]
module.eks_node_group.aws_eks_node_group.default[0]: Destroying... [id=dev-cluster:dev-ng0-workers]
module.eks_cluster.aws_iam_openid_connect_provider.default[0]: Destruction complete after 1s
module.eks_cluster.aws_security_group_rule.ingress_cidr_blocks[0]: Destruction complete after 1s
module.eks_cluster.aws_security_group_rule.egress[0]: Destruction complete after 1s

...more lines about deleting the node group...

module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Destroying... [id=kube-system/aws-auth]
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Still destroying... [id=kube-system/aws-auth, 10s elapsed]
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Still destroying... [id=kube-system/aws-auth, 20s elapsed]

Error: Delete "https://4D53A93F13B38E0D7F67CABE233B7BFE.sk1.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 10.10.1.107:443: i/o timeout

Deleting the specific resource from state as mentioned above by @vkhatri worked for us and let us continue with the destroy.

@vsheffer

vsheffer commented Jan 26, 2021

I've had several different issues with destroying the cluster; this is the latest. I'm trying to move a test cluster from one region to another, and the destroy is failing. It's not the highest priority, for me at least, but it is one I'd like to help resolve.

I know some of my cluster was destroyed, but now I can't even do a plan, so I'll have to check on the current state to try to discern the resources that remain.

Having used GKE, I'm not a fan of AWS EKS permissions requiring interplay between IAM and K8s RBAC, and based on the error, that looks like the issue here.

If it helps, I can still "manage" the cluster using kubectl, so some remnants remain. For the record, this is my error:
Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

Also for the record, the aws-auth configmap still exists but is "empty". Here are the contents:

Name:         aws-auth
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
mapAccounts:
----
[]

mapRoles:
----
null

mapUsers:
----
[]

Events:  <none>

There definitely seems to be a dependency issue between Terraform Kubernetes resources and EKS that shows up not just here but in other Terraform modules managing EKS.

@vsheffer

Apologies for the confusion, but I clearly don't understand the interplay between IAM and the aws-auth configmap. I just created a new cluster using 0.30.2 of cloudposse/eks-cluster/aws and then, without doing anything else, destroyed it. After a lot of Terraform log output describing what was destroyed, the end result is:

Error: Unauthorized

I do now think the underlying problem is with the hashicorp/kubernetes provider. Nonetheless, I can't reliably destroy a cluster using TF even without adding any other resources (e.g. node groups).

What am I doing wrong?

@vsheffer

I'm happy to provide all of the TF files I have. There is nothing proprietary in them if that helps troubleshoot this. In my TFE workspace I'm only setting a few variables like region, AZs, and then the standard CloudPosse context variables.

@nnsense-bot

All of the above seems related to issues with Terraform 0.14 and newer. I've experienced the same problems and worked around them by using Terraform 0.13. We've spoken with the team on Slack about this, and they told me the module is currently being updated.
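
A sketch of one way to hold a workspace on 0.13 until the module is updated (this is the standard required_version constraint; in Terraform Cloud the workspace's Terraform version setting has to match as well):

terraform {
  # Keep this configuration on Terraform 0.13.x until the module supports 0.14+.
  required_version = "~> 0.13.0"
}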
