
Can't tear down cluster after provisioning. #58

Closed
Firaenix opened this issue Apr 21, 2020 · 17 comments · Fixed by #119
Labels
bug 🐛 An issue with the system

Comments

@Firaenix

Describe the Bug

After provisioning an EKS Cluster with this module, you cannot then tear it down by commenting out the module.

Expected Behavior

Commenting out the module and then planning and applying should cleanly destroy the EKS cluster and its associated resources.

Steps to Reproduce

  1. Create module
  2. Plan and Apply
  3. Comment out the module
  4. Plan
  5. Error: Provider configuration not present
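
For reference, the kind of module block created in step 1 and commented out in step 3 looks roughly like this (a minimal sketch; the inputs and names are illustrative, not an exact configuration):

module "eks_cluster" {
  source = "cloudposse/eks-cluster/aws"

  # Illustrative inputs only.
  region     = var.region
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.subnets.private_subnet_ids

  name = "example"
}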

Screenshots

(screenshot showing the "Error: Provider configuration not present" output)

Environment (please complete the following information):

Terraform Cloud

@Firaenix added the bug 🐛 An issue with the system label on Apr 21, 2020
@tthayer

tthayer commented Apr 24, 2020

Is there a reason why you aren't using terraform destroy for this?

@Firaenix
Author

Because I have other things in my terraform workspace that I do not want to destroy.

Also, when I try to run terraform destroy against just this module, I can't: Terraform Cloud is connected to VCS and says it cannot destroy because the changes must be committed to VCS before they can be applied.

@tthayer

tthayer commented Apr 27, 2020

Instead of commenting out the module, have you tried setting the enabled value to false?
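
That is, keep the module block in place and flip the flag instead of commenting it out (a sketch; driving it from a variable is optional):

module "eks_cluster" {
  source  = "cloudposse/eks-cluster/aws"
  enabled = false              # or enabled = var.eks_enabled
  # ...rest of the configuration left unchanged...
}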

@Firaenix
Author

Experimenting with it now...

@Firaenix
Author

OK, so I got a cluster up and running with the enabled flag set, but when I set it back to false I get this:

Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp 127.0.0.1:80: connect: connection refused

I assume this is because the provider is being deleted as well and has no reference to active resources anymore.

@Firaenix
Author

It looks like the problem lies on this line.

data "aws_eks_cluster_auth" "eks" {

Because enabled is set to false, it's no longer referencing the created cluster.
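
A simplified sketch of the pattern (not the module's exact code) showing why the provider ends up pointing at localhost once enabled = false:

data "aws_eks_cluster" "eks" {
  count = var.enabled ? 1 : 0
  name  = join("", aws_eks_cluster.default.*.id)
}

data "aws_eks_cluster_auth" "eks" {
  count = var.enabled ? 1 : 0
  name  = join("", aws_eks_cluster.default.*.id)
}

provider "kubernetes" {
  # With enabled = false the data sources have count = 0, these joins return
  # empty strings, and the provider falls back to its default host, which is
  # why deleting the aws-auth configmap fails with
  # "dial tcp 127.0.0.1:80: connect: connection refused".
  host  = join("", data.aws_eks_cluster.eks.*.endpoint)
  token = join("", data.aws_eks_cluster_auth.eks.*.token)
}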

@Firaenix
Author

Firaenix commented May 1, 2020

Currently, I can't tear down this cluster without doing it manually.
It's sitting in my AWS environment eating away at my money.

@Firaenix
Author

Any progress on this? It's really a pain having to destroy the cluster manually, and it leaves my Terraform in a broken state.

@vkhatri
Contributor

vkhatri commented May 26, 2020

Deleting the resource kubernetes_config_map from the tfstate should fix the error for you.

$ [terraform|terragrunt] state list | grep kubernetes_config_map
$ [terraform|terragrunt] state rm [resource name output from previous command]
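
For example, with this module the config map address usually looks like the one below (the exact address depends on your module label, so take it from the state list output):

$ terraform state list | grep kubernetes_config_map
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]
$ terraform state rm 'module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]'

Quoting the address keeps the shell from interpreting the [0] index.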

@osterman
Member

osterman commented Jul 3, 2020

FWIW, our tests are working and tearing down the cluster. It's likely this is related to upgrading between module versions. @vkhatri's fix looks like it should work by eliminating the offending resource.

There's a related issue that is currently outside our control: #67

@osterman
Member

osterman commented Jul 3, 2020

@Firaenix
Author

Thanks, I'll check it out. Much appreciated.

@brianmalachiarts

I ran into this as well, but for a slightly different reason. We set the EKS cluster endpoint to be private only and used allowed_cidr_blocks to allow reaching the K8s API from our private networks. During a terraform destroy, the security group rule allowing this traffic is deleted before the ConfigMap resource, so by that point Terraform can no longer connect to the K8s API and errors out.
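
A sketch of the kind of configuration described above (allowed_cidr_blocks is the module input named here; the endpoint_* input names and the values are illustrative and may differ by module version):

module "eks_cluster" {
  source = "cloudposse/eks-cluster/aws"

  # Private-only API endpoint, reachable only from internal networks.
  endpoint_public_access  = false
  endpoint_private_access = true

  # Security group rule allowing the internal CIDRs to reach the K8s API.
  # During destroy this rule goes away before the aws-auth configmap,
  # cutting Terraform off from the API mid-run.
  allowed_cidr_blocks = ["10.0.0.0/8"]

  # ...other inputs unchanged...
}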

During a terraform destroy (while on our VPN, with access to the 10.x networks):

module.eks_cluster.aws_iam_openid_connect_provider.default[0]: Destroying... [id=arn:aws:iam::1234567890:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/4D53A93F13B38E0D7F67CABE233B7BFE]
module.eks_cluster.aws_security_group_rule.ingress_cidr_blocks[0]: Destroying... [id=sgrule-3728649431]
module.eks_cluster.aws_security_group_rule.egress[0]: Destroying... [id=sgrule-4014147656]
module.eks_node_group.aws_eks_node_group.default[0]: Destroying... [id=dev-cluster:dev-ng0-workers]
module.eks_cluster.aws_iam_openid_connect_provider.default[0]: Destruction complete after 1s
module.eks_cluster.aws_security_group_rule.ingress_cidr_blocks[0]: Destruction complete after 1s
module.eks_cluster.aws_security_group_rule.egress[0]: Destruction complete after 1s

...more lines about deleting the node group...

module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Destroying... [id=kube-system/aws-auth]
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Still destroying... [id=kube-system/aws-auth, 10s elapsed]
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Still destroying... [id=kube-system/aws-auth, 20s elapsed]

Error: Delete "https://4D53A93F13B38E0D7F67CABE233B7BFE.sk1.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 10.10.1.107:443: i/o timeout

Deleting the specific resource from state as mentioned above by @vkhatri worked for us and let us continue with the destroy.

@vsheffer

vsheffer commented Jan 26, 2021

I've had several different issues with destroying the cluster; this is the latest. I'm trying to move a test cluster from one region to another, and the destroy is failing. It's not the highest priority, for me at least, but it is one I'd like to help resolve.

I know some of my cluster was destroyed, but now I can't even do a plan, so I'll have to check on the current state to try to discern the resources that remain.

Having used GKE, I'm not a fan of AWS EKS permissions requiring interplay between IAM and K8s RBAC, and based on the error, that looks like the issue here.

If it helps, I can still "manage" the cluster using kubectl, so some remnants remain. For the record, this is my error:
Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

Also for the record, the aws-auth configmap still exists but is "empty". Here are the contents:

Name:         aws-auth
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
mapAccounts:
----
[]

mapRoles:
----
null

mapUsers:
----
[]

Events:  <none>

There definitely seems to be a dependency issue between Terraform Kubernetes resources and EKS that shows up not just here but in other Terraform modules managing EKS.

@vsheffer

Apologies for the confusion, but I clearly don't understand the interplay between IAM and the aws-auth configmap. I just created a new cluster using 0.30.2 of cloudposse/eks-cluster/aws and then, without doing anything else, destroyed it. After a lot of Terraform log output describing what was destroyed, the end result is:

Error: Unauthorized

I do now think the underlying problem is with the hashicorp/kubernetes provider. Nonetheless, I can't reliably destroy a cluster using TF even without adding any other resources (e.g. node groups).

What am I doing wrong?

@vsheffer

I'm happy to provide all of the TF files I have. There is nothing proprietary in them if that helps troubleshoot this. In my TFE workspace I'm only setting a few variables like region, AZs, and then the standard CloudPosse context variables.

@nnsense-bot

All of the above seems related to issues with Terraform 0.14 and newer. I've experienced the same problems and worked around them by using Terraform 0.13. We've spoken with the team on Slack about this, and they told me the module is currently being updated.
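
A sketch of one way to hold a workspace on 0.13 until the module is updated (this is the standard required_version constraint; in Terraform Cloud the workspace's Terraform version setting has to match as well):

terraform {
  # Keep this configuration on Terraform 0.13.x until the module supports 0.14+.
  required_version = "~> 0.13.0"
}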
