
Access entries are deleted before Helm charts, so Helm charts stay orphaned in the Terraform state #2923

Closed
skursadk opened this issue Feb 12, 2024 · 19 comments · Fixed by #3000

Comments

@skursadk

skursadk commented Feb 12, 2024

Description

We deploy a couple of Helm charts in the same Terraform state where the EKS module is called. It works like a charm with access entries when creating the resources, but terraform destroy cannot delete the Helm resources properly. The reason is that terraform destroy deletes the access entries before the Helm charts.

It was working great in version 19.21.0 with the aws-auth ConfigMap.

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.


Versions

  • Module version [Required]: 20.2.1

  • Terraform version: 1.6.6

  • Provider version(s): 1.6.6

Reproduction Code [Required]

Steps to reproduce the behavior (a minimal configuration sketch follows the list):

  • Create the EKS cluster without specifying any access entries.
  • Deploy a dummy Helm chart.
  • Run terraform destroy.
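
A minimal sketch of the setup (illustrative only; the cluster name, networking inputs, and chart are placeholders):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.2.1"

  cluster_name    = "repro"
  cluster_version = "1.29"

  # No extra access entries; only the module-managed cluster creator admin entry
  enable_cluster_creator_admin_permissions = true

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

# The dummy chart, deployed into the same state as the cluster
resource "helm_release" "dummy" {
  name       = "dummy"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx"
}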

Expected behavior

Terraform destroy should run successfully

Actual behavior

It deletes the access entries first and then cannot delete the Helm charts, so the Helm charts stay orphaned in the Terraform state.

@bryantbiggs
Member

It was working great in version 19.21.0 with aws-auth configmap

That's because the cluster creator was automatically granted admin permissions that were outside of Terraform's control. I'm open to ideas, but I currently don't see a way to achieve the same.

Also, the guidance will always be: destroy the resources on the cluster first, before destroying the cluster. There are a number of reasons for this.

@skursadk
Author

Should I add an explicit depends_on = [module.eks] to the resources? Is that the only way to let Terraform remove the Helm charts before the EKS cluster?
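
Something like this, for example (chart details are placeholders):

resource "helm_release" "dummy" {
  name       = "dummy"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx"

  # Create after, and destroy before, everything in the EKS module,
  # including the access entries
  depends_on = [module.eks]
}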

@bryantbiggs
Member

You can try, but if it's a module then it will face challenges.

You are better off separating your infrastructure from your applications. This would be two different state files, and you would need to explicitly handle the removal of the applications running on the cluster first, before destroying the cluster.
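
For example, the application state could look up the cluster created by the infrastructure state (a minimal sketch; the cluster name is a placeholder):

data "aws_eks_cluster" "this" {
  name = "my-cluster"
}

data "aws_eks_cluster_auth" "this" {
  name = data.aws_eks_cluster.this.name
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

With that split, terraform destroy in the application state removes the Helm releases while the cluster (and its access entries) still exist; the infrastructure state is destroyed afterwards.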

@nascit

nascit commented Mar 5, 2024

I have the same challenge.

I use the terraform-aws-eks-blueprints-addons module to install addons like Karpenter, all managed within the same state file.

When I perform a terraform destroy, it may fail depending on whether the access entry is deleted before or after the Helm deployments performed by the terraform-aws-eks-blueprints-addons module.

@bryantbiggs
Member

@nascit you'll need to either separate the cluster infra from the app deployment (recommended route), or follow the steps we provide to carefully tear down resources in order (https://aws-ia.github.io/terraform-aws-eks-blueprints/patterns/karpenter/#destroy). With Karpenter that's even more relevant, because it creates additional AWS resources outside of Terraform's purview.
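
In practice that boils down to targeted destroys in order, roughly like the following (the module addresses are placeholders for your own configuration):

# 1. Remove the in-cluster resources (Helm releases, addons) first
terraform destroy -target=module.eks_blueprints_addons --auto-approve

# 2. Then the cluster itself
terraform destroy -target=module.eks --auto-approve

# 3. Finally everything else (VPC, etc.)
terraform destroy --auto-approve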

@bryantbiggs
Member

Since the permissions are now controlled in code, you will need to handle the destroy situation a little differently as stated above. You can use the following when destroying the cluster to ensure permissions aren't removed too early:

# Necessary to avoid removing Terraform's permissions too soon, before it's finished
# cleaning up the resources it deployed inside the cluster
terraform state rm 'module.eks.aws_eks_access_entry.this["cluster_creator"]' || true
terraform state rm 'module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' || true

@gohmc

gohmc commented Mar 13, 2024

hi @bryantbiggs ,

This issue can be avoided with authentication_mode set to CONFIG_MAP, but because bootstrap_cluster_creator_admin_permissions is hardcoded to false, the provider raises an error during apply: bootstrapClusterCreatorAdminPermission must be true when authentication_mode is set to CONFIG_MAP.

Can this module allow us to have the flexibility to use CONFIG_MAP? Time is needed to adopt API_AND_CONFIG_MAP...

@bryantbiggs
Member

Can this module allow us to have the flexibility to use CONFIG_MAP? Time is needed to adopt API_AND_CONFIG_MAP...

What in this module is stopping you from doing this?

@gohmc

gohmc commented Mar 13, 2024

Using the built-in example here:

# Comment out this line to not use EKS access management controls yet (defaults to false)
# enable_cluster_creator_admin_permissions = true

# Add this line for the cluster to use the CONFIG_MAP authentication mode only
authentication_mode = "CONFIG_MAP"

Now perform terraform apply and it will fail with "bootstrapClusterCreatorAdminPermission must be true when authentication_mode is set to CONFIG_MAP".

@lorengordon
Contributor

lorengordon commented Apr 8, 2024

@bryantbiggs One thing that occurs to me that might help with this would be to use depends_on in the output for the cluster_endpoint, forcing it to wait until the access entries and associations are complete. Since the helm provider requires the cluster endpoint as an input, that should get the order of operations correct on both create and destroy.

output "cluster_endpoint" {
  description = "Endpoint for your Kubernetes API server"
  value       = try(aws_eks_cluster.this[0].endpoint, null)

  depends_on = [
    aws_eks_access_entry.this,
    aws_eks_access_policy_association.this,
  ]
}

Edit: Perhaps do the same for the cluster_certificate_authority_data:

output "cluster_certificate_authority_data" {
  description = "Base64 encoded certificate data required to communicate with the cluster"
  value       = try(aws_eks_cluster.this[0].certificate_authority[0].data, null)

  depends_on = [
    aws_eks_access_entry.this,
    aws_eks_access_policy_association.this,
  ]
}

@bryantbiggs
Member

I don't think those are valid - have you tested those?

@lorengordon
Contributor

@bryantbiggs Yes absolutely, using depends_on in an output is 100% valid, and is used for exactly this kind of timing issue where an output for a single resource isn't fully available until some other resource completes. S3 bucket and bucket policy is another common one. Or IAM role and role attachments.
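
The same pattern sketched with an S3 bucket (placeholder names): anything that consumes the output waits until the policy is attached on create, and is destroyed before the policy is removed on destroy.

resource "aws_s3_bucket" "this" {
  bucket = "example-bucket"
}

resource "aws_s3_bucket_policy" "this" {
  bucket = aws_s3_bucket.this.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource  = [aws_s3_bucket.this.arn, "${aws_s3_bucket.this.arn}/*"]
      Condition = { Bool = { "aws:SecureTransport" = "false" } }
    }]
  })
}

output "bucket_arn" {
  value = aws_s3_bucket.this.arn

  # The bucket isn't fully usable until its policy is attached
  depends_on = [aws_s3_bucket_policy.this]
}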


@lorengordon
Contributor

FYI, I am testing a PR to see if I can cleanly apply and destroy the "complete" test on the blueprints-addons project without manually using state rm and -target... So far, so good! 🤞

@bryantbiggs
Member

Just validated this on the Karpenter example, which I think is more representative, and it was able to apply then destroy without targets or intervention! Great find. Please feel free to open a PR to add these; I don't want to steal your thunder 😉

@lorengordon
Contributor

@bryantbiggs Done, see #3000! (And I got the 3000th PR, woot!)

@bryantbiggs
Member

Dang, that's a lot of PRs 😅

@antonbabenko
Member

This issue has been resolved in version 20.8.5 🎉


github-actions bot commented May 9, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 9, 2024