EKS Managed Node Group Module - policies being detached #3202

Open
1 task done
jan-g opened this issue Nov 8, 2024 · 8 comments

@jan-g

jan-g commented Nov 8, 2024

Description

I'm using terraform-aws-modules/eks/aws//modules/eks-managed-node-group directly.

During planning, a delete & recreation of the policy attachments for the node group is being planned.

However, after execution, no policies are attached to the node pool's role.

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: 2.28.0, but older versions like 2.20.0 are affected.

  • Terraform version: ~> 1.7.0

  • Provider version(s): aws at 5.74.0

Reproduction Code [Required]

The desired behaviour is to have node updates affect the util-node group first, then the second node group.

Here:

module "k8s-cluster" {
  source                   = "terraform-aws-modules/eks/aws"
  cluster_name             = "some-cluster"
  cluster_version          = var.k8s_cluster_version
  # Subnets where AWS creates EIPs for accessing the EKS control plane.
  control_plane_subnet_ids = var.private_subnet_ids
  vpc_id                   = var.vpc_id
  create_cloudwatch_log_group = false # TODO: enable this once we sort out IAM permissions
  cluster_enabled_log_types = ["api", "audit", "authenticator"]
  enable_irsa              = true

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  create_kms_key = false
  cluster_encryption_config = {}

  authentication_mode                      = "API_AND_CONFIG_MAP"
  kms_key_enable_default_policy            = false
  enable_cluster_creator_admin_permissions = true

  cluster_timeouts = {
    create = "60m"
    // Note that the update timeout is used separately for both version and
    // vpc_config update timeouts.
    update = "60m"
    delete = "30m"
  }

  eks_managed_node_groups = {
    util = {
      name = "some-cluster-util"

      subnet_ids     = local.private_subnet_ids
      min_size       = 2
      max_size       = 2
      desired_size   = 2
      instance_types = [var.util_instance_type]
      ami_type       = "AL2023_x86_64_STANDARD"
      disk_size      = 50
      force_update_version     = false
      iam_role_use_name_prefix = false
      labels = {
        util-node = "true"
      }
      taints = {}
    }
  }

  cluster_addons = {
    coredns = {
      # Or specify addon_version = "..."
      most_recent = true
    }
    // Use in preference to IRSA
    eks-pod-identity-agent = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
  }
}

and then this, managed afterward:

module "second_node_group" {
  source  = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"

  name            = "some-cluster-second"
  cluster_name    = "some-cluster"
  cluster_version = var.k8s_cluster_version
  cluster_service_cidr = module.k8s-cluster.cluster_service_cidr

  // Because this is being used externally from the cluster creation
  cluster_primary_security_group_id = module.k8s-cluster.cluster_primary_security_group_id
  vpc_security_group_ids = [
    module.k8s-cluster.node_security_group_id,
  ]

  // Identifiers of EC2 Subnets to associate with the EKS Node Group. These subnets must have the following resource tag: kubernetes.io/cluster/CLUSTER_NAME
  subnet_ids = local.public_subnet_ids
  min_size   = 2
  max_size   = 2
  desired_size = 2
  instance_types = [var.second_instance_type]
  ami_type = "AL2023_x86_64_STANDARD"
  disk_size = 50
  force_update_version = false
  iam_role_use_name_prefix = false
  labels = {
    second-node = "true"
  }
  taints = {
    proxy-node = {
      key    = "second-node"
      value  = "true"
      effect = "NO_SCHEDULE"
    }
  }

  depends_on = [module.k8s-cluster]
}

The issue is that there's a plan to replace (delete & recreate) these whenever the cluster sees an upgrade (e.g., a VPC-CNI update):

module.second_node_group.aws_iam_role_policy_attachment.this["AmazonEC2ContainerRegistryReadOnly"]
module.second_node_group.aws_iam_role_policy_attachment.this["AmazonEKS_CNI_Policy"]
module.second_node_group.aws_iam_role_policy_attachment.this["AmazonEKSWorkerNodePolicy"]

and that in itself would not be a problem - TF apply (I'm using TFC) tells me these were all created & replaced.

However, that's a lie; they aren't replaced, they're only removed. Running a second plan & apply will recreate the missing policy attachments.

I'm using TF cloud.

Yes, although this seems unrelated.

Many plans will not cause this if they plan to zero changes. However, an update to the main cluster (e.g., a plugin upgrade) will lead to the second node group being reapplied. It's this that causes the detach/attach behaviour. That in and of itself wouldn't be an issue; however, the reattachment doesn't actually happen. I don't know why - possibly some kind of ordering issue?

Expected behavior

The second node pool would continue to operate correctly.

Actual behavior

The second node pool loses the ability to pull images, etc., because it no longer has the pertinent policies attached.

Terminal Output Screenshot(s)

Additional context

If there's some other way to get the module to roll node groups in order, I'd be happy to use that.

@jan-g
Author

jan-g commented Nov 8, 2024

From the logs, it looks like the creation of the new policy attachment happens, then the deletion of the old one. The internal TF IDs don't appear to clash (they are derived from timestamps).

So something is triggering create_before_destroy for these, I think, and the result is that they are replaced and then the replacements are deleted.

@jan-g
Author

jan-g commented Nov 8, 2024

I've just tried this manually:

aws --profile my-dev-account iam attach-role-policy --role-name some-cluster-second-eks-node-group --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

can be run repeatedly (it's idempotent). The deletion just drops the policy attachment.

So I think the issue here is that create_before_destroy is being triggered by something, and the TF provider itself doesn't know any better. I don't know if supplying an explicit lifecycle block that opts out of this for the policy attachments would sort out the issue (but I can definitely attempt this).
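For reference, if these attachments were managed directly rather than inside the module, the explicit opt-out I have in mind would look something like the sketch below. The resource label and role name are illustrative only (the role name is taken from the reproduction above); the module manages these attachments internally, so this isn't directly applicable without wrapping or forking it:

```hcl
# Minimal sketch, not module code: a directly-managed policy attachment with
# the default destroy-before-create ordering stated explicitly.
resource "aws_iam_role_policy_attachment" "cni" {
  role       = "some-cluster-second-eks-node-group" # role name from the report above
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"

  lifecycle {
    # false is Terraform's default; written out here only to make the
    # intended replacement ordering explicit.
    create_before_destroy = false
  }
}
```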

@bryantbiggs
Member

This is incorrect: depends_on = [module.k8s-cluster]. Remove that and your issue should be resolved.

this is not a module issue

@jan-g
Author

jan-g commented Nov 11, 2024

Without this, the two node pools (the one defined directly on the cluster, and the other) become eligible to roll at the same time.

Is there a way to achieve a one-after-the-other behaviour here?

@bryantbiggs
Member

This feels like an X-Y problem - why are you creating nodes this way?

@jan-g
Author

jan-g commented Nov 11, 2024

The two pools host distinct parts of an application. I'd like to ensure that one is rehomed before proceeding to the second.

@jan-g
Author

jan-g commented Nov 11, 2024

Incidentally, I'd expect that setting depends_on shouldn't break the module behaviour. And indeed, TF looks to plan to replace the policy attachments - that might be a spurious action, but it shouldn't cause the policy to end up detached. What I can't work out here is why those policy attachments are handled with create_before_destroy, which seems to be the cause of the misbehaviour I'm witnessing.

@bryantbiggs
Member

The two pools host distinct parts of an application. I'd like to ensure that one is rehomed before proceeding to the second.

This sounds like you have not architected for resiliency - Kubernetes applications should be resilient to node replacements. In addition, it sounds like your app is closely coupled to a node pool, which is also not ideal - it treats nodes like pets rather than cattle. Correcting these will alleviate the need to venture down this path entirely.

Incidentally, I'd expect that setting depends_on shouldn't break the module behaviour. And indeed, TF looks to plan to replace the policy attachments - that might be a spurious action, but it shouldn't cause the policy to end up detached. What I can't work out here is why those policy attachments are handled with create_before_destroy, which seems to be the cause of the misbehaviour I'm witnessing.

Please familiarize yourself with Terraform and its behaviors: hashicorp/terraform#30340 (comment)
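The linked comment explains the mechanism: module-level depends_on defers every data source inside the module to apply time, so values derived from them (for example, policy ARNs built from the aws_partition data source) become known only after apply, and the attachments get planned for replacement. Assuming ordering behind the cluster is still wanted, a sketch of the usual workaround is to rely on implicit dependencies through output references instead - the reproduction above already passes module.k8s-cluster outputs (cluster_service_cidr, the security group IDs), so simply dropping the depends_on keeps the ordering:

```hcl
module "second_node_group" {
  source = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"

  # Referencing module outputs orders this module after the cluster
  # implicitly, without deferring the submodule's data sources the way a
  # module-level depends_on does.
  cluster_name         = module.k8s-cluster.cluster_name
  cluster_version      = var.k8s_cluster_version
  cluster_service_cidr = module.k8s-cluster.cluster_service_cidr

  # ... remaining arguments as in the reproduction above ...
  # (no depends_on block)
}
```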
