Unable to add additional iam roles to cluster #155

Closed · jchanam opened this issue Jun 22, 2022 · 8 comments · Fixed by #206

Labels
bug 🐛 An issue with the system

Comments

@jchanam commented Jun 22, 2022


Describe the Bug

When using the module to create an EKS cluster, I'm trying to add additional roles to the aws_auth configmap. The problem only happens with roles; adding additional users works perfectly.

The behavior changes depending on the kubernetes_config_map_ignore_role_changes config.

  • If I leave it as default (false), the worker roles are added, but not my additional roles.
  • If I change it to true, the worker roles are removed and my additional roles are added.

Expected Behavior

When adding map_additional_iam_roles, those roles should appear in the aws_auth configmap together with the worker roles when kubernetes_config_map_ignore_role_changes is set to false.

Steps to Reproduce

Steps to reproduce the behavior:

  1. Create a cluster and workers with this config (some values are left as variables, so it's easy to adapt):
module "eks_cluster" {
  region  = var.region
  source  = "cloudposse/eks-cluster/aws"
  version = "2.2.0"

  name = var.name

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  endpoint_public_access  = false
  endpoint_private_access = true

  kubernetes_version = var.kubernetes_version

  kube_exec_auth_enabled = true

  kubernetes_config_map_ignore_role_changes = false
  map_additional_iam_roles = [
    {
      rolearn  = "arn:aws:iam::<account-id>:role/myRole"
      username = "added-role"
      groups   = ["system:masters"]
    }
  ]

  map_additional_iam_users = [
    {
      userarn  = "arn:aws:iam::<account-id>:user/myUser"
      username = "added-user"
      groups   = ["system:masters"]
    }
  ]
}

module "eks_node_group" {
  source  = "cloudposse/eks-node-group/aws"
  version = "2.4.0"

  ami_release_version   = [var.ami_release_version]
  instance_types        = [var.instance_type]
  subnet_ids            = var.private_subnet_ids
  min_size              = var.min_size
  max_size              = var.max_size
  desired_size          = var.desired_size
  cluster_name          = module.eks_cluster.eks_cluster_id
  name                  = var.node_group_name
  create_before_destroy = true
  kubernetes_version    = [var.kubernetes_version]

  cluster_autoscaler_enabled = var.autoscaling_policies_enabled

  module_depends_on = [module.eks_cluster.kubernetes_config_map_id]
}

Screenshots

This example shows what happens when I change the variable kubernetes_config_map_ignore_role_changes from false to true.

(screenshot: 2022-06-22 at 13:30)

Also, I don't see any difference in the map roles in the data block between the two options, except for the quoting.

(screenshot: 2022-06-22 at 13:38)

Environment (please complete the following information):


  • EKS version 1.22
  • Module version 2.2.0
@jchanam added the bug 🐛 An issue with the system label on Jun 22, 2022
@Nuru (Contributor) commented Jun 22, 2022

The kubernetes_config_map_ignore_role_changes setting is a bit of a hack. Terraform does not allow parameterization of the lifecycle block, so to give you the option of ignoring changes, we implement the resource twice, once with ignore_changes and once without.

If you set kubernetes_config_map_ignore_role_changes to true, then the expected behavior is that you will not be able to add or change Roles later, because you have told Terraform to ignore changes to the Roles. However, if you change the setting from false to true, the old ConfigMap is deleted and a new one is created, so that first time you will get your roles added.
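For reference, here is a minimal sketch of that dual-resource pattern (illustrative only, not the module's exact code; local.aws_auth_data is a hypothetical local holding the mapRoles/mapUsers content):

# Terraform cannot parameterize `lifecycle`, so the same ConfigMap is declared
# twice and `count` selects which copy gets created (illustrative sketch).

resource "kubernetes_config_map" "aws_auth_ignore_changes" {
  count = var.kubernetes_config_map_ignore_role_changes ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  data = local.aws_auth_data   # hypothetical local with mapRoles/mapUsers

  lifecycle {
    ignore_changes = [data]    # later edits to the data (e.g. mapRoles) are ignored
  }
}

resource "kubernetes_config_map" "aws_auth" {
  count = var.kubernetes_config_map_ignore_role_changes ? 0 : 1

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  data = local.aws_auth_data
}

Toggling the variable therefore destroys one of these resources and creates the other, which is why the ConfigMap is replaced when the setting flips.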

> I'm attaching an example when I change the variable (from false to true).

The screenshot you provided looks like what would happen if you change kubernetes_config_map_ignore_role_changes from true to false, and it looks like it includes the Roles you want. So I think you may just be understandably confused by Terraform's and this module's behavior around how Role changes get ignored. I suggest you test it a little more with kubernetes_config_map_ignore_role_changes left at false (which is the only setting that will let you update the Roles after first creating the cluster) and report back if you still have questions.

@jchanam (Author) commented Jun 23, 2022

Thanks for your answer.

My concern here is that, if I leave it set to false, the worker roles will disappear, even though auth.tf concatenates them with:

mapRoles = replace(yamlencode(distinct(concat(local.map_worker_roles, var.map_additional_iam_roles))), "\"", local.yaml_quote)

This is why I'm more confused. I would love to manage all the authentication with Terraform, but without breaking the regular configuration.

@sebastianmacarescu

I kind of agree this is a bug because of:

  • If you set ignore role changes, any extra roles you add later will be ignored, so you need to update both the Terraform vars and manually apply the new ConfigMap - bad because you lose the declarative nature of Terraform.
  • If you don't ignore role changes, the module will strictly apply what you specified in the variables and overwrite the existing aws-auth map - also bad because you lose the roles added by EKS when using managed node groups.

So in both cases you lose something. I got around this by first reading the ConfigMap (if it exists) and merging it with the current settings. That way Terraform manages only the roles set up in vars and does not touch any other manually added roles.

Example:

data "kubernetes_config_map" "existing_aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  depends_on = [null_resource.wait_for_cluster[0]]
}
locals {
  map_node_roles = try(yamldecode(data.kubernetes_config_map.existing_aws_auth.data["mapRoles"]), {})
}

resource "kubernetes_config_map" "aws_auth" {
......
  mapRoles    = replace(yamlencode(distinct(concat(local.map_worker_roles, local.map_node_roles, var.map_additional_iam_roles))), "\"", local.yaml_quote)
....

Also, I don't recommend using the new kubernetes_config_map_v1_data resource. It fails because the EKS managed node group changes the field ownership of mapRoles in the ConfigMap when deployed separately. After testing, vpcLambda ends up as the sole owner of that field, so applying Terraform again resets it.

@jchanam (Author) commented Jun 23, 2022

@sebastianmacarescu I think that's an amazing option!

The only issue I can see is with deleting roles. Since they'd already be in the configmap, if you remove them from map_additional_iam_roles they won't be deleted, because they'd still be inside map_node_roles.

I'd definitely rather have to remove them manually while being able to manage them with Terraform than have to add them by hand.

@Nuru (Contributor) commented Jun 23, 2022

@jchanam

If you look at the Cloud Posse EKS Terraform component (think of it, at this point in time, as a work in progress), you can see that the way we handle this is to output the managed role ARNs and then read them back so they can be included in future auth maps. The component uses the Cloud Posse YAML stack config module to read the state, but you can use the terraform_remote_state data source directly.
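
A minimal sketch of that approach, assuming the cluster stack exposes the managed role ARNs as an output (the output name and backend settings below are illustrative assumptions, not the component's actual interface):

data "terraform_remote_state" "eks" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"            # illustrative backend settings
    key    = "eks/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # assumes the cluster stack defines something like `output "eks_managed_role_arns"`
  node_role_mappings = [
    for arn in data.terraform_remote_state.eks.outputs.eks_managed_role_arns : {
      rolearn  = arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    }
  ]
}

These mappings can then be concatenated with map_additional_iam_roles so the managed node roles survive a full rewrite of the aws-auth ConfigMap.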

@sebastianmacarescu

@Nuru @jchanam I have done a working implementation that fully fixes this issue and I've tested it in production.

@Nuru can you take a look at that PR and let me know if the implementation looks good? If so, I will also update the README file with proper documentation for the feature.

@sebastianmacarescu

Can we get a review of the work done? It would be really great to have this merged.

@honarkhah

> (quoting @sebastianmacarescu's earlier comment above)

What is the issue with moving to kubernetes_config_map_v1_data?
