Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add conditional policy statement attachments for EKS IAM role module #184

Conversation

bryantbiggs
Copy link
Member

@bryantbiggs bryantbiggs commented Feb 2, 2022

Description

  • Adds new sub-module for IAM role for service accounts with conditional IAM policies for users to opt into various permissions for common EKS tasks. Initially, the following permissions have been provided and more can easily be added/integrated:
  • Miscellaneous updates to align standards
    • outputs changed to use try()
    • versions updated to newer, expanded syntax style

Motivation and Context

Breaking Changes

  • No

How Has This Been Tested?

  • I have tested and validated these changes using one or more of the provided examples/* projects

@philicious
Copy link

philicious commented Feb 3, 2022

@bryantbiggs thx for drafting this !

it definitely goes into the right direction.

from a terraform-aws-modules/terraform-aws-eks users perspective, it would probably be nice having this right in that module but I understand that you rather wanna put it into this iam-eks module.
mentioning in docs or showing with examples how to easily integrate the two modules will help the user likewise

--

as for the question of which policies make sense to be included, I for myself would name:

  • cluster-autoscaler (very common use case)

  • aws-loadbalancer-controller (mentioned in the EKS docs and afaiu now recommended to use over the in-tree version)

  • AWS storage drivers as mentioned in AWS EKS docs: aws-ebs-csi-driver, aws-efs, aws-fsx. https://docs.aws.amazon.com/eks/latest/userguide/storage.html
    Some more, others rather uncommon but they are all in official AWS doc

  • external-dns is certainly common but its less close to "basic" EKS-setup then the others. If you like the idea of this module having also policies that fit in that category, its certainly worth adding

@bryantbiggs bryantbiggs marked this pull request as ready for review February 11, 2022 22:15
@bryantbiggs
Copy link
Member Author

@max-rocket-internet curious to hear your thoughts on this if you have the time to spare 🙏🏽

@@ -0,0 +1,454 @@
data "aws_partition" "current" {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over all I like the idea of including the policies as it makes the module much more useful.

But it's quite a deviation from historical practice in the hashicorp module where they are generally kept VERY sparse and simple.

Copy link
Contributor

@max-rocket-internet max-rocket-internet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love all the changes but not this this one 🙂

Comment on lines 5 to 34
data "aws_eks_cluster" "main" {
for_each = var.cluster_service_accounts

name = each.key
}

data "aws_iam_policy_document" "assume_role_with_oidc" {
dynamic "statement" {
for_each = var.cluster_service_accounts

content {
effect = "Allow"

actions = ["sts:AssumeRoleWithWebIdentity"]

principals {
type = "Federated"

identifiers = [
"arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${replace(data.aws_eks_cluster.main[statement.key].identity[0].oidc[0].issuer, "https://", "")}"
]
}

condition {
test = "StringEquals"
variable = "${replace(data.aws_eks_cluster.main[statement.key].identity[0].oidc[0].issuer, "https://", "")}:sub"
values = [for s in statement.value : "system:serviceaccount:${s}"]
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't remove all this! This is the whole reason why I wrote this module! To get a nice clean simple implementation, for example like this:

module "iam_eks_role" {
  source      = "terraform-aws-modules/iam/aws//modules/iam-eks-role"
  role_name   = "my-app"

  cluster_service_accounts = {
    "cluster1" = ["default:my-app"]
  }
}

In this case the user only needs to know the name of the cluster. That's it. By removing data.aws_eks_cluster.main you are then putting it on the user to get/find their OIDC provider ARN and URL. At which point the user needs to either have the EKS cluster in the same TF configuration (like you have in the examples now with provider=module.eks.oidc_provider) OR they have to use a data resource. If you rewrite it like this then it's basically the same as the iam-assumable-role-with-oidc module 😅

Furthermore, I already added var.provider_url_sa_pairs in case people really want to deal with the OIDC provider details or mix both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, this is the crux of the issue with this sub-module however.

  1. its not clear to users that the cluster name should be the key of the maps
  2. it restricts users to creating a role in one region while roles are global - they will will have to figure out when to use cluster_service_accounts and when to use provider_url_sa_pairs when its not quite clear what the difference is on the surface
  3. most importantly is the lifecycle issue. you cannot specify this role and a cluster and create at the same time. the cluster has to exist first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing some context here, is there a github issue talking about issues with the iam-eks-role module?

  1. How is it not clear? Do you have some example github issues where users can't work it out? There were examples where it was clear but it seems @antonbabenko has changed it with random_pet 🤔 But in the main readme there is an example where the key is cluster1, that's pretty clear, no?
  2. But even with your change, it's not any different in this regard. If the difference between cluster_service_accounts/provider_url_sa_pairs is not clear from the docs then let's fix the docs. Or just remove provider_url_sa_pairs as this can be done in the module iam-assumable-role-with-oidc anyway.
  3. Why does that matter? An IAM role could be created in a separate TF configuration by a separate team to an EKS cluster. You can't create an RDS instance with terraform-aws-rds without a EC2 subnet existing but a subnet ID is just passed as a string, for example 🙂

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho its not perfectly clear that the key of the map has to match the exact cluster name for functional reasons and isnt just meant to be an abbreviation to differentiate in the context of the map

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give some context, this change came about from

And I agree with the sentiments of those issues - these seem like common patterns many folks are repeating over and over and over (creating an IRSA role, specifying a policy, etc. - for common addons like VPC CNI, cluster autoscaler, etc.)

At first I thought something like this was being requested terraform-aws-modules/terraform-aws-eks#1827 but after clarifying, its really just the IRSA role w/ policy for common addons. Something where folks can just simply create an IRSA role and flip a flag to provide canned permissions for the use case of that role.

So with that I came here because of the iam-assumable-role-with-oidc module, but then saw and remembered you added something for EKS and thought it would make the most sense to add on to this role to allow users to opt in to various canned permissions for common addons. However, in testing that I came across the lifecycle issues and through testing cycles just reduced it down to the version you see now.

Thats the context bit.

Regarding a path forward, I do believe we have to fix the lifecycle issue. Unfortunately this is how a lot of people work, especially when testing/trying out modules, plus we are seeing a lot more use cases where users are creating and destroying resources quite frequently in a CI pipeline type setup. So the targetted apply of terraform apply -target module.vpc -target module.eks is not going to fly. It also won't fly when we start propagating the use of this module to other modules and documentation. Its just one of the core tenants people rely on with Terraform - that Terraform will manage the lifecycle dependencies and not them.

Outside of that, personally I don't see the benefit in having the data source embedded with the restrictions it provides over having users provide the details. They can use the data source on their end if they like. And this is the area where I say that the module has been designed with in conjunction with the EKS module. I have aligned the inputs of this module with the outputs of that module so its very clear what is expected. That said, it doesn't lock people into using the EKS module; it just shows the relationship since I suspect this will most likely be the most used case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the context being given here is from the behemoth EKS module, used solely by EKS admins. Where the iam-eks-role module was written for regular users who have workloads on EKS who just want a simple IRSA role created.

}
}
```

Note: the EKS clusters must in the current AWS region and account as they use the default AWS provider.
This module has been design in conjunction with the [`terraform-aws-eks`](https://github.com/terraform-aws-modules/terraform-aws-eks) module and easily integrates with it:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it hasn't and shouldn't be related or reliant on this other module IMO

Suggested change
This module has been design in conjunction with the [`terraform-aws-eks`](https://github.com/terraform-aws-modules/terraform-aws-eks) module and easily integrates with it:
This module has been design in conjunction with the [`terraform-aws-eks`](https://github.com/terraform-aws-modules/terraform-aws-eks) module and easily integrates with it:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it has and it should be - this module handles the IAM roles/policies while the EKS module handles the cluster resources. there are next steps to remove current pieces from the EKS module like https://github.com/bryantbiggs/terraform-aws-eks/blob/master/node_groups.tf#L9-L47

plus we have other teams that are looking for this integration as well to simplify their user experience - Karpenter being one of them with others interested as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, we have made changes to make these more interoperable terraform-aws-modules/terraform-aws-eks#1870

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is nothing currently in this module that is related to the EKS module? They weren't "designed in conjunction with" at all, they were written and are separate. You could create your EKS cluster with eksctl and still use this module? Maybe my beef is just with the wording..

@max-rocket-internet
Copy link
Contributor

max-rocket-internet commented Feb 14, 2022

Maybe just ignore my previous comments about specifics and focus on the bigger picture with this PR and the iam-eks-role module: It looks like you want to use the iam-eks-role module to replace parts of the IAM policy code from the terraform-aws-eks module, and use it to apply preconfigured IAM policies for common "addons" like karpenter, external-dns etc. Is that correct? If so, I think that sounds great 👍

But then this functionality is also simply removing and replacing what I wrote this module for. It's now kind of an extension of the terraform-aws-eks module, making them closely related, based on the comments and examples in this PR. So you're basically just deleting the module I wrote and creating a new one with the same name 🙂 Also fine, but annoying for people who have complete separation between IRSA and EKS code and wanted a simple module to create many roles for use on EKS that are unrelated to cluster addons (like apps for example)

@bryantbiggs
Copy link
Member Author

Also fine, but annoying for people who have complete separation between IRSA and EKS code and wanted a simple module to create many roles for use on EKS that are unrelated to cluster addons (like apps for example)

I don't follow - unless I am missing something, these changes aren't locking people into anything. The changes do lean towards some assumptions (module will most likely be used with EKS and the EKS module in this organization), but it doesn't lock them in to only those assumptions

@max-rocket-internet
Copy link
Contributor

these changes aren't locking people into anything

No they are not but you are removing the clean implementation that was available in iam-eks-role so that it can work nicely with functionality for users of terraform-aws-eks. i.e. changing the docs to show implementation with terraform-aws-eks, specifically stating that it is now "design in conjunction with", removed the data resources etc.

some assumptions (module will most likely be used with EKS and the EKS module in this organization

Exactly. The user that iam-eks-role was written for is like a user of EKS who has nothing to do with EKS infrastructure themselves. i.e. they just know the cluster name, kubectl or helm and don't need anything more. i.e. not a cluster administrator. For example a developer.

To give you context: we have about 50 EKS clusters managed by a "devops" team and about 400 developers who manage a whole bunch of apps across these clusters. In a large environment like this, EKS cluster TF resources can't be mixed in with application resources like in your docs, otherwise the TF state gets too big, blast radius etc. Developers need to create IRSA roles (we have 1000s of them) for their apps but have absolutely no knowledge of OIDC and very little of EKS. I didn't write the module for myself, I think there are others who want something clean and simple like this also. If you think outside of the bubble of EKS administration, there is a lot of general users, especially in larger orgs.

But anyway, do as you wish, I don't really mind. I just wanted to make you aware of why the module was written like that and who it was for 🙂

@philicious
Copy link

as I spiked this addition in the eks-module and now seeing it being added here but having learned its origin from your convo, I can see both user-groups as stakeholders: those who want full flexibility and those who want some pre-canned policies.

shouldnt it be possible to keep the module flexible and satisfy both use-cases and not introduce breaking-changes ?

@max-rocket-internet
Copy link
Contributor

shouldnt it be possible to keep the module flexible and satisfy both use-cases and not introduce breaking-changes ?

Yes. Just don't delete the block of code that I commented on in my review. But TBH if we are going to merge the rest of the changes to the iam-eks-role module from this PR, then you might as well remove that code also because the module then becomes an extension of the EKS module. With the doc and example changes, all the boolean vars to add cluster component policies, then the iam-eks-role module will probably only be used by EKS admins.

Perhaps I should create a new module, called iam-eks-role-simple that explains better the use of the cluster name?

@philicious
Copy link

philicious commented Feb 14, 2022

I wouldn't argue it's necessarily an extension of the EKS module. You could as well use it standalone and pass in the clustername or OIDC issuer url. No matter where it originates from.

If you guys decide for keeping separate modules, it's probably more existing-user-friendly to keep this one as is and put the new stuff from this PR in a different and new module like "iam-eks-common" or such

@bryantbiggs
Copy link
Member Author

going with separate module for now I guess

@philicious
Copy link

@bryantbiggs the implementation of the policy submodules looks really great so far 👍🏼
Happy to see these modules coming

How do you feel about also adding aws-load-balancer-controller policy ?
It's quite common also and by now preferable over the in-tree one

Btw I will also give the policies a 3rd pair of eyes if you don't mind.

Copy link
Member

@antonbabenko antonbabenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. Minor comments here and there. :)

@@ -0,0 +1,174 @@
# IAM Role for Service Accounts in EKS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update main README.md in the root of the module to include this new module&example into the lists there also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - also ordered them to match the sub-module/example directory

private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]

enable_nat_gateway = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not critical for the example, can we disable NAT and VPC flow logs in the example to be able to spin up things faster&cheaper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, we actually do need the NAT gateway (or replace it with 3 private VPC endpoints which takes just as long to provision as a NAT gateway)

tags = local.tags
}

module "karpenter_controller_irsa_role" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add one module for a case where create_role = false because sometimes there can be some weird things we can't check in the happy path.

I don't remember the name of the module where this was a problem a few weeks/months ago.

Copy link
Member Author

@bryantbiggs bryantbiggs Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! (and it was Atlantis w/ EFS 😅 )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryantbiggs the implementation of the policy submodules looks really great so far 👍🏼 Happy to see these modules coming

How do you feel about also adding aws-load-balancer-controller policy ? It's quite common also and by now preferable over the in-tree one

Btw I will also give the policies a 3rd pair of eyes if you don't mind.

@philicious we'll add some more in a next PR, this one is getting big enough - hang tight

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! (and it was Atlantis w/ EFS 😅 )

so it was so much recently :)

@bryantbiggs bryantbiggs marked this pull request as draft February 16, 2022 13:46
@bryantbiggs bryantbiggs marked this pull request as ready for review February 16, 2022 14:17
@bryantbiggs
Copy link
Member Author

ok should be all set now @antonbabenko

@antonbabenko antonbabenko merged commit e29b94f into terraform-aws-modules:master Feb 16, 2022
antonbabenko pushed a commit that referenced this pull request Feb 16, 2022
## [4.12.0](v4.11.0...v4.12.0) (2022-02-16)

### Features

* Add conditional policy statement attachments for EKS IAM role module ([#184](#184)) ([e29b94f](e29b94f))
@antonbabenko
Copy link
Member

This PR is included in version 4.12.0 🎉

@github-actions
Copy link

github-actions bot commented Nov 9, 2022

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VPC-CNI addon with managed IRSA Add optional IRSA provisioning for most common EKS components ?
4 participants