Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add node group dependency for EKS addons resource creation #1840

Merged

Conversation

bryantbiggs
Copy link
Member

Description

  • Add node group dependency for EKS addons resource creation
  • Added note regarding future addons that may need to be provisioned ahead of node groups; we can add a separate resource for those addons or re-visit if the lifecycle of addons has been fixed upstream. However, for now I think we can live with this change without much issue

Motivation and Context

Breaking Changes

  • No

How Has This Been Tested?

  • I have tested and validated these changes using one or more of the provided examples/* projects
    • Tested with eks-managed-node-group example

@bryantbiggs
Copy link
Member Author

bryantbiggs commented Feb 3, 2022

to follow this up, I think the next "breaking" change will be to move addons out to a sub-module (not yet, but getting very close):

  • these dependencies are resolved once addons are externally managed/created from the cluster
  • this sub-module will contain other features related to addons (see VPC-CNI addon with managed IRSA #1841)

[Edit]
Actually, closing this for now. Instead of adding this here, I'll open a PR for that new addon sub-module and that will be the resolution for #1801

The current implementation for addons will stay for now with a TODO for removal at next breaking change, and documentation explaining what users should do today for addons

@bryantbiggs
Copy link
Member Author

bryantbiggs commented Feb 6, 2022

after looking into this further, I do think this is a valid change to consider:

  1. From the AWS docs

Amazon EKS add-ons run on the nodes that you provision or configure for your cluster. Node types include Amazon EC2 instances and Fargate.

This means that the hard dependency is valid and my prior assumption around conflicts that could arise from control plane addons was incorrect (addons run on nodes so nodes must come first)

  1. VPC-CNI addon with managed IRSA #1841 will be supported with the policy additions in feat: Add conditional policy statement attachments for EKS IAM role module terraform-aws-iam#184

@bryantbiggs bryantbiggs reopened this Feb 6, 2022
@bryantbiggs
Copy link
Member Author

@antonbabenko this is ready for review if you get some time

@antonbabenko antonbabenko merged commit 2515e0e into terraform-aws-modules:master Feb 7, 2022
antonbabenko pushed a commit that referenced this pull request Feb 7, 2022
### [18.4.1](v18.4.0...v18.4.1) (2022-02-07)

### Bug Fixes

* Add node group dependency for EKS addons resource creation ([#1840](#1840)) ([2515e0e](2515e0e))
@antonbabenko
Copy link
Member

This PR is included in version 18.4.1 🎉

@drodrigues3
Copy link

I`m facing an issue that I belive is related to this feature, I did create a new cluster and after 20 minutes all the node groups was failed to start. In the description of the node I did found this error message:

Ready False Fri, 11 Feb 2022 15:46:44 -0300 Fri, 11 Feb 2022 15:46:34 -0300 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

To solve this I just enabled manually the ADDON AWS CNI and KUBE-PROXY, so all the node groups was started.

May these two addons should be created at same time that the node group ?

@FernandoMiguel
Copy link
Contributor

my understanding is that at least AWS VPC CNI needs to be available before ec2 are spun up, or they wont join the cluster
coredns needs to be added only after the node groups are healthy and joined the cluster.

@bryantbiggs
Copy link
Member Author

@FernandoMiguel per the docs, addons come first. However, i do agree with your sentiment and have submitted a proposal for one possible resolution (see linked issues - they all related to the CNI quite a bit) aws/containers-roadmap#1666

@FernandoMiguel
Copy link
Contributor

On our module that depends on this, I've split up what goes to the module (at creation time of the workers) and another set of vars and addons that depend on the workers be already created....it's smelly code but works most of the times.
I still have issues with fargate profiles where coredns gets unhealthy a bunch of times... I have to manually delete it and reapply...
Need to spend time figuring out what's preventing it from get healthy at creation time.

@bryantbiggs
Copy link
Member Author

On our module that depends on this, I've split up what goes to the module (at creation time of the workers) and another set of vars and addons that depend on the workers be already created....it's smelly code but works most of the times. I still have issues with fargate profiles where coredns gets unhealthy a bunch of times... I have to manually delete it and reapply... Need to spend time figuring out what's preventing it from get healthy at creation time.

I am assuming you saw this https://docs.aws.amazon.com/eks/latest/userguide/fargate-getting-started.html#fargate-gs-coredns - but just FYI in case you didn't

@FernandoMiguel
Copy link
Contributor

On our module that depends on this, I've split up what goes to the module (at creation time of the workers) and another set of vars and addons that depend on the workers be already created....it's smelly code but works most of the times. I still have issues with fargate profiles where coredns gets unhealthy a bunch of times... I have to manually delete it and reapply... Need to spend time figuring out what's preventing it from get healthy at creation time.

I am assuming you saw this https://docs.aws.amazon.com/eks/latest/userguide/fargate-getting-started.html#fargate-gs-coredns - but just FYI in case you didn't

I did.. And something tells me the doc is outdated.
Recently seems you no longer need to perform the EDIT..
At least I've been able to deploy clusters just fine... But sometimes it fails... Not every time.
This started working with the newer version of the coredns addon.
If you don't specify the version, it will pick and older one, and I believe that one fails.
I've pinned ours to the latest and is mostly OK.

baibailiha added a commit to baibailiha/terraform-aws-eks that referenced this pull request Sep 13, 2022
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add-ons management Race condition with Node Groups
4 participants