- Worker locals/defaults moved to workers submodule
- Create separate defaults for node groups
- Workers IAM management left outside of module as both node_group and worker_groups uses them
- Add option to migrate to worker group module
Grzegorz Lisowski committed Jun 12, 2021
1 parent 9022013 commit 0ecfa80
Showing 32 changed files with 1,265 additions and 384 deletions.
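
The switch from lists to maps matters because of how Terraform addresses resource instances. The sketch below is a minimal illustration only: the variable and resource names are hypothetical, `null_resource` stands in for the real ASG resources, and it assumes the new submodule keys its groups with `for_each` (the submodule source is not shown in this excerpt).

```hcl
# Hypothetical illustration; none of these names come from the module.
variable "groups_as_list" {
  type    = list(string)
  default = ["blue", "green"]
}

variable "groups_as_map" {
  type = map(object({ instance_type = string }))
  default = {
    blue  = { instance_type = "t3.small" }
    green = { instance_type = "t3.small" }
  }
}

# Addressed by position: null_resource.by_index[0], null_resource.by_index[1].
# Removing "blue" shifts "green" to index 0, so Terraform plans changes
# for both positions.
resource "null_resource" "by_index" {
  count = length(var.groups_as_list)

  triggers = {
    name = var.groups_as_list[count.index]
  }
}

# Addressed by key: null_resource.by_key["blue"], null_resource.by_key["green"].
# Removing "blue" only destroys that one instance.
resource "null_resource" "by_key" {
  for_each = var.groups_as_map

  triggers = {
    name          = each.key
    instance_type = each.value.instance_type
  }
}
```

With key-based addressing, deleting one entry leaves the addresses of the remaining entries unchanged, which is the behaviour the map-based `worker_groups` input is meant to provide.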
19 changes: 11 additions & 8 deletions README.md
@@ -57,12 +57,12 @@ module "my-cluster" {
subnets = ["subnet-abcde012", "subnet-bcde012a", "subnet-fghi345a"]
vpc_id = "vpc-1234556abcdef"
-worker_groups = [
-{
+worker_groups = {
+group = {
instance_type = "m4.large"
asg_max_size = 5
}
-]
+}
}
```
## Conditional creation
@@ -161,8 +161,9 @@ Apache 2 Licensed. See [LICENSE](https://github.com/terraform-aws-modules/terraf

| Name | Source | Version |
|------|--------|---------|
-| <a name="module_fargate"></a> [fargate](#module\_fargate) | ./modules/fargate | |
-| <a name="module_node_groups"></a> [node\_groups](#module\_node\_groups) | ./modules/node_groups | |
+| <a name="module_fargate"></a> [fargate](#module\_fargate) | ./modules/fargate | n/a |
+| <a name="module_node_groups"></a> [node\_groups](#module\_node\_groups) | ./modules/node_groups | n/a |
+| <a name="module_worker_groups"></a> [worker\_groups](#module\_worker\_groups) | ./modules/worker_groups | n/a |

## Resources

@@ -266,7 +267,7 @@ Apache 2 Licensed. See [LICENSE](https://github.com/terraform-aws-modules/terraf
| <a name="input_subnets"></a> [subnets](#input\_subnets) | A list of subnets to place the EKS cluster and workers within. | `list(string)` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to add to all resources. Tags added to launch configuration or templates override these values for ASG Tags only. | `map(string)` | `{}` | no |
| <a name="input_vpc_id"></a> [vpc\_id](#input\_vpc\_id) | VPC where the cluster and workers will be deployed. | `string` | n/a | yes |
-| <a name="input_wait_for_cluster_timeout"></a> [wait\_for\_cluster\_timeout](#wait\_for\_cluster\_timeout) | Allows for a configurable timeout (in seconds) when waiting for a cluster to come up | `number` | `300` | no |
+| <a name="input_wait_for_cluster_timeout"></a> [wait\_for\_cluster\_timeout](#input\_wait\_for\_cluster\_timeout) | A timeout (in seconds) to wait for cluster to be available. | `number` | `300` | no |
| <a name="input_worker_additional_security_group_ids"></a> [worker\_additional\_security\_group\_ids](#input\_worker\_additional\_security\_group\_ids) | A list of additional security group ids to attach to worker instances | `list(string)` | `[]` | no |
| <a name="input_worker_ami_name_filter"></a> [worker\_ami\_name\_filter](#input\_worker\_ami\_name\_filter) | Name filter for AWS EKS worker AMI. If not provided, the latest official AMI for the specified 'cluster\_version' is used. | `string` | `""` | no |
| <a name="input_worker_ami_name_filter_windows"></a> [worker\_ami\_name\_filter\_windows](#input\_worker\_ami\_name\_filter\_windows) | Name filter for AWS EKS Windows worker AMI. If not provided, the latest official AMI for the specified 'cluster\_version' is used. | `string` | `""` | no |
@@ -275,8 +276,9 @@ Apache 2 Licensed. See [LICENSE](https://github.com/terraform-aws-modules/terraf
| <a name="input_worker_create_cluster_primary_security_group_rules"></a> [worker\_create\_cluster\_primary\_security\_group\_rules](#input\_worker\_create\_cluster\_primary\_security\_group\_rules) | Whether to create security group rules to allow communication between pods on workers and pods using the primary cluster security group. | `bool` | `false` | no |
| <a name="input_worker_create_initial_lifecycle_hooks"></a> [worker\_create\_initial\_lifecycle\_hooks](#input\_worker\_create\_initial\_lifecycle\_hooks) | Whether to create initial lifecycle hooks provided in worker groups. | `bool` | `false` | no |
| <a name="input_worker_create_security_group"></a> [worker\_create\_security\_group](#input\_worker\_create\_security\_group) | Whether to create a security group for the workers or attach the workers to `worker_security_group_id`. | `bool` | `true` | no |
-| <a name="input_worker_groups"></a> [worker\_groups](#input\_worker\_groups) | A list of maps defining worker group configurations to be defined using AWS Launch Configurations. See workers\_group\_defaults for valid keys. | `any` | `[]` | no |
-| <a name="input_worker_groups_launch_template"></a> [worker\_groups\_launch\_template](#input\_worker\_groups\_launch\_template) | A list of maps defining worker group configurations to be defined using AWS Launch Templates. See workers\_group\_defaults for valid keys. | `any` | `[]` | no |
+| <a name="input_worker_groups"></a> [worker\_groups](#input\_worker\_groups) | A map of maps defining worker group configurations to be defined using AWS Launch Templates. See workers\_group\_defaults for valid keys. | `any` | `{}` | no |
+| <a name="input_worker_groups_launch_template_legacy"></a> [worker\_groups\_launch\_template\_legacy](#input\_worker\_groups\_launch\_template\_legacy) | A list of maps defining worker group configurations to be defined using AWS Launch Templates. See workers\_group\_defaults for valid keys. | `any` | `[]` | no |
+| <a name="input_worker_groups_legacy"></a> [worker\_groups\_legacy](#input\_worker\_groups\_legacy) | A list of maps defining worker group configurations to be defined using AWS Launch Configurations. See workers\_group\_defaults for valid keys. | `any` | `[]` | no |
| <a name="input_worker_security_group_id"></a> [worker\_security\_group\_id](#input\_worker\_security\_group\_id) | If provided, all workers will be attached to this security group. If not given, a security group will be created with necessary ingress/egress to work with the EKS cluster. | `string` | `""` | no |
| <a name="input_worker_sg_ingress_from_port"></a> [worker\_sg\_ingress\_from\_port](#input\_worker\_sg\_ingress\_from\_port) | Minimum port number from which pods will accept communication. Must be changed to a lower value if some pods in your cluster will expose a port lower than 1025 (e.g. 22, 80, or 443). | `number` | `1025` | no |
| <a name="input_workers_additional_policies"></a> [workers\_additional\_policies](#input\_workers\_additional\_policies) | Additional policies to be added to workers | `list(string)` | `[]` | no |
@@ -311,6 +313,7 @@ Apache 2 Licensed. See [LICENSE](https://github.com/terraform-aws-modules/terraf
| <a name="output_node_groups"></a> [node\_groups](#output\_node\_groups) | Outputs from EKS node groups. Map of maps, keyed by var.node\_groups keys |
| <a name="output_oidc_provider_arn"></a> [oidc\_provider\_arn](#output\_oidc\_provider\_arn) | The ARN of the OIDC Provider if `enable_irsa = true`. |
| <a name="output_security_group_rule_cluster_https_worker_ingress"></a> [security\_group\_rule\_cluster\_https\_worker\_ingress](#output\_security\_group\_rule\_cluster\_https\_worker\_ingress) | Security group rule responsible for allowing pods to communicate with the EKS cluster API. |
+| <a name="output_worker_groups"></a> [worker\_groups](#output\_worker\_groups) | Outputs from EKS worker groups. Map of maps, keyed by var.worker\_groups keys |
| <a name="output_worker_iam_instance_profile_arns"></a> [worker\_iam\_instance\_profile\_arns](#output\_worker\_iam\_instance\_profile\_arns) | default IAM instance profile ARN for EKS worker groups |
| <a name="output_worker_iam_instance_profile_names"></a> [worker\_iam\_instance\_profile\_names](#output\_worker\_iam\_instance\_profile\_names) | default IAM instance profile name for EKS worker groups |
| <a name="output_worker_iam_role_arn"></a> [worker\_iam\_role\_arn](#output\_worker\_iam\_role\_arn) | default IAM role ARN for EKS worker groups |
9 changes: 5 additions & 4 deletions aws_auth.tf
@@ -1,6 +1,6 @@
locals {
auth_launch_template_worker_roles = [
-for index in range(0, var.create_eks ? local.worker_group_launch_template_count : 0) : {
+for index in range(0, var.create_eks ? local.worker_group_launch_template_legacy_count : 0) : {
worker_role_arn = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:role/${element(
coalescelist(
aws_iam_instance_profile.workers_launch_template.*.role,
@@ -10,15 +10,15 @@ locals {
index
)}"
platform = lookup(
-var.worker_groups_launch_template[index],
+var.worker_groups_launch_template_legacy[index],
"platform",
local.workers_group_defaults["platform"]
)
}
]

auth_worker_roles = [
-for index in range(0, var.create_eks ? local.worker_group_count : 0) : {
+for index in range(0, var.create_eks ? local.worker_group_legacy_count : 0) : {
worker_role_arn = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:role/${element(
coalescelist(
aws_iam_instance_profile.workers.*.role,
@@ -28,7 +28,7 @@ locals {
index,
)}"
platform = lookup(
-var.worker_groups[index],
+var.worker_groups_legacy[index],
"platform",
local.workers_group_defaults["platform"]
)
@@ -40,6 +40,7 @@ locals {
for role in concat(
local.auth_launch_template_worker_roles,
local.auth_worker_roles,
+module.worker_groups.aws_auth_roles,
module.node_groups.aws_auth_roles,
module.fargate.aws_auth_roles,
) :
1 change: 1 addition & 0 deletions cluster.tf
@@ -57,6 +57,7 @@ resource "aws_security_group" "cluster" {
name_prefix = var.cluster_name
description = "EKS cluster security group."
vpc_id = var.vpc_id

tags = merge(
var.tags,
{
10 changes: 5 additions & 5 deletions data.tf
@@ -64,23 +64,23 @@ data "aws_iam_policy_document" "cluster_assume_role_policy" {
}

data "aws_iam_role" "custom_cluster_iam_role" {
-count = var.manage_cluster_iam_resources ? 0 : 1
+count = var.create_eks && !var.manage_cluster_iam_resources ? 1 : 0
name = var.cluster_iam_role_name
}

data "aws_iam_instance_profile" "custom_worker_group_iam_instance_profile" {
-count = var.manage_worker_iam_resources ? 0 : local.worker_group_count
+count = var.create_eks && !var.manage_worker_iam_resources ? local.worker_group_legacy_count : 0
name = lookup(
-var.worker_groups[count.index],
+var.worker_groups_legacy[count.index],
"iam_instance_profile_name",
local.workers_group_defaults["iam_instance_profile_name"],
)
}

data "aws_iam_instance_profile" "custom_worker_group_launch_template_iam_instance_profile" {
-count = var.manage_worker_iam_resources ? 0 : local.worker_group_launch_template_count
+count = var.create_eks && !var.manage_worker_iam_resources ? local.worker_group_launch_template_legacy_count : 0
name = lookup(
-var.worker_groups_launch_template[count.index],
+var.worker_groups_launch_template_legacy[count.index],
"iam_instance_profile_name",
local.workers_group_defaults["iam_instance_profile_name"],
)
29 changes: 10 additions & 19 deletions docs/faq.md
@@ -2,7 +2,7 @@

## How do I customize X on the worker group's settings?

-All the options that can be customized for worker groups are listed in [local.tf](https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/local.tf) under `workers_group_defaults_defaults`.
+All the options that can be customized for worker groups are listed in [local.tf](https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/worker_groups/local.tf) under `workers_group_defaults_defaults`.

Please open Issues or PRs if you think something is missing.

@@ -61,12 +61,6 @@ You need to add the tags to the VPC and subnets yourself. See the [basic example

An alternative is to use the aws provider's [`ignore_tags` variable](https://www.terraform.io/docs/providers/aws/#ignore\_tags-configuration-block). However this can also cause terraform to display a perpetual difference.

-## How do I safely remove old worker groups?
-
-You've added new worker groups. Deleting worker groups from earlier in the list causes Terraform to want to recreate all worker groups. This is a limitation with how Terraform works and the module using `count` to create the ASGs and other resources.
-
-The safest and easiest option is to set `asg_min_size` and `asg_max_size` to 0 on the worker groups to "remove".
-
## Why does changing the worker group's desired count not do anything?

The module is configured to ignore this value. Unfortunately Terraform does not support variables within the `lifecycle` block.
@@ -77,9 +71,9 @@ You can change the desired count via the CLI or console if you're not using the

If you are not using autoscaling and really want to control the number of nodes via terraform then set the `asg_min_size` and `asg_max_size` instead. AWS will remove a random instance when you scale down. You will have to weigh the risks here.

-## Why are nodes not recreated when the `launch_configuration`/`launch_template` is recreated?
+## Why are nodes not recreated when the `launch_configuration` is recreated?

-By default the ASG is not configured to be recreated when the launch configuration or template changes. Terraform spins up new instances and then deletes all the old instances in one go as the AWS provider team have refused to implement rolling updates of autoscaling groups. This is not good for kubernetes stability.
+By default the ASG is not configured to be recreated when the launch configuration changes. Terraform spins up new instances and then deletes all the old instances in one go as the AWS provider team have refused to implement rolling updates of autoscaling groups. This is not good for kubernetes stability.

You need to use a process to drain and cycle the workers.

@@ -137,35 +131,32 @@ Amazon EKS clusters must contain one or more Linux worker nodes to run core syst
1. Build AWS EKS cluster with the next workers configuration (default Linux):

```
-worker_groups = [
-{
-name = "worker-group-linux"
+worker_groups = {
+worker-group-linux = {
instance_type = "m5.large"
platform = "linux"
asg_desired_capacity = 2
},
-]
+}
```

2. Apply commands from https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html#enable-windows-support (use tab with name `Windows`)

3. Add one more worker group for Windows with required field `platform = "windows"` and update your cluster. Worker group example:

```
-worker_groups = [
-{
-name = "worker-group-linux"
+worker_groups = {
+worker-group-linux = {
instance_type = "m5.large"
platform = "linux"
asg_desired_capacity = 2
},
-{
-name = "worker-group-windows"
+worker-group-windows = {
instance_type = "m5.large"
platform = "windows"
asg_desired_capacity = 1
},
-]
+}
```

4. With `kubectl get nodes` you can see cluster with mixed (Linux/Windows) nodes support.
48 changes: 5 additions & 43 deletions docs/spot-instances.md
@@ -22,65 +22,27 @@ Notes:
- There is an AWS blog article about this [here](https://aws.amazon.com/blogs/compute/run-your-kubernetes-workloads-on-amazon-ec2-spot-instances-with-amazon-eks/).
- Consider using [k8s-spot-rescheduler](https://github.com/pusher/k8s-spot-rescheduler) to move pods from on-demand to spot instances.

-## Using Launch Configuration
-
-Example worker group configuration that uses an ASG with launch configuration for each worker group:
-
-```hcl
-worker_groups = [
-{
-name = "on-demand-1"
-instance_type = "m4.xlarge"
-asg_max_size = 1
-kubelet_extra_args = "--node-labels=node.kubernetes.io/lifecycle=normal"
-suspended_processes = ["AZRebalance"]
-},
-{
-name = "spot-1"
-spot_price = "0.199"
-instance_type = "c4.xlarge"
-asg_max_size = 20
-kubelet_extra_args = "--node-labels=node.kubernetes.io/lifecycle=spot"
-suspended_processes = ["AZRebalance"]
-},
-{
-name = "spot-2"
-spot_price = "0.20"
-instance_type = "m4.xlarge"
-asg_max_size = 20
-kubelet_extra_args = "--node-labels=node.kubernetes.io/lifecycle=spot"
-suspended_processes = ["AZRebalance"]
-}
-]
-```
-
## Using Launch Templates

Launch Template support is a recent addition to both AWS and this module. It might not be as tried and tested, but it's more suitable for spot instances as it allows multiple instance types in the same worker group:

```hcl
-worker_groups = [
-{
-name = "on-demand-1"
+worker_groups = {
+on-demand-1 = {
instance_type = "m4.xlarge"
asg_max_size = 10
kubelet_extra_args = "--node-labels=spot=false"
suspended_processes = ["AZRebalance"]
-}
-]
-worker_groups_launch_template = [
-{
-name = "spot-1"
+},
+spot-1 = {
override_instance_types = ["m5.large", "m5a.large", "m5d.large", "m5ad.large"]
spot_instance_pools = 4
asg_max_size = 5
asg_desired_capacity = 5
kubelet_extra_args = "--node-labels=node.kubernetes.io/lifecycle=spot"
public_ip = true
},
-]
+}
```

## Using Launch Templates With Both Spot and On Demand
67 changes: 67 additions & 0 deletions docs/upgrades.md
@@ -58,3 +58,70 @@ Plan: 0 to add, 0 to change, 1 to destroy.
5. If everything sounds good to you, run `terraform apply`

After the first apply, we recommend creating a new node group and letting the module use the `node_group_name_prefix` (by removing the `name` argument) to generate names and avoid collisions during node group re-creation if needed, because the lifecycle is `create_before_destroy = true`.
+
+## Upgrade module to vXX.X.X for Worker Groups Managed as maps
+
+In this release, we added the ability to manage Worker Groups as maps (not lists), which makes it easier to add and remove worker groups.
+
+>NOTE: The new functionality supports only creating groups using Launch Templates!
+
+1. Run `terraform apply` with the previous module version. Make sure all changes are applied before proceeding.
+
+2. Upgrade your module and configure your worker groups by renaming existing variable names as follows:
+
+```
+worker_groups = [...] => worker_groups_legacy = [...]
+worker_groups_launch_template = [...] => worker_groups_launch_template_legacy = [...]
+```
+
+Example:
+
+FROM:
+
+```hcl
+worker_groups_launch_template = [
+{
+name = "worker-group-1"
+instance_type = "t3.small"
+asg_desired_capacity = 2
+public_ip = true
+},
+]
+```
+
+TO:
+
+```hcl
+worker_groups_launch_template_legacy = [
+{
+name = "worker-group-1"
+instance_type = "t3.small"
+asg_desired_capacity = 2
+public_ip = true
+},
+]
+```
+
+3. Run `terraform plan`. No infrastructure changes are expected.
+
+4. From now on you can define worker groups in the new way and migrate your workloads there. Eventually the legacy groups can be deleted.
+
+Example:
+
+```hcl
+worker_groups_launch_template_legacy = [
+{
+name = "worker-group-1"
+instance_type = "t3.small"
+asg_desired_capacity = 2
+},
+]
+worker_groups = {
+worker-group-1 = {
+instance_type = "t3.small"
+asg_desired_capacity = 2
+},
+}
+```
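
Once workloads have moved to the map-based group (step 4), the end state might look like the sketch below. This is an illustration rather than part of the commit: the `source` address is an assumption (the excerpt does not show it), the other module arguments are elided, and the output name `worker_group_1` is hypothetical. The `worker_groups` module output is described in the README hunk above as a map keyed by the `var.worker_groups` keys.

```hcl
module "my-cluster" {
  # Assumed registry address; not shown in this excerpt.
  source = "terraform-aws-modules/eks/aws"

  # Other arguments (cluster name, subnets, vpc_id, and so on) stay as they were.

  # The legacy list variables are simply omitted; both default to [].
  worker_groups = {
    worker-group-1 = {
      instance_type        = "t3.small"
      asg_desired_capacity = 2
    }
  }
}

# Hypothetical output exposing whatever the module returns for one group.
output "worker_group_1" {
  value = module.my-cluster.worker_groups["worker-group-1"]
}
```

Because the key contains hyphens, the group has to be read with index syntax (`["worker-group-1"]`) rather than attribute syntax.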