
feat: Ability to manage worker groups as maps #858

Conversation


@grzegorzlisowski grzegorzlisowski commented May 3, 2020

PR o'clock

Description

Resolves #774

The change is intended to improve the ability to manage worker groups using maps, which should allow worker groups to be added and removed more flexibly (improving on: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/faq.md#how-do-i-safely-remove-old-worker-groups).

The change includes the changes suggested in #774:

  • create a separate submodule for worker_groups
  • the submodule includes only LaunchTemplate support

Changes to the worker group definitions:

  • to use the new worker group definitions, use (see the sketch below):
worker_groups = {}
  • old worker group definitions should be renamed as follows to avoid resource destruction (and to allow migration):
worker_groups = [...] => worker_groups_legacy = [...]

worker_groups_launch_template = [...] => worker_groups_launch_template_legacy = [...]
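For illustration, a minimal sketch of the new map-based form alongside a renamed legacy list (the group attribute names shown are illustrative inputs, not an exhaustive schema):

# New map-based definition: each key is a stable identifier, so removing
# "green" later only destroys that group; "blue" keeps its address.
worker_groups = {
  blue = {
    instance_type        = "m5.large"
    asg_desired_capacity = 2
  }
  green = {
    instance_type        = "m5.large"
    asg_desired_capacity = 2
  }
}

# Existing list-based groups keep working under the renamed variable:
worker_groups_legacy = [
  {
    name                 = "blue-old"
    instance_type        = "m5.large"
    asg_desired_capacity = 2
  },
]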


@barryib (Member) commented May 4, 2020

Thanks @grzegorzlisowski for opening this PR. This is actually something we want to add to this module: dropping count-based loops entirely in favor of for_each.

With that said, it would be nice to split the feature into a new submodule. We started some discussion in #774.

@js-timbirkett may be busy at the moment, so if you can open a PR to address #774 it would be much appreciated.
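For context, a sketch of the count vs for_each difference this refactor targets (hypothetical variable names and abbreviated resource bodies, not the module's actual code):

# Before: groups addressed by list index, e.g. workers_by_index[1].
# Removing the first list element renumbers the rest, forcing their
# autoscaling groups to be destroyed and recreated.
resource "aws_autoscaling_group" "workers_by_index" {
  count = length(var.worker_groups_list) # hypothetical list(object) input

  name_prefix         = var.worker_groups_list[count.index].name
  min_size            = 1
  max_size            = 3
  vpc_zone_identifier = var.subnets
  # ... launch template/configuration omitted for brevity
}

# After: groups addressed by a stable map key, e.g. workers_by_key["blue"].
resource "aws_autoscaling_group" "workers_by_key" {
  for_each = var.worker_groups_map # hypothetical map(object) input

  name_prefix         = each.key
  min_size            = 1
  max_size            = 3
  vpc_zone_identifier = var.subnets
  # ... launch template/configuration omitted for brevity
}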

aws_auth.tf Outdated
@@ -38,11 +37,27 @@ locals {
}
]

auth_launch_template_worker_roles_ext = [
for k, v in local.worker_group_launch_template_configurations_ext : {
worker_role_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${var.manage_worker_iam_resources ? aws_iam_instance_profile.workers_launch_template_ext[k].role : data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile_ext[k].role_name}"
Contributor:

Need to replace the :aws: portion of the ARN with :${data.aws_partition.current.partition}: (same in auth_worker_roles_ext below)
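For reference, a sketch of the partition-aware form using the standard aws_partition data source (the local name here is illustrative):

data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}

locals {
  # partition is "aws" in commercial regions, "aws-cn" in China, and
  # "aws-us-gov" in GovCloud, so the ARN stays valid in every partition.
  example_worker_role_arn = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:role/example-role"
}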

Author:

fixed

data.tf Outdated
name = each.value["iam_instance_profile_name"]
}

data "aws_region" "current" {}
Contributor:

Doesn't seem to be used. Remove?

Author:

fixed

@grzegorzlisowski force-pushed the feature/devops-1421 branch 4 times, most recently from 9820bb7 to 695030c on May 7, 2020 09:13

dynamic mixed_instances_policy {
iterator = item
for_each = ((lookup(var.worker_groups[each.key], "override_instance_types", null) != null) || (lookup(var.worker_groups[each.key], "on_demand_allocation_strategy", null) != null)) ? list(each.value) : []
Member:

Is var.worker_groups[each.key] different from each.value? If not, I'd prefer to use each.value; it would be more consistent and more readable.

Author:

Unfortunately, it is different according to my analysis. This is only to avoid changing the old logic. See here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/workers_launch_template.tf#L93

Author (May 8, 2020):

Sorry, I responded too early.
var.worker_groups[each.key] is what came in as the argument from the user (previously var.worker_groups_launch_template[count.index]), while each.value represents the group already merged with the module defaults. Using each.value instead could, IMHO, change the original behaviour.
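To illustrate why the two can differ, a sketch (assuming, for the example, that the defaults map sets on_demand_allocation_strategy while the user's group does not):

locals {
  workers_group_defaults = {
    on_demand_allocation_strategy = "prioritized"
  }

  # each.value corresponds to this merged shape:
  worker_groups_merged = {
    for k, v in var.worker_groups : k => merge(local.workers_group_defaults, v)
  }
}

# With var.worker_groups = { spot = { instance_type = "m5.large" } }:
#   lookup(var.worker_groups["spot"], "on_demand_allocation_strategy", null)          => null
#   lookup(local.worker_groups_merged["spot"], "on_demand_allocation_strategy", null) => "prioritized"
# Testing the raw variable therefore preserves the old "did the user set it?" check.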


dynamic launch_template {
iterator = item
for_each = ((lookup(var.worker_groups[each.key], "override_instance_types", null) != null) || (lookup(var.worker_groups[each.key], "on_demand_allocation_strategy", null) != null)) ? [] : list(each.value)
Member:

Is var.worker_groups[each.key] different from each.value?

Author:

This is the same as the previous case above.

@barryib (Member) commented May 8, 2020

the old way of defining worker groups is still possible for some backward compatibility and easier migration to the new version (after some time, the old way of defining worker_groups could be dropped entirely)

I was wondering if we shouldn't drop this directly. For existing worker groups, users will have to move resources in the state anyway, so why not move them directly into a map?

For locals, shouldn't we move https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/local.tf#L34-L100 into the submodule?
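For anyone following along, moving an existing group from a count index to a map key would look something like this (the resource addresses are illustrative, not the module's exact ones):

terraform state mv \
  'module.eks.aws_autoscaling_group.workers_launch_template[0]' \
  'module.eks.module.worker_groups.aws_autoscaling_group.workers["my-group"]'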

@grzegorzlisowski (Author):

My assumption was that users could apply the new module, add new worker groups using maps, migrate K8s resources to the new nodes, and then remove the old groups simply by removing the list.

I left those locals in their original place, since I assumed we still use them for the "legacy" worker definitions. If we drop the old worker_groups definitions entirely, then they should be moved.

@stevehipwell (Contributor):

Do we know what is blocking this?

@grzegorzlisowski (Author) commented May 21, 2020

I presume review and approval? I'm not sure why the "Semantic Pull Request" check is not executing.

@venky999 commented Jun 4, 2020

We could use worker_groups as a list of maps with a unique name key, instead of a map, so that it would be cleaner, and we could do:

locals {
  worker_groups = {
    for worker_group in var.worker_groups : worker_group.name => worker_group
  }

  worker_group_configurations = {
    for k, v in local.worker_groups : k => merge(
      local.workers_group_defaults,
      v,
    ) if var.create_workers
  }
}
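For example, with an illustrative input, the first for expression keys each group by its name attribute:

# var.worker_groups = [
#   { name = "blue",  instance_type = "m5.large"  },
#   { name = "green", instance_type = "m5.xlarge" },
# ]
#
# local.worker_groups then becomes:
# {
#   blue  = { name = "blue",  instance_type = "m5.large"  }
#   green = { name = "green", instance_type = "m5.xlarge" }
# }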

}

data "aws_ami" "eks_worker" {
filter {
@venky999 (Jun 4, 2020):

Can we use for_each and add support for a worker_node_version for each worker group, defaulting to cluster_version when worker_node_version is not specified?
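A plausible reading of this suggestion, sketched below; worker_node_version is the commenter's proposed key, not an existing module input:

locals {
  # Each group's AMI lookup would use its own version, falling back to
  # the cluster version when worker_node_version is not set.
  worker_group_versions = {
    for k, v in local.worker_group_configurations :
    k => lookup(v, "worker_node_version", var.cluster_version)
  }
}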

Author:

What do you mean by 'worker_node_version'?

@grzegorzlisowski (Author):

(quoting the list-of-maps suggestion above)

It could be done, but I think it makes sense to keep the same approach as for node_groups, where maps are used as the input.

@grzegorzlisowski force-pushed the feature/devops-1421 branch 3 times, most recently from bf35ebe to 8d8bd48 on June 24, 2020 21:42
@ZeroDeth commented Jul 4, 2020

The error below sometimes appears randomly, without changing anything in the code.

Error: Inconsistent conditional result types

  on .terraform/modules/eks_0/modules/worker_groups/data.tf line 5, in data "aws_iam_instance_profile" "custom_worker_group_iam_instance_profile":
   5:   for_each = var.manage_worker_iam_resources ? {} : local.worker_group_configurations
    |----------------
    | local.worker_group_configurations is object with 3 attributes
    | var.manage_worker_iam_resources is false

The true and false result expressions must have consistent types. The given
expressions are object and object, respectively.

@ZeroDeth commented Jul 4, 2020 (reposting the same error as above)

The issue happens when adding a second group to worker_groups.
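For reference, the usual fix for this class of error is to stop returning two differently-typed objects from the conditional and instead filter inside the for_each. A sketch of the pattern, based on the data source shown in this PR's diff (not necessarily the exact fix applied here):

data "aws_iam_instance_profile" "custom_worker_group_iam_instance_profile" {
  # Filtering avoids the {} vs object-with-attributes type mismatch.
  for_each = {
    for k, v in local.worker_group_configurations : k => v
    if !var.manage_worker_iam_resources
  }

  name = each.value["iam_instance_profile_name"]
}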

@grzegorzlisowski (Author):

@ZeroDeth Could you share your setup so I can try to recreate the issue?

@grzegorzlisowski force-pushed the feature/devops-1421 branch 3 times, most recently from 1cbf970 to 87edcca on August 3, 2020 21:40
@barryib (Member) commented Oct 6, 2020

Thanks @grzegorzlisowski for working on this. We have a terraform-aws-modules working session this Friday, where we'll discuss the direction we want to take with this feature. We'll come back to you soon.

@TBeijen commented Jan 31, 2021

@TBeijen That was my original plan, but it was suggested that this interim step might not be needed and the state might just be migrated: #858 (comment). So I have dropped support for this.

Ok. Since I use Terraform via an in-house service, I think I can't modify the tfstate. So I would probably need to create a temporary fork supporting both mechanisms, if that is at all possible. Or switch to managed nodes, which we're considering anyway.

On the other hand: migrating all of the current resources locks us out of certain clean-ups and simplifications in the old worker group implementation. Moving all map-based code into a submodule could be a good moment to do so. I'm mainly thinking of sticking to the EKS-created security group and ditching the now-redundant one created by this module. See #1196 (comment) and my follow-up comment.

@barryib What's your thought on this? I know this risks re-activating long-settled discussions, which can slow down momentum. That's obviously not my intention, so I'm fine with any answer.

@grzegorzlisowski force-pushed the feature/devops-1421 branch 5 times, most recently from a429b61 to aa8f722 on February 8, 2021 01:07
@grzegorzlisowski (Author) commented Feb 8, 2021

@TBeijen @barryib

In the additional commit, I have added the option to migrate from the old (legacy) worker groups to the new ones.
It requires introducing some mess, but that can be cleaned up once the old way of managing worker groups is dropped.
examples/basic and examples/launch_templates contain proper examples.

The module migrates existing variables (lists) appropriately:

worker_groups -> worker_groups_legacy
worker_groups_launch_template -> worker_groups_launch_template_legacy

This way, existing users can upgrade to the new module version and gradually migrate compute to the new map-based approach via worker_groups.

Not sure if this approach is acceptable...
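A sketch of what a mid-migration configuration might look like under this scheme (the module source and group attributes are illustrative):

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ... cluster settings ...

  # Existing groups, renamed to the legacy variable so state is preserved:
  worker_groups_launch_template_legacy = [
    {
      name                 = "old-workers"
      instance_type        = "m5.large"
      asg_desired_capacity = 2
    },
  ]

  # New groups defined as a map; drain and remove the legacy list over time:
  worker_groups = {
    new-workers = {
      instance_type        = "m5.large"
      asg_desired_capacity = 2
    }
  }
}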

stale bot commented May 9, 2021

This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
To track this PR (even if closed), please open a corresponding issue if one does not already exist.

@stale stale bot added the stale label May 9, 2021
@barryib barryib removed the stale label May 9, 2021
@stevehipwell (Contributor):

@barryib is this something that is likely to happen soon?

@grzegorzlisowski force-pushed the feature/devops-1421 branch 2 times, most recently from af59eb0 to f31281e on June 12, 2021 21:14
- Create separate defaults for node groups
- Workers IAM management left outside of the module, as both node_groups and worker_groups use it
- Add option to migrate to the worker group module
@NeckBeardPrince:

Any chance for a review and merge?

stale bot commented Oct 14, 2021

This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
To track this PR (even if closed), please open a corresponding issue if one does not already exist.

@antonbabenko (Member):

This issue has been resolved in version 18.0.0 🎉

@github-actions (bot):

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2022
Development
Successfully merging this pull request may close these issues: Newly refactored module