Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Create SG rule for each new cluster_endpoint_private_access_cidr block #1549

Merged
merged 2 commits into from
Sep 6, 2021

Conversation

lisfo4ka
Copy link
Contributor

Create separate sg rule for each new cluster_endpoint_private_access_cidr block in order to fix A duplicate Security Group rule error mentioned in #984

PR o'clock

Description

At the moment, it isn't possible to update the cluster_endpoint_private_access_cidrs variable since terraform apply fails with the error described in the #984

Initial settings:

module "eks" {
  source = "github.com/terraform-aws-modules/terraform-aws-eks.git?ref=v17.3.0"
...
  cluster_endpoint_private_access                           = true
  cluster_endpoint_private_access_cidrs                 = ["104.145.4.34/32"]
  cluster_create_endpoint_private_access_sg_rule = true

Updated settings:

module "eks" {
  source = "github.com/terraform-aws-modules/terraform-aws-eks.git?ref=v17.3.0"
...
  cluster_endpoint_private_access                           = true
  cluster_endpoint_private_access_cidrs                 = ["104.145.4.34/32", "64.114.102.2/32"]
  cluster_create_endpoint_private_access_sg_rule = true

terraform plan:

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source[0] must be replaced
+/- resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      ~ cidr_blocks              = [ # forces replacement
            "104.145.4.34/32",
          + "64.114.102.2/32",
        ]
      ~ id                       = "sgrule-3690564928" -> (known after apply)
      - ipv6_cidr_blocks         = [] -> null
      - prefix_list_ids          = [] -> null
      + source_security_group_id = (known after apply)
        # (7 unchanged attributes hidden)
    }

terraform apply:

module.eks.aws_security_group_rule.cluster_private_access_cidrs_source[0]: Creating...

Error: [WARN] A duplicate Security Group rule was found on (sg-00e1d0309e2738140). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error: InvalidPermission.Duplicate: the specified rule "peer: 104.145.4.34/32, TCP, from port: 443, to port: 443, ALLOW" already exists
	status code: 400, request id: ec240586-97ee-4d49-8153-4624cc3f0a41

  on .terraform/modules/eks/cluster.tf line 90, in resource "aws_security_group_rule" "cluster_private_access_cidrs_source":
  90: resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {

After proposed fix applied the separate SG rules will be created for each CIDR block in a list resolving the described issue:

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source will be destroyed
  - resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      - cidr_blocks       = [
          - "104.145.4.34/32",
        ] -> null
      - description       = "Allow private K8S API ingress from custom CIDR source." -> null
      - from_port         = 443 -> null
      - id                = "sgrule-3690564928" -> null
      - ipv6_cidr_blocks  = [] -> null
      - prefix_list_ids   = [] -> null
      - protocol          = "tcp" -> null
      - security_group_id = "sg-00e1d0309e2738140" -> null
      - self              = false -> null
      - to_port           = 443 -> null
      - type              = "ingress" -> null
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["104.145.4.34/32"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "104.145.4.34/32",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-00e1d0309e2738140"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["64.114.102.2/32"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "64.114.102.2/32",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-00e1d0309e2738140"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

Checklist

…cess_cidr block

In order to fix A duplicate Security Group rule error
@daroga0002
Copy link
Contributor

I have tested this PR for described scenarios (before change and then repeated on new code) and everything works as described. Migration to new module version after those changes also happening without issue and without impacting current EKS workload.

Good work @lisfo4ka 👍

@antonbabenko it can be merged

@antonbabenko
Copy link
Member

If this would be a small module with a couple of users I would merge this right away and explain how to run terraform state mv ... in a comment but this is not so easy with this module. Let's document the upgrade path and plan a bit.

I propose that we group this one together with other breaking changes and bump a new major release with the described upgrade path.

Here is one of the latest examples of a good upgrade doc created by @bryantbiggs in another module - https://github.com/terraform-aws-modules/terraform-aws-ec2-instance/blob/master/UPGRADE-3.0.md

@lisfo4ka Could you please make a draft for the upgrade guide and link it to this one?

@daroga0002 Are there any other PRs with the breaking changes?

@daroga0002
Copy link
Contributor

daroga0002 commented Sep 6, 2021

this is not a breaking change, this migration doesnt require any terraform state mv or etc. This is working without any changes and limited to a security group rules changes.

Technically it is:

  1. removing existing security group rule
  2. attach new rules (same as removed earlier)

Security group is exactly same, Kubernetes cluster is not impacted as nodes are communicating thru this security group rule.

Main purpose of cluster_private_access_cidrs_source is to open some automation/tooling/users to access a private endpoint API, potentially in worst case there is 1-2 seconds of break until security rules will be rebuild by AWS API, which as per me is acceptable.

@daroga0002
Copy link
Contributor

adding plan and execution:

Terraform will perform the following actions:

  # module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source will be destroyed
  - resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      - cidr_blocks       = [
          - "192.167.0.0/16",
          - "192.168.0.0/16"
        ] -> null
      - description       = "Allow private K8S API ingress from custom CIDR source." -> null
      - from_port         = 443 -> null
      - id                = "sgrule-3759482716" -> null
      - ipv6_cidr_blocks  = [] -> null
      - prefix_list_ids   = [] -> null
      - protocol          = "tcp" -> null
      - security_group_id = "sg-04ad765483ce726b6" -> null
      - self              = false -> null
      - to_port           = 443 -> null
      - type              = "ingress" -> null
    }

  # module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.167.0.0/16"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "192.167.0.0/16",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-04ad765483ce726b6"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

  # module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "192.168.0.0/16",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-04ad765483ce726b6"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

Plan: 2 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source: Destroying... [id=sgrule-3759482716]
module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.167.0.0/16"]: Creating...
module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"]: Creating...
module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source: Destruction complete after 3s
module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.167.0.0/16"]: Creation complete after 6s [id=sgrule-3759482716]
module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"]: Still creating... [10s elapsed]
module.eks_cluster.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"]: Creation complete after 14s [id=sgrule-1068296333]
Releasing state lock. This may take a few moments...

Apply complete! Resources: 2 added, 0 changed, 1 destroyed.

@antonbabenko
Copy link
Member

I see! You are right, this is not as bad as renaming the whole security group.

We don't need to make an upgrade path for this then.

I will merge it now.

@antonbabenko antonbabenko changed the title fix: Create separate sg rule for each new cluster_endpoint_private_access_cidr block feat: Create SG rule for each new cluster_endpoint_private_access_cidr block Sep 6, 2021
@antonbabenko antonbabenko merged commit 7109031 into terraform-aws-modules:master Sep 6, 2021
@antonbabenko
Copy link
Member

Thanks, @lisfo4ka !

Apologies for the misunderstanding of the proposed PR. :)

v17.14.0 has been just released.

@lisfo4ka
Copy link
Contributor Author

lisfo4ka commented Sep 6, 2021

@daroga0002 @antonbabenko thanks a lot for your active involvement)

@jaimehrubiks
Copy link
Contributor

jaimehrubiks commented Sep 10, 2021

@lisfo4ka
@antonbabenko
@daroga0002

Not sure why but I think this is causing complete outage in my test clusters. I will come back with further information.

I upgraded from v17.1.0 to v17.18.0 and terraform showed me a SG duplicate error (I don't have it right now, but will try to replicate it soon) and now I can't connect to the cluster (both kubectl and terraform timeout, meaning that this change made locked me out of the cluster with security groups)

@daroga0002
Copy link
Contributor

hmm, this change is just touching a security group rules, doesnt change anything with security groups.

I tested this earlier and this was quite transparent move from v17.1 to this version.

When you will get chance please post a error and plan from your execution maybe we will be able to help you.

@jaimehrubiks
Copy link
Contributor

jaimehrubiks commented Sep 10, 2021

I fixed it with a targeted apply:

terraform apply --target=module.eks.aws_security_group_rule.cluster_private_access_cidrs_source

But, the thing is, that it failed on the first run, effectively leaving me with no security group at all.

[jhidalgo@ip-10-16-35-7 eks-ms-test-apps]$ terraform0-14-11 apply --target=module.eks.aws_security_group_rule.cluster_private_access_cidrs_source
aws_kms_key.eks: Refreshing state... [id=00b5sssssb-18b3ef359a56]
module.eks.aws_security_group.workers[0]: Refreshing state... [id=sg-09e2323cbd81d78]
module.eks.aws_security_group.cluster[0]: Refreshing state... [id=sg-0e6a823232131e492]
module.eks.aws_security_group_rule.cluster_egress_internet[0]: Refreshing state... [id=sgrule-36423239611]
module.eks.aws_security_group_rule.cluster_https_worker_ingress[0]: Refreshing state... [id=sgrule-2562323733]
module.eks.aws_eks_cluster.this[0]: Refreshing state... [id=ms-test-apps]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["10.0.0.0/8"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "10.0.0.0/8",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-04442323452d"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["172.16.0.0/12"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "172.16.0.0/12",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-0444232343452d"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "192.168.0.0/16",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-0444d232343452d"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

Plan: 3 to add, 0 to change, 0 to destroy.


Warning: Resource targeting is in effect

You are creating a plan with the -target option, which means that the result
of this plan may not represent all of the changes requested by the current
configuration.

The -target option is not for routine use, and is provided only for
exceptional situations such as recovering from errors or mistakes, or when
Terraform specifically suggests to use it as part of an error message.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["172.16.0.0/12"]: Creating...
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"]: Creating...
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["10.0.0.0/8"]: Creating...
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"]: Creation complete after 0s [id=sgrule-4223232572]
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["172.16.0.0/12"]: Creation complete after 1s [id=sgrule-3302323622]
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["10.0.0.0/8"]: Creation complete after 2s [id=sgrule-349723236]

I'll replicate it from zero whenever I can. Just wanted to put it here meanwhile.
Sadly my terminal lost the initial error message, but I think it was one of those duplicate entry in security groups or similar

@jaimehrubiks
Copy link
Contributor

I can not replicate the issue right now... !

When I do a full upgrade from scratch I'll let you know if I see the issue again.

Else, in case it happens randomly, the above targeted apply solution might help someone in the future.

@jaimehrubiks
Copy link
Contributor

jaimehrubiks commented Sep 30, 2021

I tried a fresh upgrade again:

Terraform will perform the following actions:

  # module.eks.data.http.wait_for_cluster[0] will be read during apply
  # (config refers to values not yet known)
 <= data "http" "wait_for_cluster"  {
      ~ body             = "ok" -> (known after apply)
      ~ id               = "https://sdafasdfsadf.sk1.us-east-1.eks.amazonaws.com/healthz" -> (known after apply)
      - insecure         = false -> null
      ~ response_headers = {
          - "Cache-Control"          = "no-cache, private"
          - "Content-Length"         = "2"
          - "Content-Type"           = "text/plain; charset=utf-8"
          - "Date"                   = "Thu, 30 Sep 2021 14:45:12 GMT"
          - "X-Content-Type-Options" = "nosniff"
        } -> (known after apply)
        # (3 unchanged attributes hidden)
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source will be destroyed
  - resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      - cidr_blocks       = [
          - "10.0.0.0/8",
          - "172.16.0.0/12",
          - "192.168.0.0/16",
        ] -> null
      - description       = "Allow private K8S API ingress from custom CIDR source." -> null
      - from_port         = 443 -> null
      - id                = "sgrule-91sdfa596104" -> null
      - ipv6_cidr_blocks  = [] -> null
      - prefix_list_ids   = [] -> null
      - protocol          = "tcp" -> null
      - security_group_id = "sg-asdf" -> null
      - self              = false -> null
      - to_port           = 443 -> null
      - type              = "ingress" -> null
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["10.0.0.0/8"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "10.0.0.0/8",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-asdfasdf"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["172.16.0.0/12"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "172.16.0.0/12",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-asdfasdf"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

  # module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"] will be created
  + resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {
      + cidr_blocks              = [
          + "192.168.0.0/16",
        ]
      + description              = "Allow private K8S API ingress from custom CIDR source."
      + from_port                = 443
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-asdfasdf"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 443
      + type                     = "ingress"
    }

Result

module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["192.168.0.0/16"]: Creating...
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["172.16.0.0/12"]: Creating...
module.eks.aws_security_group_rule.cluster_private_access_cidrs_source["10.0.0.0/8"]: Creating...
module.eks.module.node_groups.aws_launch_template.workers["elasticsearch-b"]: Modifying... [id=lt-sdf]
module.eks.module.node_groups.aws_launch_template.workers["infrastructure"]: Modifying... [id=lt-asdf]
module.eks.module.node_groups.aws_launch_template.workers["general_purpose"]: Modifying... [id=lt-asdf]
module.eks.module.node_groups.aws_launch_template.workers["elasticsearch-a"]: Modifying... [id=lt-asdf]
module.eks.module.node_groups.aws_launch_template.workers["elasticsearch-a"]: Modifications complete after 1s [id=lt-asdf]
module.eks.module.node_groups.aws_launch_template.workers["general_purpose"]: Modifications complete after 1s [id=lt-asdf]
module.eks.module.node_groups.aws_launch_template.workers["elasticsearch-b"]: Modifications complete after 1s [id=lt-asdf]
module.eks.module.node_groups.aws_launch_template.workers["infrastructure"]: Modifications complete after 1s [id=lt-asdf]

Warning: Applied changes may be incomplete

The plan was created with the -target option in effect, so some changes
requested in the configuration may have been ignored and the output values may
not be fully updated. Run the following command to verify that no other
changes are pending:
    terraform plan

Note that the -target option is not suitable for routine use, and is provided
only for exceptional situations such as recovering from errors or mistakes, or
when Terraform specifically suggests to use it as part of an error message.


Error: [WARN] A duplicate Security Group rule was found on (sg-asdfasdfdf). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error: InvalidPermission.Duplicate: the specified rule "peer: 172.16.0.0/12, TCP, from port: 443, to port: 443, ALLOW" already exists
        status code: 400, request id: 61bfd6a9-2bbc-45a2-a91a-eb752fc7e8ad

  on .terraform/modules/eks/main.tf line 102, in resource "aws_security_group_rule" "cluster_private_access_cidrs_source":
 102: resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {



Error: [WARN] A duplicate Security Group rule was found on (sg-aasdfsdf). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error: InvalidPermission.Duplicate: the specified rule "peer: 192.168.0.0/16, TCP, from port: 443, to port: 443, ALLOW" already exists
        status code: 400, request id: 06cbc724-07d1-4287-aaad-c98253b2ce27

  on .terraform/modules/eks/main.tf line 102, in resource "aws_security_group_rule" "cluster_private_access_cidrs_source":
 102: resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {



Error: [WARN] A duplicate Security Group rule was found on (sg-asdfasdf). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error: InvalidPermission.Duplicate: the specified rule "peer: 10.0.0.0/8, TCP, from port: 443, to port: 443, ALLOW" already exists
        status code: 400, request id: cddf8a4d-b987-471c-a3f9-e9976b0f2ff2

  on .terraform/modules/eks/main.tf line 102, in resource "aws_security_group_rule" "cluster_private_access_cidrs_source":
 102: resource "aws_security_group_rule" "cluster_private_access_cidrs_source" {

And again, I could fix it with a targeted apply

terraform0-14-11 apply --target=module.eks.aws_security_group_rule.cluster_private_access_cidrs_source

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants