
Timeout on initial apply on wait_for_cluster null_resource, then inconsistent final plan on re-apply #990

Closed
1 of 4 tasks
glueric opened this issue Aug 27, 2020 · 11 comments



glueric commented Aug 27, 2020

I have issues

module.dev-eks-module.null_resource.wait_for_cluster[0] (local-exec): TIMEOUT


Error: Error running command 'for i in `seq 1 60`; do wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1': exit status 1. Output: TIMEOUT

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

When I apply the module, seemingly everything gets created, but the wait_for_cluster null resource times out. Checking the cluster in EKS afterwards, I saw that the node group hadn't been created. When I re-apply the module, I get an 'Inconsistent final plan' error:

Terraform will perform the following actions:

  # module.dev-eks-module.kubernetes_config_map.aws_auth[0] will be created
  + resource "kubernetes_config_map" "aws_auth" {
      + data = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<~EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::519026510774:role/dev-eks-module20200827205656039700000007"
                  "username": "system:node:{{EC2PrivateDNSName}}"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + id   = (known after apply)

      + metadata {
          + generation       = (known after apply)
          + name             = "aws-auth"
          + namespace        = "kube-system"
          + resource_version = (known after apply)
          + self_link        = (known after apply)
          + uid              = (known after apply)
        }
    }

  # module.dev-eks-module.null_resource.wait_for_cluster[0] is tainted, so must be replaced
+/- resource "null_resource" "wait_for_cluster" {
      ~ id = "145238471061234140" -> (known after apply)
    }

Plan: 2 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes


Error: Provider produced inconsistent final plan

When expanding the plan for
module.dev-eks-module.null_resource.wait_for_cluster[0] to include new values
learned so far during apply, provider "registry.terraform.io/hashicorp/null"
changed the planned action from CreateThenDelete to DeleteThenCreate.

If this is a bug, how to reproduce? Please include a code sample if relevant.

I just copied the sample code from the readme but updated it with the version locking from the latest basic sample.

terraform {
  required_version = ">= 0.12.0"
}

provider "aws" {
  version = ">= 2.28.1"
  region  = var.aws_region
}

provider "random" {
  version = "~> 2.1"
}

provider "local" {
  version = "~> 1.2"
}

provider "null" {
  version = "~> 2.1"
}

provider "template" {
  version = "~> 2.1"
}

data "aws_eks_cluster" "cluster" {
  name = module.dev-eks-module.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.dev-eks-module.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.11"
}

module "dev-eks-module" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "v12.2.0"
  cluster_name    = "dev-eks-module"
  cluster_version = "1.17"
  subnets         = ["subnet-4abbbf66", "subnet-9599a0cf", "subnet-c977b382"]
  vpc_id          = "vpc-b6a006d0"

  worker_groups = [
    {
      instance_type = "t3.medium"
      asg_max_size  = 5
    }
  ]
}

What's the expected behavior?

The cluster should get created properly with a node group and terraform should apply successfully.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: v12.2.0
  • OS: MacOS Mojave 10.14.6
  • Terraform version: v0.13.0

Any other relevant info

@dpiddockcmp (Contributor)

There's a Terraform bug here that we're triggering in a few ways. This is the same bug as #939, but that one had a logical fix. Terraform should not be marking this resource as create-before-destroy. Not sure if it's related to #984, as that's also happening on TF 0.12.29.

In your particular case, does dropping the existing null resource allow the plan to finish applying?
terraform state rm module.dev-eks-module.null_resource.wait_for_cluster[0]


glueric commented Aug 28, 2020

I deleted the existing null resource and ran an apply again:

module.dev-eks-module.null_resource.wait_for_cluster[0]: Still creating... [5m0s elapsed]
module.dev-eks-module.null_resource.wait_for_cluster[0] (local-exec): TIMEOUT


Error: Error running command 'for i in `seq 1 60`; do wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1': exit status 1. Output: TIMEOUT

I wonder if there is some problem with my configuration? I specified only private subnets in the subnets variable, as that is what I saw the example doing. Should I provide my public subnets as well?

@ayush-sharma-devops

I have the same problem:

module.my-cluster.null_resource.wait_for_cluster[0]: Still creating... [27m50s elapsed]
^CInterrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...
Stopping operation...


Error: Error running command 'for i in `seq 1 60`; do wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1': signal: interrupt. Output: 

@dpiddockcmp (Contributor)

What happens when you run the wget --no-check-certificate -O - -q $ENDPOINT/healthz command manually from your deployment environment?


glueric commented Aug 31, 2020

Hmm it doesn't print anything. It seems like there's nothing in the $ENDPOINT var.

to-m-mbperic:dev-eks-module eric.rosendale$ wget --no-check-certificate -O - -q $ENDPOINT/healthz
to-m-mbperic:dev-eks-module eric.rosendale$ echo $ENDPOINT

to-m-mbperic:dev-eks-module eric.rosendale$ 

@dpiddockcmp (Contributor)

You need to set ENDPOINT to your Kubernetes API endpoint's address when testing manually.
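For example, something like this (rough sketch; the cluster name is taken from your config above and it assumes the AWS CLI is configured for the right account and region):

# Look up the cluster's API endpoint, then retry the health check without -q so the response is visible
ENDPOINT=$(aws eks describe-cluster --name dev-eks-module --query 'cluster.endpoint' --output text)
wget --no-check-certificate -O - $ENDPOINT/healthz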


glueric commented Aug 31, 2020

Ok, after setting the endpoint var I also realized the -q flag suppresses output, so I wouldn't have seen the response anyway. I ran it again and it seems like it's failing to establish an SSL connection:

OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unable to establish SSL connection.

@dpiddockcmp (Contributor)

Your wget is too old. Either update the environment you're running in, or switch to curl by changing wait_for_cluster_cmd.
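Something along these lines in the module block, for example (untested sketch; $ENDPOINT is substituted by the module's wait_for_cluster resource, so keep that literal, and adjust the curl flags to taste):

module "dev-eks-module" {
  # ... existing arguments from your config above ...

  # Use curl instead of wget for the health-check loop
  wait_for_cluster_cmd = "for i in `seq 1 60`; do curl -s -f -k $ENDPOINT/healthz >/dev/null && exit 0 || true; sleep 5; done; echo TIMEOUT && exit 1"
}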

@ayush-sharma-devops

My issue was resolved. It turned out to be a networking problem on my side: my execution environment couldn't reach the API server on the CIDRs I had configured. As a test, I opened the cluster endpoint to public access and everything worked as expected. I then sorted out the networking between my execution environment and the cluster, and everything works.
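For anyone hitting the same thing, the relevant knobs were roughly these (sketch only; variable names as documented for the module, values are placeholders):

module "my-cluster" {
  # ... existing arguments ...

  # Keep private access on and open the public endpoint so the machine running
  # Terraform can reach the API server; optionally restrict the source CIDRs.
  cluster_endpoint_private_access      = true
  cluster_endpoint_public_access       = true
  cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"] # example CIDR, replace with your own
}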


glueric commented Sep 1, 2020

That was it! Thank you! I updated wget to v1.20.3 and the terraform apply was able to finish.

@glueric glueric closed this as completed Sep 1, 2020
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 25, 2022