
Cluster does not recreate or reattach node pools if a change requires cluster respin. #9220

Closed
huang-jy opened this issue May 24, 2021 · 5 comments · Fixed by GoogleCloudPlatform/magic-modules#4842, hashicorp/terraform-provider-google-beta#3314 or #9309

huang-jy commented May 24, 2021


Terraform Version

Terraform v0.15.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v3.68.0
+ provider registry.terraform.io/hashicorp/google-beta v3.68.0

Affected Resource(s)

  • google_container_cluster

Terraform Configuration Files

(Please see the notes below.)

resource "google_service_account" "terraform-sa" {
  account_id   = "terraform-gke"
  display_name = "Service Account For Terraform To Make GKE Cluster"
}

resource "google_container_cluster" "cluster" {
  name               = "bug-report"
  location           = var.zone
  min_master_version = var.cluster_version
  project            = var.project

  node_config {
    preemptible  = true
    machine_type = "e2-micro"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.terraform-sa.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    metadata = {
      disable-legacy-endpoints = "true"
    }

  }


  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }
    http_load_balancing {
      disabled = false
    }
  }

  maintenance_policy {
    daily_maintenance_window {
      start_time = "00:00"
    }
  }

  release_channel {
    channel = "UNSPECIFIED"
  }

  vertical_pod_autoscaling {
    enabled = true
  }


  workload_identity_config {
    identity_namespace = "${var.project}.svc.id.goog"
  }
}
resource "google_container_node_pool" "nodes" {
  name       = "general-purpose"
  location   = var.zone
  project    = var.project
  cluster    = google_container_cluster.cluster.name
  node_count = 1
  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }
  management {
    ## Both must be true on STABLE channel
    auto_repair  = false
    auto_upgrade = false
  }

  # version = var.cluster_version

  node_config {
    preemptible  = false
    machine_type = "e2-standard-4"

    # Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
    service_account = google_service_account.terraform-sa.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    metadata = {
      disable-legacy-endpoints = "true"
    }

  }

  upgrade_settings {
    max_surge       = 1 ## Only allow one extra node during upgrade
    max_unavailable = 0 ## Don't take out any current nodes until a new one is healthy
  }

  lifecycle {
    ignore_changes = [
      # Ignore changes to initial_node_count; otherwise the node pool will be
      # recreated whenever initial_node_count drifts between the environment
      # and the Terraform configuration
      initial_node_count,
      node_count,
      version
    ]
  }


}
provider "google" {
  project     = var.project
  region      = var.region
  zone        = var.zone
  credentials = file("credentials.json")
}

provider "google-beta" {
  project     = var.project
  region      = var.region
  zone        = var.zone
  credentials = file("credentials.json")
}
variable "project" {
  default = "REPLACE_ME"
}

variable "region" {
  default = "europe-west2"
}

variable "zone" {
  default = "europe-west2-a"
}

variable "cluster_version" {
  default = "1.19.9-gke.1900"
}
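
A note on the respin mechanism: the repeated recreation in the steps below appears to come from defining node_config on the google_container_cluster itself while also setting remove_default_node_pool = true. The cluster-level node_config only applies to the default node pool, so once that pool is deleted the provider sees a permanent diff on the ForceNew node_config block, and every subsequent plan proposes recreating the cluster.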

Expected Behavior

Terraform should recreate the node pool, or reattach the existing one, when the cluster is recreated

Actual Behavior

Terraform deleted the cluster and node pool and respun the cluster, but did not (re)create the node pool

Steps to Reproduce

  1. Save the files above
  2. Run terraform plan -out tfplan, then terraform apply tfplan
  3. Check in the console; you will have a cluster with nodes. All good so far ✔️
  4. Run terraform plan -out tfplan again; you will see that the cluster will be respun (the .tf files are structured to induce a cluster respin on every plan in order to demonstrate this issue)
  5. Run terraform apply tfplan. Terraform will delete the cluster, but does not indicate that the node pools are being deleted (they are)
  6. The cluster respins, but no node pools are attached to it, leaving a zero-node cluster
    (screenshot: cluster with zero nodes)
  7. Run terraform plan -out tfplan and terraform apply tfplan again
  8. Terraform will delete and recreate the cluster again, and this time it will create and attach a new node pool to it
    (screenshot: cluster with the recreated node pool)

Important Factoids

This problem appears to stem from the fact that there is no link between the cluster and the node pools. A change that forces the cluster to be deleted and recreated should either force the node pools to be deleted and recreated as well, or the node pools should automatically be reattached to the new cluster after it is created.
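
As an aside, Terraform 1.2 and later (which postdates this report) can express that missing link explicitly with replace_triggered_by. A minimal sketch, assuming the same resources as in the configuration above (in practice this lifecycle block would be merged with the existing ignore_changes one):

resource "google_container_node_pool" "nodes" {
  # ... arguments as in the original configuration ...

  lifecycle {
    # Replace this node pool whenever the cluster's id changes, i.e. whenever
    # the cluster itself is destroyed and recreated. Requires Terraform >= 1.2,
    # which was not yet available when this issue was filed.
    replace_triggered_by = [
      google_container_cluster.cluster.id
    ]
  }
}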

melinath (Collaborator) commented Jun 3, 2021

I think this is a consequence of the behavior described in hashicorp/terraform#24663. We could allow a workaround here by supporting cluster = google_container_cluster.cluster.id (which currently doesn't work).
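
A minimal sketch of what that workaround would look like, assuming the provider is changed to accept the cluster's full resource id (projects/{project}/locations/{location}/clusters/{name}) in the cluster argument, which is what the linked fixes later implemented:

resource "google_container_node_pool" "nodes" {
  name = "general-purpose"

  # Reference the cluster by its computed id instead of its configured name.
  # When the cluster is planned for replacement, id shows as "(known after
  # apply)", which registers as a change to this ForceNew argument and forces
  # the node pool to be replaced along with the cluster.
  cluster = google_container_cluster.cluster.id

  # ... remaining arguments as in the original configuration ...
}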

huang-jy (Author) commented Jun 3, 2021

Yes, it does indeed sound similar to the example referenced in that ticket. Does this sound like a Terraform core issue, or something in the Google provider?

melinath (Collaborator) commented Jun 3, 2021

From the provider side, I think we can make it so that people can supply google_container_cluster.cluster.id to the node pool to get the behavior you're looking for. Being able to get that behavior with google_container_cluster.cluster.name would definitely be a core issue.
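
For context on why the two behave differently: name is set in configuration and keeps the same value across a replacement, so Terraform core sees no diff on the node pool, while id is a computed attribute that becomes unknown while the cluster is being replaced, so the node pool's ForceNew cluster argument registers a change and the node pool is replaced too. Making the name-based reference propagate replacement would require the core change tracked in hashicorp/terraform#24663.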

huang-jy (Author) commented Jun 3, 2021

Thank you

github-actions bot commented Jul 5, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Jul 5, 2021