Autoscale and master nodes break the Terraform plan when you have more than 6 nodes #651

Closed
kuisathaverat opened this issue May 16, 2023 · 2 comments
Labels
bug Something isn't working

kuisathaverat commented May 16, 2023

An ESS deployment configured for autoscaling fails to apply the Terraform plan once the cluster reaches 6 nodes.

Readiness Checklist

  • I am running the latest version
  • I checked the documentation and found no answer
  • I checked to make sure that this issue has not already been filed
  • I am reporting the issue to the correct repository (for multi-repository projects)

Expected Behavior

Not having to update the Terraform plan manually to add master nodes.

Current Behavior

Once the cluster grows past 6 nodes, the original plan is no longer valid and you have to modify it to include master nodes.

Terraform definition

variable "ess_apikey" {
  type = string
}

terraform {
  required_version = ">= 0.12.29"

  required_providers {
    ec = {
      source  = "elastic/ec"
      version = "0.7.0"
    }
  }
}

provider "ec" {
  endpoint = "https://cloud.elastic.co"
  insecure = true
  apikey = var.ess_apikey
  verbose = true
}

resource "ec_deployment" "main" {
  name = "release-oblt"
  region                 = "gcp-us-west2"
  version                = "8.8.0"
  deployment_template_id = "gcp-io-optimized-v3"
  alias                  = "release-oblt"

  elasticsearch = {
    autoscale = true
    hot = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }
    ml = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 1
    }
    warm = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }
    cold = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }
  }

  integrations_server = {
    size = "2g"
    zone_count = 1
  }
  kibana = {
    size = "4g"
    zone_count = 1
  }
}

Steps to Reproduce

  1. Create a cluster with the Terraform file provided above
  2. Ingest data to scale the cluster up to 6 nodes; configuring ILM to keep data on the hot, warm, and cold tiers is enough to reach 6 nodes
  3. Change the Terraform file to use a higher autoscaling max_size for any of the tiers
  4. Try to apply the plan; it fails with the following error:
fatal: [localhost]: FAILED! => changed=false 
  cmd: /usr/local/bin/terraform apply -no-color -input=false -auto-approve -lock=true /tmp/tmpjwuuy4hu.tfplan
  msg: |2-
  
    Error: failed updating deployment
  
      with ec_deployment.main,
      on main.tf line 33, in resource "ec_deployment" "main":
      33: resource "ec_deployment" "main" {
  
    api error: 2 errors occurred:
            * cluster.missing_dedicated_master: Deployment template [I/O Optimized]
    requires a dedicated master after [6] nodes. Found [8] nodes in the
    deployment (resources.elasticsearch[0])
            * clusters.cluster_invalid_plan: Cluster must contain at least a master
    topology element and a data topology element. 'master' node type is
    missing,'master' node type exists in more than one topology element
    (resources.elasticsearch[0].cluster_topology)
  rc: 1
  stderr: |2-
  
    Error: failed updating deployment
  
      with ec_deployment.main,
      on main.tf line 33, in resource "ec_deployment" "main":
      33: resource "ec_deployment" "main" {
  
    api error: 2 errors occurred:
            * cluster.missing_dedicated_master: Deployment template [I/O Optimized]
    requires a dedicated master after [6] nodes. Found [8] nodes in the
    deployment (resources.elasticsearch[0])
            * clusters.cluster_invalid_plan: Cluster must contain at least a master
    topology element and a data topology element. 'master' node type is
    missing,'master' node type exists in more than one topology element
    (resources.elasticsearch[0].cluster_topology)
  stderr_lines: <omitted>
  stdout: |-
    ec_deployment.main: Modifying... [id=1111111111111111111111111111111111]
  stdout_lines: <omitted>

To fix the issue, you have to modify the Terraform file to include master nodes:

variable "ess_apikey" {
  type = string
}

terraform {
  required_version = ">= 0.12.29"

  required_providers {
    ec = {
      source  = "elastic/ec"
      version = "0.7.0"
    }
  }
}

provider "ec" {
  endpoint = "https://cloud.elastic.co"
  insecure = true
  apikey = var.ess_apikey
  verbose = true
}

resource "ec_deployment" "main" {
  name = "release-oblt"
  region                 = "gcp-us-west2"
  version                = "8.8.0"
  deployment_template_id = "gcp-io-optimized-v3"
  alias                  = "release-oblt"

  elasticsearch = {
    autoscale = true
    hot = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }
    master = {
      autoscaling = {}
      size = "8g"
      zone_count = 3
    }
    ml = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 1
    }
    warm = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }
    cold = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }
  }

  integrations_server = {
    size = "2g"
    zone_count = 1
  }
  kibana = {
    size = "4g"
    zone_count = 1
  }
}

Context

This breaks any automation: it is impossible to apply the same plan several times, and it forces you to have something outside of Terraform that updates the plan when there are more than six nodes. It is also not possible to add the master nodes in the first place, because then you hit the opposite error: with fewer than six nodes you are not allowed to define master nodes.

Possible Solution

I should not have to care about master nodes; dedicated masters are something ESS needs, so ESS should manage them automatically, as the ESS UI already does.

See also #468.

Checking number_of_data_nodes to decide whether to enable master nodes could do the trick; a sketch of the idea follows the cluster health output below.

GET _cluster/health

{
  "cluster_name": "11111111111111111111111111",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 10,
  "number_of_data_nodes": 6,
  "active_primary_shards": 7075,
  "active_shards": 11991,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
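
One way to express that idea today is to gate the master topology on a node count fed in from outside Terraform. This is only a minimal sketch: the data_node_count variable is hypothetical (populated out-of-band, e.g. from number_of_data_nodes in GET _cluster/health), the 8g/3-zone master sizing is illustrative, and whether the provider accepts a null master object is an assumption, not verified behavior.

# Hypothetical variable, populated outside Terraform
# (for example from number_of_data_nodes in GET _cluster/health).
variable "data_node_count" {
  type    = number
  default = 0
}

resource "ec_deployment" "main" {
  # ... same deployment settings as above ...

  elasticsearch = {
    autoscale = true

    hot = {
      autoscaling = {
        max_size = "64g"
      }
      zone_count = 3
    }

    # Only declare dedicated masters once the deployment has grown
    # past the template's 6-node threshold; sizes are illustrative.
    master = var.data_node_count > 6 ? {
      autoscaling = {}
      size        = "8g"
      zone_count  = 3
    } : null
  }
}

Ideally the provider itself would perform this check, so a single plan would stay valid on both sides of the threshold.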

Your Environment

  • Version used: 0.7.0
  • Running against Elastic Cloud SaaS or Elastic Cloud Enterprise and version: Elastic Cloud SaaS
kuisathaverat added the bug label on May 16, 2023
shermanericts commented Jun 2, 2023

Hmm.
As @kuisathaverat says, you get somewhat hosed.
I can get into a situation where, when I try to add the master node, I get this:

│ Error: failed updating deployment
│
│   with module.customer_env.module.elastic.ec_deployment.customer,
│   on .terraform/modules/customer_env.elastic/deployment.tf line 31, in resource "ec_deployment" "customer":
│   31: resource "ec_deployment" "customer" {
│
│ api error: 1 error occurred:
│ 	* cluster.dedicated_master_prohibited: Deployment template [General purpose] requires at least [6] nodes before dedicated master can be specified. Found only [3] nodes in the deployment (resources.elasticsearch[0])
│
│

If I change the master section of the deployment to 0g, the provider complains that no master block was found. I'm not sure what to do in this situation:

│ Error: failed updating deployment
│
│   with module.customer_env.module.elastic.ec_deployment.customer,
│   on ../terraform-customer-elastic/deployment.tf line 31, in resource "ec_deployment" "customer":
│   31: resource "ec_deployment" "customer" {
│
│ api error: 1 error occurred:
│ 	* clusters.cluster_invalid_plan: Cluster must contain at least a master topology element and a data topology
│ element. 'master' node type is missing,'master' node type exists in more than one topology element
│ (resources.elasticsearch[0].cluster_topology)
│
│
╵

I'm not sure how to recover at this point except by removing the elastic deployment from the state and re-importing it. In my case, I kept my zone count at 2 for each tier and I'm OK for the moment.

May relate to #635


Last edit: what I wound up having to do (which is what I think was said initially) is use other means of lowering the zone count outside of Terraform until the master node was no longer in play (it gets automatically removed by Elastic Cloud), and then I was able to drive the terraform plan/apply sequence.
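
For anyone stuck in the same spot, one way to keep Terraform from fighting an out-of-band topology change like that is an ignore_changes lifecycle block. This is only a sketch of standard Terraform behavior, not something verified against the ec provider in this thread, and ignoring the whole elasticsearch object also hides legitimate drift:

resource "ec_deployment" "main" {
  # ... existing configuration ...

  lifecycle {
    # Let Elastic Cloud adjust the topology out-of-band without
    # Terraform trying to revert it on the next apply.
    ignore_changes = [elasticsearch]
  }
}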


tobio commented Aug 3, 2023

Duplicates #635

tobio closed this as completed on Aug 3, 2023