Fix local_ssd_config issue that forces node-pool recreation #2968

Merged
4 changes: 2 additions & 2 deletions modules/compute/gke-node-pool/README.md
@@ -276,8 +276,8 @@ No modules.
| <a name="input_image_type"></a> [image\_type](#input\_image\_type) | The default image type used by NAP once a new node pool is being created. Use either COS\_CONTAINERD or UBUNTU\_CONTAINERD. | `string` | `"COS_CONTAINERD"` | no |
| <a name="input_kubernetes_labels"></a> [kubernetes\_labels](#input\_kubernetes\_labels) | Kubernetes labels to be applied to each node in the node group. Key-value pairs. <br>(The `kubernetes.io/` and `k8s.io/` prefixes are reserved by Kubernetes Core components and cannot be specified) | `map(string)` | `null` | no |
| <a name="input_labels"></a> [labels](#input\_labels) | GCE resource labels to be applied to resources. Key-value pairs. | `map(string)` | n/a | yes |
-| <a name="input_local_ssd_count_ephemeral_storage"></a> [local\_ssd\_count\_ephemeral\_storage](#input\_local\_ssd\_count\_ephemeral\_storage) | The number of local SSDs to attach to each node to back ephemeral storage.<br>Uses NVMe interfaces. Must be supported by `machine_type`.<br>When set to null, GKE decides about default value.<br>[See above](#local-ssd-storage) for more info. | `number` | `null` | no |
-| <a name="input_local_ssd_count_nvme_block"></a> [local\_ssd\_count\_nvme\_block](#input\_local\_ssd\_count\_nvme\_block) | The number of local SSDs to attach to each node to back block storage.<br>Uses NVMe interfaces. Must be supported by `machine_type`.<br>When set to null, GKE decides about default value.<br>[See above](#local-ssd-storage) for more info. | `number` | `null` | no |
+| <a name="input_local_ssd_count_ephemeral_storage"></a> [local\_ssd\_count\_ephemeral\_storage](#input\_local\_ssd\_count\_ephemeral\_storage) | The number of local SSDs to attach to each node to back ephemeral storage.<br>Uses NVMe interfaces. Must be supported by `machine_type`.<br>When set to null, default value either is [set based on machine\_type](https://cloud.google.com/compute/docs/disks/local-ssd#choose_number_local_ssds) or GKE decides about default value.<br>[See above](#local-ssd-storage) for more info. | `number` | `null` | no |
+| <a name="input_local_ssd_count_nvme_block"></a> [local\_ssd\_count\_nvme\_block](#input\_local\_ssd\_count\_nvme\_block) | The number of local SSDs to attach to each node to back block storage.<br>Uses NVMe interfaces. Must be supported by `machine_type`.<br>When set to null, default value either is [set based on machine\_type](https://cloud.google.com/compute/docs/disks/local-ssd#choose_number_local_ssds) or GKE decides about default value.<br>[See above](#local-ssd-storage) for more info. | `number` | `null` | no |
| <a name="input_machine_type"></a> [machine\_type](#input\_machine\_type) | The name of a Google Compute Engine machine type. | `string` | `"c2-standard-60"` | no |
| <a name="input_name"></a> [name](#input\_name) | The name of the node pool. If left blank, will default to the machine type. | `string` | `null` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The project ID to host the cluster in. | `string` | n/a | yes |
36 changes: 36 additions & 0 deletions modules/compute/gke-node-pool/disk_definitions.tf
@@ -0,0 +1,36 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

## Required variables:
# local_ssd_count_ephemeral_storage
# local_ssd_count_nvme_block
# machine_type

locals {

local_ssd_machines = {
"a3-highgpu-8g" = { local_ssd_count_ephemeral_storage = 16, local_ssd_count_nvme_block = null },
"a3-megagpu-8g" = { local_ssd_count_ephemeral_storage = 16, local_ssd_count_nvme_block = null },
}

generated_local_ssd_config = lookup(local.local_ssd_machines, var.machine_type, { local_ssd_count_ephemeral_storage = null, local_ssd_count_nvme_block = null })

# Select in priority order:
# (1) var.local_ssd_count_ephemeral_storage and var.local_ssd_count_nvme_block, if either is not null
# (2) the local.local_ssd_machines entry for var.machine_type, if one exists
# (3) null for both local_ssd_count_ephemeral_storage and local_ssd_count_nvme_block
local_ssd_config = (var.local_ssd_count_ephemeral_storage == null && var.local_ssd_count_nvme_block == null) ? local.generated_local_ssd_config : { local_ssd_count_ephemeral_storage = var.local_ssd_count_ephemeral_storage, local_ssd_count_nvme_block = var.local_ssd_count_nvme_block }
}
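
The selection logic above can be tried in isolation. The following is a minimal standalone sketch, not part of the PR: it copies the `locals` expressions into a throwaway root module with no provider, and the variable defaults and the `output` block are assumptions added purely for illustration.

```hcl
# Standalone sketch (hypothetical root module, no provider required).
# The variable defaults below are assumptions chosen for illustration.
variable "machine_type" {
  type    = string
  default = "a3-highgpu-8g"
}

variable "local_ssd_count_ephemeral_storage" {
  type    = number
  default = null
}

variable "local_ssd_count_nvme_block" {
  type    = number
  default = null
}

locals {
  # Per-machine-type defaults, copied from disk_definitions.tf.
  local_ssd_machines = {
    "a3-highgpu-8g" = { local_ssd_count_ephemeral_storage = 16, local_ssd_count_nvme_block = null },
    "a3-megagpu-8g" = { local_ssd_count_ephemeral_storage = 16, local_ssd_count_nvme_block = null },
  }

  # Falls back to null for both counts when the machine type is not in the table.
  generated_local_ssd_config = lookup(local.local_ssd_machines, var.machine_type, { local_ssd_count_ephemeral_storage = null, local_ssd_count_nvme_block = null })

  # Caller-supplied values win as soon as either variable is non-null.
  local_ssd_config = (var.local_ssd_count_ephemeral_storage == null && var.local_ssd_count_nvme_block == null) ? local.generated_local_ssd_config : { local_ssd_count_ephemeral_storage = var.local_ssd_count_ephemeral_storage, local_ssd_count_nvme_block = var.local_ssd_count_nvme_block }
}

# Output added only so the resolved value is visible in a plan.
output "local_ssd_config" {
  value = local.local_ssd_config
}
```

With the defaults shown, both variables are null, so the expression should resolve to the `a3-highgpu-8g` table entry. Running `terraform plan -var 'local_ssd_count_nvme_block=2'` should instead select the caller's values for both fields, which leaves the ephemeral-storage count null: setting either variable bypasses the per-machine defaults entirely rather than merging with them.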
10 changes: 5 additions & 5 deletions modules/compute/gke-node-pool/main.tf
@@ -105,16 +105,16 @@ resource "google_container_node_pool" "node_pool" {
}

dynamic "ephemeral_storage_local_ssd_config" {
-for_each = var.local_ssd_count_ephemeral_storage != null ? [1] : []
+for_each = local.local_ssd_config.local_ssd_count_ephemeral_storage != null ? [1] : []
content {
-local_ssd_count = var.local_ssd_count_ephemeral_storage
+local_ssd_count = local.local_ssd_config.local_ssd_count_ephemeral_storage
}
}

dynamic "local_nvme_ssd_block_config" {
-for_each = var.local_ssd_count_nvme_block != null ? [1] : []
+for_each = local.local_ssd_config.local_ssd_count_nvme_block != null ? [1] : []
content {
-local_ssd_count = var.local_ssd_count_nvme_block
+local_ssd_count = local.local_ssd_config.local_ssd_count_nvme_block
}
}

@@ -189,7 +189,7 @@ resource "google_container_node_pool" "node_pool" {
error_message = "static_node_count cannot be set with either autoscaling_total_min_nodes or autoscaling_total_max_nodes."
}
precondition {
-condition = !(coalesce(var.local_ssd_count_ephemeral_storage, 0) > 0 && coalesce(var.local_ssd_count_nvme_block, 0) > 0)
+condition = !(coalesce(local.local_ssd_config.local_ssd_count_ephemeral_storage, 0) > 0 && coalesce(local.local_ssd_config.local_ssd_count_nvme_block, 0) > 0)
error_message = "Only one of local_ssd_count_ephemeral_storage or local_ssd_count_nvme_block can be set to a non-zero value."
}
precondition {
4 changes: 2 additions & 2 deletions modules/compute/gke-node-pool/variables.tf
@@ -93,7 +93,7 @@ variable "local_ssd_count_ephemeral_storage" {
description = <<-EOT
The number of local SSDs to attach to each node to back ephemeral storage.
Uses NVMe interfaces. Must be supported by `machine_type`.
-When set to null, GKE decides about default value.
+When set to null, default value either is [set based on machine_type](https://cloud.google.com/compute/docs/disks/local-ssd#choose_number_local_ssds) or GKE decides about default value.
[See above](#local-ssd-storage) for more info.
EOT
type = number
@@ -104,7 +104,7 @@ variable "local_ssd_count_nvme_block" {
description = <<-EOT
The number of local SSDs to attach to each node to back block storage.
Uses NVMe interfaces. Must be supported by `machine_type`.
-When set to null, GKE decides about default value.
+When set to null, default value either is [set based on machine_type](https://cloud.google.com/compute/docs/disks/local-ssd#choose_number_local_ssds) or GKE decides about default value.
[See above](#local-ssd-storage) for more info.

EOT