Fix local_ssd_config issue that forces node-pool recreation #2968

Merged

@sharabiani sharabiani commented Aug 22, 2024

This PR fixes an issue where leaving ephemeral_storage_local_ssd_config unset forces node-pool recreation on re-deploy.

Key Changes:

disk_definitions implemented:

It contains default values for ephemeral_storage_local_ssd_config and local_ssd_count_nvme_block per machine type, based on the values here.

Selection priority order:

  1. The variables local_ssd_count_ephemeral_storage and local_ssd_count_nvme_block, if either is not null
  2. The disk_definitions entry for the machine_type, if one exists
  3. Otherwise, null for both local_ssd_count_ephemeral_storage and local_ssd_count_nvme_block

Made the changes required to use the new local variable.
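
The selection priority order above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: only `disk_definitions`, `machine_type`, `local_ssd_count_ephemeral_storage`, and `local_ssd_count_nvme_block` come from the description, while `null_ssd_config`, `machine_defaults`, `user_set`, and `local_ssd_config` are hypothetical names. The value 16 for "a3-highgpu-8g" is inferred from Scenario 2 in the manual testing and should be treated as an assumption.

```hcl
variable "machine_type" {
  type = string
}

variable "local_ssd_count_ephemeral_storage" {
  type    = number
  default = null
}

variable "local_ssd_count_nvme_block" {
  type    = number
  default = null
}

locals {
  # Per-machine-type defaults (assumed shape; the 16 is inferred, not confirmed).
  disk_definitions = {
    "a3-highgpu-8g" = {
      local_ssd_count_ephemeral_storage = 16
      local_ssd_count_nvme_block        = null
    }
  }

  # Priority 3: fall back to null for both counts.
  null_ssd_config = {
    local_ssd_count_ephemeral_storage = null
    local_ssd_count_nvme_block        = null
  }

  # Priority 2: use the disk_definitions entry for the machine type, if any.
  machine_defaults = lookup(local.disk_definitions, var.machine_type, local.null_ssd_config)

  # Priority 1: explicit variables win if either one is set.
  user_set = var.local_ssd_count_ephemeral_storage != null || var.local_ssd_count_nvme_block != null

  local_ssd_config = local.user_set ? {
    local_ssd_count_ephemeral_storage = var.local_ssd_count_ephemeral_storage
    local_ssd_count_nvme_block        = var.local_ssd_count_nvme_block
  } : local.machine_defaults
}

output "local_ssd_config" {
  value = local.local_ssd_config
}
```

Under these assumptions, the node-pool resource would read its SSD counts from `local.local_ssd_config` instead of the raw variables, so an unset variable and an explicit 16 resolve to the same value for "a3-highgpu-8g".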

Manual testing:

Scenario 1:

  1. Deployed examples/gke-a3-highgpu.yaml.
  2. Ran the deploy again.
  3. Verified that NO destruction or recreation of the node-pool is required.

Scenario 2:

  1. Deployed examples/gke-a3-highgpu.yaml.
  2. Set local_ssd_count_ephemeral_storage to 16 explicitly.
  3. Ran the deploy again.
  4. Verified that NO destruction or recreation of the node-pool is required.

Scenario 3:

  1. Deployed examples/gke-a3-highgpu.yaml.
  2. Removed "a3-highgpu-8g" from disk_definitions.
  3. Ran the deploy again.
  4. Verified that destruction and recreation of the node-pool is required.

Additional Notes:

  • Made changes to relevant documentation.

@sharabiani sharabiani added the "bug" (Something isn't working) label Aug 22, 2024
@sharabiani sharabiani self-assigned this Aug 22, 2024
@sharabiani sharabiani changed the title from "Fix local_ssd_config issue with forcing node-pool recreation" to "Fix local_ssd_config issue that forces node-pool recreation" Aug 22, 2024
@mr0re1 mr0re1 assigned nick-stroud and unassigned sharabiani Aug 22, 2024
@sharabiani sharabiani added the "release-bugfix" (Added to release notes under the "Bug fixes" heading.) label Aug 23, 2024
@ankitkinra ankitkinra assigned sharabiani and unassigned nick-stroud Aug 29, 2024
modules/compute/gke-node-pool/variables.tf (two review threads, outdated and resolved)
@sharabiani sharabiani merged commit f6535c6 into GoogleCloudPlatform:develop Aug 29, 2024
8 of 51 checks passed
@rohitramu rohitramu mentioned this pull request Sep 12, 2024