
Every plan run resets upgrade_settings.max_surge for default_node_pool #24020

Open
mloskot opened this issue Nov 24, 2023 · 8 comments

@mloskot
Contributor

mloskot commented Nov 24, 2023


Terraform Version

1.6.4

AzureRM Provider Version

3.82.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

// https://registry.terraform.io/providers/hashicorp/azurerm/3.82.0/docs/resources/kubernetes_cluster#example-usage
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  //...
}

Debug Output/Panic Output

      ~ default_node_pool {
            name                         = "default"
            ...
            # (23 unchanged attributes hidden)

          - upgrade_settings {
              - max_surge = "10%" -> null
            }
        }

        # (7 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Expected Behaviour

Subsequent runs of terraform plan should not report any changes to the upgrade_settings.max_surge property.

Actual Behaviour

I use azurerm_kubernetes_cluster to create a new cluster without specifying custom upgrade_settings in default_node_pool, and Azure calculates a default "Maximum surge" for me:

(screenshot: Azure portal showing the calculated default "Maximum surge" for the node pool)

Then, every time I run terraform plan, still without customised upgrade_settings in my .tf files, the provider always tries to modify my cluster with:

- upgrade_settings {
      - max_surge = "10%" -> null
  }

This is not expected behaviour, is it?

Steps to Reproduce

  1. terraform plan
  2. terraform apply
  3. terraform plan
  4. terraform plan
  5. ...

Important Factoids

No response

References

No response

@paulgmiller

AKS changed the default max surge in the October release, so that if you are on k8s > 1.28, max surge is defaulted to 10% (previously it was left blank, which implied 1 under the covers).

Release 2023-10-01 · Azure/AKS (github.com)

Is there a way we could have done this better for terraform?

@aa2811

aa2811 commented Jan 12, 2024

Just to add, we've worked around this by explicitly setting max_surge = "10%" in our terraform files, to avoid the unnecessary changes on plan/apply cycles.
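
Applied to the example configuration from the issue body, that workaround looks roughly like this (a minimal sketch; only the upgrade_settings block is new compared to the original example):

default_node_pool {
  name       = "default"
  node_count = 1
  vm_size    = "Standard_D2_v2"

  upgrade_settings {
    # matches the new AKS default, so the provider no longer sees a diff
    max_surge = "10%"
  }
}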

mloskot added a commit to mloskot/terraform-azurerm-aks-light that referenced this issue Jan 12, 2024
Azure Kubernetes Service changed the default max surge
in the October release, so that for clusters based
on Kubernetes >1.28 max surge defaults to 10%, see
https://github.com/Azure/AKS/releases/tag/2023-10-01

Previously it was left blank, which implied a
value of 1 under the bonnet.

Using the current version of Terraform AzureRM 3.86.0
leads to implicit resetting of the max_surge:

  max_surge = "10%" -> null

The only workaround to avoid this confusing annoyance is to
set max_surge to an explicit value, e.g. the default "10%".
But this requires max_surge to be exposed to end-users of
this module. See also
hashicorp/terraform-provider-azurerm#24020

Closes claranet#6

Signed-off-by: Mateusz Łoskot <mateusz@loskot.net>
@jayctran

jayctran commented Jan 25, 2024

EDIT: Ignore me - rookie error, I was working off the wrong (duplicated) file; this also resolved my issue once I modified the correct file.

Just to add, we've worked around this by explicitly setting max_surge = "10%" in our terraform files, to avoid the unnecessary changes on plan/apply cycles.

hi @aa2811, do you mind sharing how you did this?
I've added this without any luck:

default_node_pool {
  upgrade_settings {
    max_surge = "10%"
  }
}

@brunogomes1

brunogomes1 commented Jan 29, 2024

I have been temporarily avoiding this with the following.

In the resource "azurerm_kubernetes_cluster" I add a line in the lifecycle block (considering you do not change the default node pool, like me), e.g.:

lifecycle {
    ignore_changes = [
      default_node_pool[0],
    ]
  }

and for the resource "azurerm_kubernetes_cluster_node_pool" I do the same, e.g.:

lifecycle {
   prevent_destroy = false
   ignore_changes = [
     upgrade_settings,
   ]
 }
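
For context, a sketch of where the first of those blocks sits, assuming the example cluster resource from the issue body:

resource "azurerm_kubernetes_cluster" "example" {
  # ... other arguments as in the example configuration above ...

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  lifecycle {
    # ignores all drift on the default node pool, including upgrade_settings
    ignore_changes = [
      default_node_pool[0],
    ]
  }
}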

@finkinfridom

lifecycle {
   prevent_destroy = false
   ignore_changes = [
     upgrade_settings,
   ]
 }

This indeed is a workaround to the issue.
What if I want to change this value? Should I remove the lifecycle.ignore_changes metadata, apply the change, and then re-apply the ignore_changes?

Our team decided to always specify the upgrade_settings.max_surge value (which avoids the useless plan and apply operations), but unfortunately this cannot be applied to Spot instance node pools, since max_surge cannot be specified for them: an empty upgrade_settings block triggers the initial scenario (where an update is seen every time), and providing a null max_surge value triggers an error because max_surge is required by the provider.

Any suggestion?

@Pionerd

Pionerd commented Jun 10, 2024

What if I want to change this value? Should I remove the lifecycle.ignore_changes metadata, apply the change, and then re-apply the ignore_changes?

Yes, but if you want to change it, there is no longer a need for the ignore_changes anyway, since from that moment onward you do define a value for it (instead of the missing 10% that others are suffering from).

Any suggestion?

This issue does not apply to Spot instances since, as you already mentioned, it cannot be specified there. So you are probably referring to tf code that tries to create both spot and on-demand node pools within the same logic, for which the solution would be to split it up into separate code for spot and on-demand.
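
A rough sketch of that split, with hypothetical pool names (the regular pool pins max_surge explicitly, while the spot pool ignores the server-side drift instead):

resource "azurerm_kubernetes_cluster_node_pool" "regular" {
  name                  = "regular"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2_v2"
  node_count            = 1

  upgrade_settings {
    max_surge = "10%"
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  name                  = "spot"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2_v2"
  node_count            = 1
  priority              = "Spot"
  eviction_policy       = "Delete"
  spot_max_price        = -1

  lifecycle {
    # Spot pools cannot set max_surge, so ignore the drift on upgrade_settings
    ignore_changes = [
      upgrade_settings,
    ]
  }
}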

@finkinfridom

finkinfridom commented Jun 11, 2024

Yes, but if you want to change it, there is no longer a need for the ignore_changes anyway, since from that moment onward you do define a value for it (instead of the missing 10% that others are suffering from).

Yeah, you're right. Sorry, I was in a rush and didn't think properly about the question asked. My bad.

This issue does not apply to Spot instances since, as you already mentioned, it cannot be specified there. So you are probably referring to tf code that tries to create both spot and on-demand node pools within the same logic, for which the solution would be to split it up into separate code for spot and on-demand.

Well, kind of. I mean, we have a dedicated module for AKS node pool management, but indeed we're executing it twice. So this leads to 2 different and (already) separate resources being created. Am I wrong?

I was thinking of adding a dynamic ignore_changes block for when priority is set to Spot. This should work, right?

[EDIT] Just found an open discussion and a closed PR (hashicorp/terraform#32608) about my idea of having a dynamic lifecycle, but it's not implemented (it throws: Blocks of type "lifecycle" are not expected here).
I think your suggested solution is the only feasible one, where we'll create 2 separate modules for Regular node pools and Spot node pools.

smerle33 added a commit to jenkins-infra/azure that referenced this issue Jun 18, 2024
…ure value for all nodepools to avoid permanent update (#727)

as per 
- hashicorp/terraform-provider-azurerm#24020

we are having the same repetitive update on Azure, so setting the
default should avoid the permanent update override
dduportal added a commit to jenkins-infra/azure that referenced this issue Jun 18, 2024
…r-ending updated attribute (#732)

Follow up of #727 , #730 and #731

This fixes the never-ending changed attribute `upgrade_settings {}` for
the 2 Spot node pools on `privatek8s`.

It follows the tip found in
hashicorp/terraform-provider-azurerm#24020 (comment)
to avoid having all of our plans trying to change the 2 node pools.

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
@FrancoisPoinsot

FrancoisPoinsot commented Jul 5, 2024

So I faced the same problem.
I am also in the situation of attempting to use the same module for spot and non-spot node pools.

If anyone reads this, here is a workaround.
This config is valid for spot node pools:

upgrade_settings {
  max_surge = ""
}

So you can always specify upgrade_settings, even for a spot node pool, to work around the issue in this thread.
And you can set max_surge's value dynamically to work around that issue: #19355
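
For example, a minimal sketch assuming a hypothetical var.priority variable passed to the shared node pool module:

upgrade_settings {
  # Spot pools accept an empty max_surge; regular pools pin the AKS default
  max_surge = var.priority == "Spot" ? "" : "10%"
}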
