
Every plan run resets upgrade_settings.max_surge for default_node_pool #24020

Open
mloskot opened this issue Nov 24, 2023 · 8 comments

@mloskot
Contributor

mloskot commented Nov 24, 2023


Terraform Version

1.6.4

AzureRM Provider Version

3.82.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

// https://registry.terraform.io/providers/hashicorp/azurerm/3.82.0/docs/resources/kubernetes_cluster#example-usage
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  //...
}

Debug Output/Panic Output

      ~ default_node_pool {
            name                         = "default"
            ...
            # (23 unchanged attributes hidden)

          - upgrade_settings {
              - max_surge = "10%" -> null
            }
        }

        # (7 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Expected Behaviour

Subsequent runs of terraform plan should not report any changes to the upgrade_settings.max_surge property.

Actual Behaviour

I use azurerm_kubernetes_cluster to create a new cluster without specifying custom upgrade_settings in default_node_pool, and Azure calculates a default "Maximum surge" for me:

(screenshot: Azure portal showing the calculated default "Maximum surge" for the node pool)

Then, every time I run terraform plan, still without customised upgrade_settings in my .tf files, the provider always tries to modify my cluster with:

- upgrade_settings {
      - max_surge = "10%" -> null
  }

This is not expected behaviour, is it?

Steps to Reproduce

  1. terraform plan
  2. terraform apply
  3. terraform plan
  4. terraform plan
  5. ...

Important Factoids

No response

References

No response

@paulgmiller

AKS changed the default max surge in the October release, so that if you are on k8s > 1.28, max surge is defaulted to 10% (previously it was left blank, which implied 1 under the covers).

Release 2023-10-01 · Azure/AKS (github.com)

Is there a way we could have done this better for terraform?

@aa2811

aa2811 commented Jan 12, 2024

Just to add, we've worked around this by explicitly setting max_surge = "10%" in our terraform files, to avoid the unnecessary changes on plan/apply cycles.
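
Applied to the example configuration from the issue body, that workaround looks roughly like this (a minimal sketch; only the upgrade_settings block is new compared to the original example):

default_node_pool {
  name       = "default"
  node_count = 1
  vm_size    = "Standard_D2_v2"

  upgrade_settings {
    # matches the new AKS default, so the provider no longer sees a diff
    max_surge = "10%"
  }
}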

mloskot added a commit to mloskot/terraform-azurerm-aks-light that referenced this issue Jan 12, 2024
Azure Kubernetes Service changed the default max surge
in the October release, so that for clusters based
on Kubernetes >1.28 max surge defaults to 10%, see
https://github.com/Azure/AKS/releases/tag/2023-10-01

Previously it was left blank, which implied a
value of 1 under the bonnet.

Using the current version of Terraform AzureRM 3.86.0
leads to implicit resetting of the max_surge:

  max_surge = "10%" -> null

The only workaround to avoid this confusing annoyance is to
set max_surge to an explicit value, e.g. the default "10%".
But this requires max_surge to be exposed to end-users of
this module. See also
hashicorp/terraform-provider-azurerm#24020

Closes claranet#6

Signed-off-by: Mateusz Łoskot <mateusz@loskot.net>
@jayctran

jayctran commented Jan 25, 2024

EDIT: Ignore me - rookie error, I was working off the wrong (duplicated) file; this also resolved my issue once I modified the correct file.

Just to add, we've worked around this by explicitly setting max_surge = "10%" in our terraform files, to avoid the unnecessary changes on plan/apply cycles.

hi @aa2811, do you mind sharing how you did this?
I've added this without any luck:

default_node_pool {
  upgrade_settings {
    max_surge = "10%"
  }
}

@brunogomes1

brunogomes1 commented Jan 29, 2024

I have been temporarily avoiding this with the following.

In the resource "azurerm_kubernetes_cluster" I add a line in the lifecycle block (considering you do not change the default node pool, like me), e.g.:

lifecycle {
    ignore_changes = [
      default_node_pool[0],
    ]
  }

and for the resource "azurerm_kubernetes_cluster_node_pool" I do the same, e.g.:

lifecycle {
   prevent_destroy = false
   ignore_changes = [
     upgrade_settings,
   ]
 }
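
For context, a sketch of where the first of those blocks sits, assuming the example cluster resource from the issue body:

resource "azurerm_kubernetes_cluster" "example" {
  # ... other arguments as in the example configuration above ...

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  lifecycle {
    # ignores all drift on the default node pool, including upgrade_settings
    ignore_changes = [
      default_node_pool[0],
    ]
  }
}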

@finkinfridom

lifecycle {
   prevent_destroy = false
   ignore_changes = [
     upgrade_settings,
   ]
 }

This indeed is a workaround to the issue.
What if I want to change this value? Should I remove the lifecycle.ignore_changes metadata, apply the change, and then re-apply the ignore_changes?

Our team decided to always specify the upgrade_settings.max_surge value (which avoids the useless plan and apply operations), but unfortunately this cannot be applied to Spot instance node pools, since max_surge cannot be specified for them: an empty upgrade_settings block triggers the initial scenario (where an update is seen every time), and providing a null max_surge value triggers an error because max_surge is required by the provider.

Any suggestion?

@Pionerd

Pionerd commented Jun 10, 2024

What if I want to change this value? Should I remove the lifecycle.ignore_changes metadata, apply the change, and then re-apply the ignore_changes?

Yes, but if you want to change it, there is no longer a need for the ignore_changes anyway, since from that moment onward you do define a value for it (instead of the missing 10% that others are suffering from).

Any suggestion?

This issue does not apply to Spot instances since, as you already mentioned, it cannot be specified there. So you are probably referring to tf code that tries to create both spot and on-demand node pools within the same logic, for which the solution would be to split it up into separate code for spot and on-demand.
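
A rough sketch of that split, with hypothetical pool names (the regular pool pins max_surge explicitly, while the spot pool ignores the server-side drift instead):

resource "azurerm_kubernetes_cluster_node_pool" "regular" {
  name                  = "regular"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2_v2"
  node_count            = 1

  upgrade_settings {
    max_surge = "10%"
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  name                  = "spot"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_D2_v2"
  node_count            = 1
  priority              = "Spot"
  eviction_policy       = "Delete"
  spot_max_price        = -1

  lifecycle {
    # Spot pools cannot set max_surge, so ignore the drift on upgrade_settings
    ignore_changes = [
      upgrade_settings,
    ]
  }
}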

@finkinfridom

finkinfridom commented Jun 11, 2024

Yes, but if you want to change it, there is no longer a need for the ignore_changes anyway, since from that moment onward you do define a value for it (instead of the missing 10% that others are suffering from).

Yeah, you're right. Sorry, I was in a rush and didn't think properly about the question asked. My bad.

This issue does not apply to Spot instances since, as you already mentioned, it cannot be specified there. So you are probably referring to tf code that tries to create both spot and on-demand node pools within the same logic, for which the solution would be to split it up into separate code for spot and on-demand.

Well, kind of. I mean, we have a dedicated module for AKS node pool management, but indeed we're executing it twice. So this leads to 2 different and (already) separate resources being created. Am I wrong?

I was thinking of adding a dynamic ignore_changes block for when priority is set to Spot. This should work, right?

[EDIT] Just found an open discussion and a closed PR (hashicorp/terraform#32608) about my idea of having a dynamic lifecycle, but it's not implemented (it throws: Blocks of type "lifecycle" are not expected here).
I think your suggested solution is the only feasible one, where we'll create 2 separate modules for Regular node pools and Spot node pools.

smerle33 added a commit to jenkins-infra/azure that referenced this issue Jun 18, 2024
…ure value for all nodepools to avoid permanent update (#727)

as per 
- hashicorp/terraform-provider-azurerm#24020

we are having the same repetitive update on Azure, so setting the
default should avoid the permanent update override
dduportal added a commit to jenkins-infra/azure that referenced this issue Jun 18, 2024
…r-ending updated attribute (#732)

Follow up of #727 , #730 and #731

This fixes the never-ending changed attribute `upgrade_settings {}` for
the 2 Spot node pools on `privatek8s`.

It follows the tip found in
hashicorp/terraform-provider-azurerm#24020 (comment)
to avoid having all of our plans trying to change the 2 node pools.

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
@FrancoisPoinsot

FrancoisPoinsot commented Jul 5, 2024

So I faced the same problem.
I am also in the situation of attempting to use the same module for spot and non-spot node pools.

If anyone reads this, here is a workaround.
This config is valid for spot node pools:

upgrade_settings {
  max_surge = ""
}

So you can always specify upgrade_settings, even for a spot node pool, to work around the issue in this thread.
And you can set max_surge's value dynamically to work around that issue: #19355
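
For example, a minimal sketch assuming a hypothetical var.priority variable passed to the shared node pool module:

upgrade_settings {
  # Spot pools accept an empty max_surge; regular pools pin the AKS default
  max_surge = var.priority == "Spot" ? "" : "10%"
}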
