Explicitly set aws-node-termination-handler queue region so crash-loops are avoided, allowing faster startup #977

AndiDog · 2024-12-17T12:39:31Z

What this PR does / why we need it

Until now, aws-node-termination-handler (NTH) started operating only minutes after the cluster came up, or even later when the cluster was unhealthy. That could slow down ASG instance refreshes, node termination, etc. NTH came up at all only because IRSA injects the `AWS_REGION` environment variable.

The crash-looping message `FTL Unable to find the AWS region to process queue events.` goes away with this fix. The pod still requires IRSA credentials injection to operate, so it may still take a few minutes to start up, but the error becomes clearer and we avoid getting alerted.
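
The startup behaviour described above can be illustrated with a minimal sketch. This is not NTH's actual code: the helper `resolve_queue_region`, its `configured_region` parameter, and the `RegionNotFoundError` class are hypothetical names chosen for illustration. Only the `AWS_REGION` variable and the fatal log message come from this PR. The sketch assumes an explicitly configured region takes precedence, and that without one the process depends on `AWS_REGION` being injected.

```python
import os
from typing import Mapping, Optional


class RegionNotFoundError(RuntimeError):
    """Hypothetical stand-in for the fatal exit that causes the crash-loop."""


def resolve_queue_region(
    configured_region: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the region used for queue processing (illustrative only)."""
    env = os.environ if env is None else env

    # An explicitly configured region (what this PR sets via `awsRegion`)
    # is available immediately, independent of IRSA injection.
    if configured_region:
        return configured_region

    # Otherwise fall back to AWS_REGION, which is assumed to exist only
    # once IRSA has injected it into the pod.
    injected = env.get("AWS_REGION", "")
    if injected:
        return injected

    # Neither source is available: this mirrors the crash-looping message.
    raise RegionNotFoundError(
        "Unable to find the AWS region to process queue events."
    )
```

With a region passed explicitly, for example `resolve_queue_region("eu-west-1", env={})`, the lookup succeeds even when `AWS_REGION` is absent, which is the situation this PR aims for.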

Checklist

Updated CHANGELOG.md.

AndiDog · 2024-12-17T12:39:38Z

/run cluster-test-suites

tinkerers-ci · 2024-12-17T13:39:52Z

cluster-test-suites

Run name	`pr-cluster-aws-977-cluster-test-suitesk6tsn`
Commit SHA	`bc143a5`
Result	Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites

Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.
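
As a small illustration of the tip above, the trigger comment can be assembled from a list of suite paths. The helper name `build_rerun_trigger` is hypothetical and not part of the CI tooling. The sketch only assumes what the tip states: paths are passed via `TARGET_SUITES` and separated by commas.

```python
from typing import Sequence


def build_rerun_trigger(target_suites: Sequence[str] = ()) -> str:
    """Build the PR comment that re-runs the test suites (illustrative)."""
    base = "/run cluster-test-suites"
    # Without explicit suites, the plain trigger re-runs everything.
    if not target_suites:
        return base
    # Multiple suite paths are joined with commas, as the tip describes.
    return f"{base} TARGET_SUITES={','.join(target_suites)}"


if __name__ == "__main__":
    print(build_rerun_trigger(["./providers/capa/china", "./providers/capa/private"]))
```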

Gacko · 2024-12-17T15:56:13Z

/run cluster-test-suites TARGET_SUITES=./providers/capa/china,./providers/capa/private

tinkerers-ci · 2024-12-17T16:38:49Z

cluster-test-suites

Run name	`pr-cluster-aws-977-cluster-test-suiteslt2mv`
Commit SHA	`7129e40`
Result	Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites

Gacko · 2024-12-17T18:37:22Z

/run cluster-test-suites TARGET_SUITES=./providers/capa/china

tinkerers-ci · 2024-12-17T19:07:52Z

cluster-test-suites

Run name	`pr-cluster-aws-977-cluster-test-suiteskw2qn`
Commit SHA	`7129e40`
Result	Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites

AndiDog · 2024-12-18T14:29:43Z

/run cluster-test-suites TARGET_SUITES=./providers/capa/china

tinkerers-ci · 2024-12-18T15:00:16Z

cluster-test-suites

Run name	`pr-cluster-aws-977-cluster-test-suites8bnqz`
Commit SHA	`7129e40`
Result	Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites

AndiDog · 2024-12-30T11:57:13Z

/run cluster-test-suites

github-actions · 2024-12-30T11:57:28Z

There were differences in the rendered Helm template, please check! ⚠️

Output

=== Differences when rendered with values file helm/cluster-aws/ci/test-auditd-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-eni-mode-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-lifecycle-hook-heartbeattimeout-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-local-registry-cache-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-mc-proxy-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-mc-proxy-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-multiple-authenticated-mirrors-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-multiple-service-account-issuers-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-network-topology-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-spot-instances-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-subnet-tags-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1

=== Differences when rendered with values file helm/cluster-aws/ci/test-wc-minimal-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1
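
When reviewing a long rendered-template diff like the one above, it can help to confirm that every values file received the same single insertion. The following is a rough sketch for doing so. The function name `added_lines_by_values_file` is hypothetical, and the parsing assumes only the layout visible in this comment (a `=== Differences when rendered with values file … ===` header followed by lines prefixed with `+`), not any documented output format of the diff tool.

```python
import re
from typing import Dict, List, Optional

# Matches the per-values-file header seen in the bot comment above.
HEADER = re.compile(
    r"^=== Differences when rendered with values file (?P<path>\S+) ===$"
)


def added_lines_by_values_file(diff_output: str) -> Dict[str, List[str]]:
    """Group inserted lines by the values file they were rendered with."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in diff_output.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Start collecting insertions for a new values file.
            current = match.group("path")
            result[current] = []
        elif current is not None and line.startswith("+"):
            # Drop the leading "+" marker and surrounding whitespace.
            result[current].append(line[1:].strip())
    return result
```

Applied to the output above, each values file would be expected to map to a single entry, `awsRegion: eu-west-1`.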

tinkerers-ci · 2024-12-30T12:58:38Z

Oh No! 😱 At least one test suite has failed during the AfterSuite cleanup stage and might have left around some resources on the MC!

Be sure to check the full results in Tekton Dashboard to see which test suite has failed and then run the following on the associated MC to list all leftover resources:

PIPELINE_RUN="pr-cluster-aws-977-cluster-test-suiteslksjr"

NAMES="$(kubectl api-resources --verbs list -o name | tr '\n' ,)"
kubectl get "${NAMES:0:${#NAMES}-1}" --show-kind --ignore-not-found -l cicd.giantswarm.io/pipelinerun=${PIPELINE_RUN} -A 2>/dev/null

tinkerers-ci · 2024-12-30T12:58:45Z

cluster-test-suites

Run name	`pr-cluster-aws-977-cluster-test-suiteslksjr`
Commit SHA	`9a1af07`
Result	Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites

Explicitly set aws-node-termination-handler queue region so crash-loo…

7129e40

…ps are avoided, allowing faster startup

AndiDog force-pushed the nth-region branch from bc143a5 to 7129e40 on December 17, 2024 13:15

AndiDog changed the title ~~Explicitly set aws-node-termination-handler queue region so it starts up faster~~ Explicitly set aws-node-termination-handler queue region so crash-loops are avoided, allowing faster startup Dec 17, 2024

AndiDog marked this pull request as ready for review December 17, 2024 13:17

AndiDog requested a review from a team as a code owner December 17, 2024 13:17

fiunchinho approved these changes Dec 17, 2024

Merge branch 'main' into nth-region

9a1af07
