Explicitly set aws-node-termination-handler queue region so crash-loops are avoided, allowing faster startup #977

Open
wants to merge 2 commits into
base: main


Contributor

AndiDog commented Dec 17, 2024

What this PR does / why we need it

Towards giantswarm/roadmap#3802

Until now, aws-node-termination-handler (NTH) only started operating minutes after the cluster came up, or even later when the cluster was unhealthy. That could slow down ASG instance refreshes, node termination, etc. NTH only came up at all because IRSA injects the AWS_REGION environment variable.

With this fix, the crash-loop message "FTL Unable to find the AWS region to process queue events." goes away. The pod still requires IRSA credential injection to operate, so it may still take a few minutes to start up, but the error becomes clearer and we avoid getting alerted.
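
The effect of setting the region explicitly can be sketched as a simple lookup order. This is a simplified illustration, not NTH's actual code: the function name resolve_queue_region and its single argument are hypothetical, and only the fatal message text is taken from the log line quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prefer an explicitly configured region, then fall back
# to the AWS_REGION environment variable (which IRSA injects into the pod).
resolve_queue_region() {
  local configured_region="${1:-}"   # e.g. an explicit awsRegion setting
  if [[ -n "$configured_region" ]]; then
    printf '%s\n' "$configured_region"
  elif [[ -n "${AWS_REGION:-}" ]]; then
    printf '%s\n' "$AWS_REGION"
  else
    # Message text copied from the crash-loop log quoted in this PR.
    echo "FTL Unable to find the AWS region to process queue events." >&2
    return 1
  fi
}
```

With an explicit value such as `resolve_queue_region eu-west-1`, the lookup succeeds even while AWS_REGION is still unset, which is the situation this PR targets.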

Checklist

  • Updated CHANGELOG.md.

Contributor Author

AndiDog commented Dec 17, 2024

/run cluster-test-suites

AndiDog changed the title from "Explicitly set aws-node-termination-handler queue region so it starts up faster" to "Explicitly set aws-node-termination-handler queue region so crash-loops are avoided, allowing faster startup" Dec 17, 2024
AndiDog marked this pull request as ready for review December 17, 2024 13:17
AndiDog requested a review from a team as a code owner December 17, 2024 13:17

tinkerers-ci bot commented Dec 17, 2024

cluster-test-suites

Run name: pr-cluster-aws-977-cluster-test-suitesk6tsn
Commit SHA: bc143a5
Result: Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


Tip

To only re-run the failed test suites you can provide a TARGET_SUITES parameter with your trigger that points to the directory path of the test suites to run, e.g. /run cluster-test-suites TARGET_SUITES=./providers/capa/standard to re-run the CAPA standard test suite. This supports multiple test suites with each path separated by a comma.
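
A minimal sketch of assembling such a trigger comment from several suite paths follows. The helper name build_rerun_trigger is hypothetical; only the trigger syntax and the comma separator come from the tip above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join suite directory paths with commas and print the
# trigger comment to post on the pull request.
build_rerun_trigger() {
  if (( $# == 0 )); then
    echo '/run cluster-test-suites'   # plain rerun trigger, no suite filter
    return
  fi
  local IFS=','                       # "$*" joins arguments with the first IFS character
  printf '/run cluster-test-suites TARGET_SUITES=%s\n' "$*"
}

build_rerun_trigger ./providers/capa/china ./providers/capa/private
# prints: /run cluster-test-suites TARGET_SUITES=./providers/capa/china,./providers/capa/private
```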

Member

Gacko commented Dec 17, 2024

/run cluster-test-suites TARGET_SUITES=./providers/capa/china,./providers/capa/private


tinkerers-ci bot commented Dec 17, 2024

cluster-test-suites

Run name: pr-cluster-aws-977-cluster-test-suiteslt2mv
Commit SHA: 7129e40
Result: Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites



Member

Gacko commented Dec 17, 2024

/run cluster-test-suites TARGET_SUITES=./providers/capa/china


tinkerers-ci bot commented Dec 17, 2024

cluster-test-suites

Run name: pr-cluster-aws-977-cluster-test-suiteskw2qn
Commit SHA: 7129e40
Result: Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites



Contributor Author

AndiDog commented Dec 18, 2024

/run cluster-test-suites TARGET_SUITES=./providers/capa/china


tinkerers-ci bot commented Dec 18, 2024

cluster-test-suites

Run name: pr-cluster-aws-977-cluster-test-suites8bnqz
Commit SHA: 7129e40
Result: Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites



Contributor Author

AndiDog commented Dec 30, 2024

/run cluster-test-suites

Contributor

There were differences in the rendered Helm template, please check! ⚠️

Output
=== Differences when rendered with values file helm/cluster-aws/ci/test-auditd-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-eni-mode-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-lifecycle-hook-heartbeattimeout-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-local-registry-cache-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-mc-proxy-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-mc-proxy-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-multiple-authenticated-mirrors-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-multiple-service-account-issuers-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-network-topology-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-spot-instances-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-subnet-tags-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1




=== Differences when rendered with values file helm/cluster-aws/ci/test-wc-minimal-values.yaml ===

/data/values  (v1/ConfigMap/org-giantswarm/test-wc-minimal-aws-nth-bundle-user-values)
  ± value change in multiline text (one insert, no deletions)
    +     awsRegion: eu-west-1



tinkerers-ci bot commented Dec 30, 2024

Oh No! 😱 At least one test suite has failed during the AfterSuite cleanup stage and might have left around some resources on the MC!

Be sure to check the full results in Tekton Dashboard to see which test suite has failed and then run the following on the associated MC to list all leftover resources:

PIPELINE_RUN="pr-cluster-aws-977-cluster-test-suiteslksjr"

NAMES="$(kubectl api-resources --verbs list -o name | tr '\n' ,)"
kubectl get "${NAMES:0:${#NAMES}-1}" --show-kind --ignore-not-found -l cicd.giantswarm.io/pipelinerun=${PIPELINE_RUN} -A 2>/dev/null
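
The `${NAMES:0:${#NAMES}-1}` expansion in the command above drops the trailing comma that `tr` leaves behind. A standalone sketch of that step, runnable without a cluster, is shown here; list_resource_names is a hypothetical stand-in for the kubectl api-resources call, and the resource names are examples only.

```shell
#!/usr/bin/env bash
# Stand-in for `kubectl api-resources --verbs list -o name`; the names
# printed here are illustrative examples.
list_resource_names() {
  printf '%s\n' pods configmaps deployments.apps
}

NAMES="$(list_resource_names | tr '\n' ,)"
# NAMES now ends with a comma: "pods,configmaps,deployments.apps,"
TRIMMED="${NAMES:0:${#NAMES}-1}"   # same substring expansion as in the command above
echo "$TRIMMED"
echo "${NAMES%,}"                  # equivalent form using suffix removal
```

Both echo lines print the comma-separated list without the trailing comma, which is the form `kubectl get` accepts as a list of resource types.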


tinkerers-ci bot commented Dec 30, 2024

cluster-test-suites

Run name: pr-cluster-aws-977-cluster-test-suiteslksjr
Commit SHA: 9a1af07
Result: Failed ❌

📋 View full results in Tekton Dashboard

Rerun trigger:
/run cluster-test-suites


