
CI: eks cluster creation fails with "The maximum number of internet gateways/VPCs has been reached." #1171

Open
orfeas-k opened this issue Dec 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments


orfeas-k commented Dec 3, 2024

Bug Description

As seen in this CI run (https://github.com/canonical/bundle-kubeflow/actions/runs/12130614512/job/33821313708#step:11:47), EKS cluster creation fails with "The maximum number of internet gateways/VPCs has been reached."
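
For context, a likely cause (my assumption; the run itself only shows the quota error): each eksctl cluster creates its own VPC and internet gateway, and AWS accounts default to a quota of 5 of each per region, so clusters leaked by earlier CI runs can exhaust the quota. A quick way to check for leftovers in the region the CI uses:

    # list clusters left over from previous runs
    eksctl get cluster --region eu-central-1

    # count existing VPCs and internet gateways against the per-region quota (default 5)
    aws ec2 describe-vpcs --region eu-central-1 --query 'length(Vpcs)'
    aws ec2 describe-internet-gateways --region eu-central-1 --query 'length(InternetGateways)'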

To Reproduce

Rerun the CI with all 3 supported versions.

Environment

EKS 1.29

Relevant Log Output

2024-12-03 00:39:55 [ℹ]  eksctl version 0.196.0
2024-12-03 00:39:55 [ℹ]  using region eu-central-1
2024-12-03 00:39:55 [ℹ]  subnets for eu-central-1a - public:192.168.0.0/19 private:192.168.64.0/19
2024-12-03 00:39:55 [ℹ]  subnets for eu-central-1b - public:192.168.32.0/19 private:192.168.96.0/19
2024-12-03 00:39:55 [ℹ]  nodegroup "ng-d06bd84e" will use "ami-015db95d8173273e9" [Ubuntu2004/1.29]
2024-12-03 00:39:55 [ℹ]  using Kubernetes version 1.29
2024-12-03 00:39:55 [ℹ]  creating EKS cluster "kubeflow-test-latest" in "eu-central-1" region with managed nodes
2024-12-03 00:39:55 [ℹ]  1 nodegroup (ng-d06bd84e) was included (based on the include/exclude rules)
2024-12-03 00:39:55 [ℹ]  will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
2024-12-03 00:39:55 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=eu-central-1 --cluster=kubeflow-test-latest'
2024-12-03 00:39:55 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "kubeflow-test-latest" in "eu-central-1"
2024-12-03 00:39:55 [ℹ]  CloudWatch logging will not be enabled for cluster "kubeflow-test-latest" in "eu-central-1"
2024-12-03 00:39:55 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=eu-central-1 --cluster=kubeflow-test-latest'
2024-12-03 00:39:55 [ℹ]  default addons coredns, vpc-cni, kube-proxy were not specified, will install them as EKS addons
2024-12-03 00:39:55 [ℹ]  
2 sequential tasks: { create cluster control plane "kubeflow-test-latest", 
    2 sequential sub-tasks: { 
        2 sequential sub-tasks: { 
            1 task: { create addons },
            wait for control plane to become ready,
        },
        create managed nodegroup "ng-d06bd84e",
    } 
}
2024-12-03 00:39:55 [ℹ]  building cluster stack "eksctl-kubeflow-test-latest-cluster"
2024-12-03 00:39:56 [ℹ]  deploying stack "eksctl-kubeflow-test-latest-cluster"
2024-12-03 00:40:26 [ℹ]  waiting for CloudFormation stack "eksctl-kubeflow-test-latest-cluster"
2024-12-03 00:40:27 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-kubeflow-test-latest-cluster"
2024-12-03 00:40:27 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-kubeflow-test-latest-cluster"
2024-12-03 00:40:27 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2024-12-03 00:40:27 [!]  AWS::EC2::EIP/NATIP: DELETE_IN_PROGRESS
Error: failed to create cluster "kubeflow-test-latest"
2024-12-03 00:40:27 [!]  AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS
2024-12-03 00:40:27 [✖]  AWS::IAM::Role/ServiceRole: CREATE_FAILED – "Resource creation cancelled"
2024-12-03 00:40:27 [✖]  AWS::EC2::EIP/NATIP: CREATE_FAILED – "Resource creation cancelled"
2024-12-03 00:40:27 [✖]  AWS::EC2::InternetGateway/InternetGateway: CREATE_FAILED – "Resource handler returned message: \"The maximum number of internet gateways has been reached. (Service: Ec2, Status Code: 400, Request ID: a97f20de-a1fa-4fd2-8a2f-f83ef2ccfaf9)\" (RequestToken: 933fc990-1bed-f543-4a69-ac24808072f5, HandlerErrorCode: ServiceLimitExceeded)"
2024-12-03 00:40:27 [✖]  AWS::EC2::VPC/VPC: CREATE_FAILED – "Resource handler returned message: \"The maximum number of VPCs has been reached. (Service: Ec2, Status Code: 400, Request ID: d75bf6ea-669a-4723-b056-cfa10de61ad8)\" (RequestToken: 99f79551-eabf-93fe-f5f8-4a65e599edb6, HandlerErrorCode: GeneralServiceException)"
2024-12-03 00:40:27 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2024-12-03 00:40:27 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=eu-central-1 --name=kubeflow-test-latest'
2024-12-03 00:40:27 [✖]  ResourceNotReady: failed waiting for successful resource state
Error: Process completed with exit code 1.
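
If stale CI clusters are indeed the cause, deleting them should free the quota; a sketch follows (the cluster name is a placeholder, not taken from this run). Alternatively, the quota itself can be inspected and an increase requested:

    # delete a leftover cluster together with its CloudFormation stacks (VPC, IGW, ...)
    eksctl delete cluster --region eu-central-1 --name <leftover-cluster-name>

    # inspect the "VPCs per Region" quota and request an increase if needed
    # (L-F678F1CE is the quota code I believe applies here; verify before use)
    aws service-quotas get-service-quota --service-code vpc --quota-code L-F678F1CE --region eu-central-1
    aws service-quotas request-service-quota-increase --service-code vpc --quota-code L-F678F1CE --desired-value 10 --region eu-central-1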

Additional Context

No response

orfeas-k added the bug (Something isn't working) label on Dec 3, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6636.

This message was autogenerated
