[aws-eks] EKS update failure causes the cluster to be deleted #4310

Closed
lkoniecz opened this issue Oct 1, 2019 · 1 comment · Fixed by #4696
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service) · bug (This issue is a bug.) · p0

Comments


lkoniecz commented Oct 1, 2019

I tried to change the EKS cluster version from 1.12 to 1.13.

The resource:

from aws_cdk import aws_eks

eks_cluster = aws_eks.Cluster(
    scope=self,
    id=id,
    cluster_name=id,
    default_capacity=0,
    masters_role=cluster_masters_role,
    version='1.13',
    vpc=vpc,
)

Resources
[~] AWS::AutoScaling::LaunchConfiguration DevEksCluster/DevEksCluster/EksClusterAsgEuWest1A/LaunchConfig DevEksClusterEksClusterAsgEuWest1ALaunchConfig5ACFE7AF replace
 └─ [~] ImageId (requires replacement)
     ├─ [-] ami-0404d23c7e8188740
     └─ [+] ami-00ac2e6b3cb38a9b9
[~] AWS::Serverless::Application kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA replace
[~] AWS::AutoScaling::LaunchConfiguration DevEksCluster/DevEksCluster/EksClusterAsgEuWest1B/LaunchConfig DevEksClusterEksClusterAsgEuWest1BLaunchConfig868911E1 replace
 └─ [~] ImageId (requires replacement)
     ├─ [-] ami-0404d23c7e8188740
     └─ [+] ami-00ac2e6b3cb38a9b9
[~] Custom::AWSCDK-EKS-Cluster DevEksCluster/DevEksCluster/Resource/Resource DevEksCluster6F41DD8A 
 └─ [~] Config
     └─ [~] .version:
         ├─ [-] 1.12
         └─ [+] 1.13
[~] AWS::AutoScaling::LaunchConfiguration DevEksCluster/DevEksCluster/EksClusterAsgEuWest1C/LaunchConfig DevEksClusterEksClusterAsgEuWest1CLaunchConfigDF5DF8DE replace
 └─ [~] ImageId (requires replacement)
     ├─ [-] ami-0404d23c7e8188740
     └─ [+] ami-00ac2e6b3cb38a9b9

The stack failed to update; see the error log attached below.

The CDK marked the cluster resource for deletion:
4/53 | 11:12:05 | DELETE_IN_PROGRESS | AWS::CloudFormation::CustomResource | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A)
and as a result the cluster was deleted instead of being rolled back to its previous state.

Reproduction Steps

Change the EKS cluster version from 1.12 to 1.13 with kubectl enabled.

Error Log

dev-eks-cluster: deploying...
dev-eks-cluster: creating CloudFormation changeset...
  0/53 | 11:11:11 | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA) 
  1/53 | 11:11:33 | UPDATE_COMPLETE      | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA) 
  1/53 | 11:11:41 | UPDATE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A) 
  1/53 | 11:11:44 | UPDATE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A) Requested update required the provider to create a new physical resource
  2/53 | 11:11:44 | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A) Failed to update resource. Parameter validation failed:
Unknown parameter in input: "roleArn", must be one of: name, resourcesVpcConfig, logging, clientRequestToken
Unknown parameter in input: "version", must be one of: name, resourcesVpcConfig, logging, clientRequestToken
	new CustomResource (/tmp/jsii-kernel-m0A7Nm/node_modules/@aws-cdk/aws-cloudformation/lib/custom-resource.js:32:25)
	\_ new ClusterResource (/tmp/jsii-kernel-m0A7Nm/node_modules/@aws-cdk/aws-eks/lib/cluster-resource.js:46:26)
	\_ new Cluster (/tmp/jsii-kernel-m0A7Nm/node_modules/@aws-cdk/aws-eks/lib/cluster.js:81:24)
	\_ obj._wrapSandboxCode (/home/lky/Repositories/HuuugeStarsCfConfig/contrib/cloudformation/cdk/eks/cluster/.env/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:6666:49)
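
For context on the validation errors above: the parameter list in the error matches the EKS UpdateClusterConfig request shape, which only accepts name, resourcesVpcConfig, logging and clientRequestToken. Passing roleArn or version there fails, and a Kubernetes version change has to go through UpdateClusterVersion instead. A minimal boto3 sketch of the two calls (illustrative only, not the handler's actual code, which uses the JavaScript AWS SDK; the cluster name is a placeholder):

import boto3

eks = boto3.client('eks')

# UpdateClusterConfig accepts only name, resourcesVpcConfig, logging and
# clientRequestToken -- sending roleArn or version triggers the
# "Unknown parameter in input" validation error shown in the log above.
eks.update_cluster_config(
    name='dev-eks-cluster',  # placeholder cluster name
    logging={'clusterLogging': [{'types': ['api'], 'enabled': True}]},
)

# A Kubernetes version change must use UpdateClusterVersion instead.
eks.update_cluster_version(name='dev-eks-cluster', version='1.13')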

Environment

  • CLI Version : 1.4.0
  • Framework Version: 1.4.0
  • OS : all
  • Language : all

Other

This is 🐛 Bug Report

lkoniecz added the bug and needs-triage labels on Oct 1, 2019
RomainMuller added the @aws-cdk/aws-eks label on Oct 1, 2019
NGL321 added the in-progress label and removed the needs-triage label on Oct 4, 2019
eladb added the p0 label on Oct 23, 2019
eladb closed this as completed on Oct 25, 2019
eladb reopened this on Oct 25, 2019

eladb commented Oct 25, 2019

The root cause of cluster deletion upon failed updates is that we do not mirror back the physical resource ID we received as input when a failure occurs.
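
A minimal sketch of the behaviour described here, in Python for illustration (the actual aws-cdk handler is JavaScript, and do_update below is a hypothetical placeholder): on failure, echo back the PhysicalResourceId that came in with the event instead of reporting a new one, so CloudFormation rolls back in place rather than scheduling the old cluster for deletion.

import json
import urllib.request

def do_update(event):
    # hypothetical placeholder for the real update logic
    raise NotImplementedError

def handler(event, context):
    try:
        physical_id = do_update(event)
        send_response(event, 'SUCCESS', physical_id)
    except Exception as err:
        # Mirror the incoming physical resource ID on failure. Reporting a new
        # ID (e.g. the log stream name) makes CloudFormation treat the failed
        # update as a replacement and delete the old resource on rollback.
        incoming_id = event.get('PhysicalResourceId', context.log_stream_name)
        send_response(event, 'FAILED', incoming_id, reason=str(err))

def send_response(event, status, physical_id, reason=''):
    body = json.dumps({
        'Status': status,
        'Reason': reason,
        'PhysicalResourceId': physical_id,
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
    }).encode()
    req = urllib.request.Request(event['ResponseURL'], data=body, method='PUT')
    urllib.request.urlopen(req)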

eladb pushed a commit that referenced this issue Oct 26, 2019
Our custom resource naively tried to call the UpdateCluster API for updates, but this is in fact not in line with how AWS::EKS::Cluster is implemented. This change modifies the custom resource handler to handle updates based on the same specification as the official CloudFormation resource:

- Changes to the cluster name, VPC or role will cause a replacement (creation of a cluster with a new name and removal of the old cluster).
- Changes to the version will use the UpdateClusterVersion API to update the version in place.

This fixes #4311.

This commit also fixes #4310, which caused cluster deletions when updates failed. The root cause was that when errors were reported to CFN we always used the log stream name as the physical resource ID, so CFN thought we wanted to replace the resource. Ouch.

This change was tested manually, since we still don't have a good unit test harness for this resource: we exercised all types of updates and observed that the appropriate behaviour was taken (replacement or in-place update).
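
A rough sketch of the update routing described above, with hypothetical property names mirroring the CreateCluster input shape (the real handler lives in the aws-cdk custom resource code, in JavaScript): name, role or VPC changes trigger replacement, a version change is applied in place via UpdateClusterVersion, and anything else is a no-op.

# Property keys whose change forces cluster replacement (hypothetical names).
REPLACEMENT_KEYS = {'name', 'roleArn', 'resourcesVpcConfig'}

def classify_update(old_props: dict, new_props: dict) -> str:
    """Decide how to apply a property diff to the cluster."""
    changed = {key for key in set(old_props) | set(new_props)
               if old_props.get(key) != new_props.get(key)}
    if changed & REPLACEMENT_KEYS:
        return 'replace'          # create a new cluster, let CFN delete the old one
    if 'version' in changed:
        return 'update-version'   # call eks.update_cluster_version(name=..., version=...)
    return 'no-op'

# Example: only the version changed, so the cluster is updated in place.
assert classify_update({'version': '1.12'}, {'version': '1.13'}) == 'update-version'
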
mergify bot closed this as completed in #4696 on Oct 28, 2019
mergify bot pushed a commit that referenced this issue Oct 28, 2019
* fix(eks): cannot update cluster configuration

* handle cluster name updates

* add provisional unit tests for the cluster resource (just a test plan)

* update expectations
iliapolo changed the title from "EKS update failure causes the cluster to be deleted" to "[aws-eks] EKS update failure causes the cluster to be deleted" on Aug 16, 2020
iliapolo removed the in-progress label on Aug 16, 2020