[aws-eks] EKS update failure causes the cluster to be deleted #4310

Closed
lkoniecz opened this issue Oct 1, 2019 · 1 comment · Fixed by #4696
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service) · bug (This issue is a bug.) · p0

Comments


lkoniecz commented Oct 1, 2019

I tried to change the EKS cluster version from 1.12 to 1.13.

The resource:

from aws_cdk import aws_eks

eks_cluster = aws_eks.Cluster(
    scope=self,
    id=id,
    cluster_name=id,
    default_capacity=0,
    masters_role=cluster_masters_role,
    version='1.13',
    vpc=vpc,
)

Resources
[~] AWS::AutoScaling::LaunchConfiguration DevEksCluster/DevEksCluster/EksClusterAsgEuWest1A/LaunchConfig DevEksClusterEksClusterAsgEuWest1ALaunchConfig5ACFE7AF replace
 └─ [~] ImageId (requires replacement)
     ├─ [-] ami-0404d23c7e8188740
     └─ [+] ami-00ac2e6b3cb38a9b9
[~] AWS::Serverless::Application kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA replace
[~] AWS::AutoScaling::LaunchConfiguration DevEksCluster/DevEksCluster/EksClusterAsgEuWest1B/LaunchConfig DevEksClusterEksClusterAsgEuWest1BLaunchConfig868911E1 replace
 └─ [~] ImageId (requires replacement)
     ├─ [-] ami-0404d23c7e8188740
     └─ [+] ami-00ac2e6b3cb38a9b9
[~] Custom::AWSCDK-EKS-Cluster DevEksCluster/DevEksCluster/Resource/Resource DevEksCluster6F41DD8A 
 └─ [~] Config
     └─ [~] .version:
         ├─ [-] 1.12
         └─ [+] 1.13
[~] AWS::AutoScaling::LaunchConfiguration DevEksCluster/DevEksCluster/EksClusterAsgEuWest1C/LaunchConfig DevEksClusterEksClusterAsgEuWest1CLaunchConfigDF5DF8DE replace
 └─ [~] ImageId (requires replacement)
     ├─ [-] ami-0404d23c7e8188740
     └─ [+] ami-00ac2e6b3cb38a9b9

The stack failed to update; see the error log attached below.

The CDK marked the cluster resource for deletion:
4/53 | 11:12:05 | DELETE_IN_PROGRESS | AWS::CloudFormation::CustomResource | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A)
and as a result the cluster was deleted instead of being rolled back to its previous state.

Reproduction Steps

Change the EKS cluster version from 1.12 to 1.13 with kubectl enabled.

Error Log

dev-eks-cluster: deploying...
dev-eks-cluster: creating CloudFormation changeset...
  0/53 | 11:11:11 | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA) 
  1/53 | 11:11:33 | UPDATE_COMPLETE      | AWS::CloudFormation::Stack            | kubectl-layer-8C2542BC-BF2B-4DFE-B765-E181FD30A9A0 (kubectllayer8C2542BCBF2B4DFEB765E181FD30A9A0617C4ADA) 
  1/53 | 11:11:41 | UPDATE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A) 
  1/53 | 11:11:44 | UPDATE_IN_PROGRESS   | Custom::AWSCDK-EKS-Cluster            | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A) Requested update required the provider to create a new physical resource
  2/53 | 11:11:44 | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | DevEksCluster/DevEksCluster/Resource/Resource/Default (DevEksCluster6F41DD8A) Failed to update resource. Parameter validation failed:
Unknown parameter in input: "roleArn", must be one of: name, resourcesVpcConfig, logging, clientRequestToken
Unknown parameter in input: "version", must be one of: name, resourcesVpcConfig, logging, clientRequestToken
	new CustomResource (/tmp/jsii-kernel-m0A7Nm/node_modules/@aws-cdk/aws-cloudformation/lib/custom-resource.js:32:25)
	\_ new ClusterResource (/tmp/jsii-kernel-m0A7Nm/node_modules/@aws-cdk/aws-eks/lib/cluster-resource.js:46:26)
	\_ new Cluster (/tmp/jsii-kernel-m0A7Nm/node_modules/@aws-cdk/aws-eks/lib/cluster.js:81:24)
	\_ obj._wrapSandboxCode (/home/lky/Repositories/HuuugeStarsCfConfig/contrib/cloudformation/cdk/eks/cluster/.env/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:6666:49)
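
For context on the validation errors above: the parameter list in the error matches the EKS UpdateClusterConfig request shape, which only accepts name, resourcesVpcConfig, logging and clientRequestToken. Passing roleArn or version there fails, and a Kubernetes version change has to go through UpdateClusterVersion instead. A minimal boto3 sketch of the two calls (illustrative only, not the handler's actual code, which uses the JavaScript AWS SDK; the cluster name is a placeholder):

import boto3

eks = boto3.client('eks')

# UpdateClusterConfig accepts only name, resourcesVpcConfig, logging and
# clientRequestToken -- sending roleArn or version triggers the
# "Unknown parameter in input" validation error shown in the log above.
eks.update_cluster_config(
    name='dev-eks-cluster',  # placeholder cluster name
    logging={'clusterLogging': [{'types': ['api'], 'enabled': True}]},
)

# A Kubernetes version change must use UpdateClusterVersion instead.
eks.update_cluster_version(name='dev-eks-cluster', version='1.13')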

Environment

  • CLI Version : 1.4.0
  • Framework Version: 1.4.0
  • OS : all
  • Language : all

Other

This is 🐛 Bug Report

lkoniecz added the bug and needs-triage labels on Oct 1, 2019
RomainMuller added the @aws-cdk/aws-eks label on Oct 1, 2019
NGL321 added the in-progress label and removed the needs-triage label on Oct 4, 2019
eladb added the p0 label on Oct 23, 2019
eladb closed this as completed on Oct 25, 2019
eladb reopened this on Oct 25, 2019

eladb commented Oct 25, 2019

The root cause of cluster deletion upon failed updates is that we do not mirror back the physical resource ID we received as input when a failure occurs.
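
A minimal sketch of the behaviour described here, in Python for illustration (the actual aws-cdk handler is JavaScript, and do_update below is a hypothetical placeholder): on failure, echo back the PhysicalResourceId that came in with the event instead of reporting a new one, so CloudFormation rolls back in place rather than scheduling the old cluster for deletion.

import json
import urllib.request

def do_update(event):
    # hypothetical placeholder for the real update logic
    raise NotImplementedError

def handler(event, context):
    try:
        physical_id = do_update(event)
        send_response(event, 'SUCCESS', physical_id)
    except Exception as err:
        # Mirror the incoming physical resource ID on failure. Reporting a new
        # ID (e.g. the log stream name) makes CloudFormation treat the failed
        # update as a replacement and delete the old resource on rollback.
        incoming_id = event.get('PhysicalResourceId', context.log_stream_name)
        send_response(event, 'FAILED', incoming_id, reason=str(err))

def send_response(event, status, physical_id, reason=''):
    body = json.dumps({
        'Status': status,
        'Reason': reason,
        'PhysicalResourceId': physical_id,
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
    }).encode()
    req = urllib.request.Request(event['ResponseURL'], data=body, method='PUT')
    urllib.request.urlopen(req)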

eladb pushed a commit that referenced this issue Oct 26, 2019
Our custom resource naively tried to call the UpdateCluster API for updates, but this is in fact not in line with how AWS::EKS::Cluster is implemented. This change modifies the custom resource handler to handle updates based on the same specification as the official CloudFormation resource:

- Changes to the cluster name, VPC or role will cause a replacement (creation of a cluster with a new name and removal of the old cluster).
- Changes to the version will use the UpdateClusterVersion API to update the version in place.

This fixes #4311.

This commit also fixes #4310, which caused cluster deletions when updates failed. The root cause was that when errors were reported to CFN we always used the log stream name as the physical resource ID, so CFN thought we wanted to replace the resource. Ouch.

This change was tested manually, since we still don't have a good unit test harness for this resource: we exercised all types of updates and observed that the appropriate behaviour was taken (replacement or in-place update).
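
A rough sketch of the update routing described above, with hypothetical property names mirroring the CreateCluster input shape (the real handler lives in the aws-cdk custom resource code, in JavaScript): name, role or VPC changes trigger replacement, a version change is applied in place via UpdateClusterVersion, and anything else is a no-op.

# Property keys whose change forces cluster replacement (hypothetical names).
REPLACEMENT_KEYS = {'name', 'roleArn', 'resourcesVpcConfig'}

def classify_update(old_props: dict, new_props: dict) -> str:
    """Decide how to apply a property diff to the cluster."""
    changed = {key for key in set(old_props) | set(new_props)
               if old_props.get(key) != new_props.get(key)}
    if changed & REPLACEMENT_KEYS:
        return 'replace'          # create a new cluster, let CFN delete the old one
    if 'version' in changed:
        return 'update-version'   # call eks.update_cluster_version(name=..., version=...)
    return 'no-op'

# Example: only the version changed, so the cluster is updated in place.
assert classify_update({'version': '1.12'}, {'version': '1.13'}) == 'update-version'
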
mergify bot closed this as completed in #4696 on Oct 28, 2019
mergify bot pushed a commit that referenced this issue Oct 28, 2019
* fix(eks): cannot update cluster configuration

* handle cluster name updates

* add provisional unit tests for the cluster resource (just a test plan)

* update expectations
iliapolo changed the title from "EKS update failure causes the cluster to be deleted" to "[aws-eks] EKS update failure causes the cluster to be deleted" on Aug 16, 2020
iliapolo removed the in-progress label on Aug 16, 2020