[Bug]: AWS RDS global cluster very long-delay in minor version upgrade #36107
Labels
bug
Addresses a defect in current functionality.
service/rds
Issues and PRs that pertain to the rds service.
service/vpc
Issues and PRs that pertain to the vpc service.
Originally posted by @catcharbind in #30358 (comment)
Terraform Core Version
1.7.4
AWS Provider Version
5.39.1
Affected Resource(s)
aws_rds_global_cluster
Expected Behavior
The aws_rds_global_cluster will perform a minor version upgrade without error and without delay.
Actual Behavior
An attempt to perform a minor version upgrade of an RDS global cluster results in an error that is retried repeatedly for 90 minutes, until the operation times out.
Relevant Error/Panic Output Snippet
Terraform Configuration Files
Steps to Reproduce
Debug Output
Panic Output
No response
Important Factoids
This was introduced in #30996 when the logic was flipped for retry behavior for errors. A specific AWS error is used to determine when to do a minor version upgrade vs. a major. As of #30996, the logic will keep retrying a major upgrade when the error indicates a minor version is required. This is why there is such a long delay. Then, at the very end, it tries the minor upgrade, which succeeds.
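The retry behavior described above can be illustrated with a minimal sketch. This is not the provider's actual Go code: the function names, the `UpgradeError` type, and the `MINOR_VERSION_REQUIRED` marker are all hypothetical placeholders (the real AWS error text is not quoted in this issue). The point it shows is that an error meaning "use a minor upgrade instead" should be treated as non-retryable, so the minor path runs immediately rather than after the 90-minute timeout.

```python
import time

# Hypothetical placeholder; stands in for the specific AWS error
# that signals a minor version upgrade is required.
MINOR_UPGRADE_MARKER = "MINOR_VERSION_REQUIRED"


class UpgradeError(Exception):
    """Hypothetical error raised by a failed upgrade attempt."""


def is_minor_upgrade_required(err):
    # Classify the error: True means "stop retrying the major path".
    return MINOR_UPGRADE_MARKER in str(err)


def upgrade_global_cluster(major_upgrade, minor_upgrade,
                           timeout_s=90 * 60, delay_s=30,
                           sleep=time.sleep, clock=time.monotonic):
    """Try the major upgrade path; switch to the minor path at once
    when the error says a minor upgrade is required."""
    deadline = clock() + timeout_s
    while True:
        try:
            major_upgrade()
            return "major"
        except UpgradeError as err:
            if is_minor_upgrade_required(err):
                # Not retryable: fall through to the minor upgrade now
                # instead of retrying the major path until timeout.
                minor_upgrade()
                return "minor"
            if clock() >= deadline:
                raise
            # Any other error is treated as transient and retried.
            sleep(delay_s)
```

With the flipped logic described in this issue, the `is_minor_upgrade_required` branch would effectively be the one that keeps retrying, which matches the observed delay.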
The workaround mentioned above also doesn't work for me. I upgraded the secondary cluster using the AWS console, then applied Terraform, but it is just stuck in the "Still modifying..." phase with no action on the AWS resource.
I also tried this on the latest AWS provider version, 5.30.0, but I am seeing the same issue.
Update 12/16/2023:
Now the minor version upgrade was successfully completed using Terraform. But it took a very long time and was stuck modifying the global cluster node for 1 hour and 38 minutes! The upgrade should have just attempted to upgrade the secondary cluster first and then the primary cluster. The log below shows it modified the primary first, followed by the secondary. But in the AWS console, I see that the secondary cluster was upgraded first and then the primary. Otherwise the Aurora global database minor version upgrade won't work!
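The ordering the comment above expects (secondary clusters first, primary last) can be sketched as follows. This is only an illustration of that ordering, not what the provider does. It assumes member records shaped like the `GlobalClusterMembers` entries returned by the RDS `DescribeGlobalClusters` API (`DBClusterArn`, `IsWriter`); the ARNs are made up.

```python
def minor_upgrade_order(members):
    """Return global cluster members with secondaries (IsWriter=False)
    ahead of the primary (IsWriter=True).

    sorted() is stable and False sorts before True, so secondaries keep
    their relative order and the writer ends up last.
    """
    return sorted(members, key=lambda m: m["IsWriter"])


# Hypothetical members; the ARNs are placeholders for illustration.
members = [
    {"DBClusterArn": "arn:aws:rds:us-east-1:111111111111:cluster:primary",
     "IsWriter": True},
    {"DBClusterArn": "arn:aws:rds:eu-west-1:111111111111:cluster:secondary",
     "IsWriter": False},
]

for member in minor_upgrade_order(members):
    print(member["DBClusterArn"])
```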
References
global_cluster_identifier property on a global cluster #30996