Terraform plan fails while AWS Elasticache Redis cluster is scaling out #18116

Closed
fromz opened this issue Mar 16, 2021 · 9 comments · Fixed by #21185
Labels
bug Addresses a defect in current functionality. service/elasticache Issues and PRs that pertain to the elasticache service.
Comments

fromz commented Mar 16, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v0.14.5

  • provider registry.terraform.io/hashicorp/aws v3.31.0
  • provider registry.terraform.io/hashicorp/local v2.1.0
  • provider registry.terraform.io/hashicorp/null v3.1.0

Affected Resource(s)

  • aws_elasticache_replication_group

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

resource "aws_elasticache_replication_group" "this" {
  count = 1

  at_rest_encryption_enabled    = true
  multi_az_enabled              = true
  automatic_failover_enabled    = true
  replication_group_id          = "users-cache"
  replication_group_description = "Users Redis cache"
  node_type                     = "cache.t3.medium"
  parameter_group_name          = "default.redis6.x.cluster.on"
  port                          = 6379

  cluster_mode {
    num_node_groups         = 1 # Number of initial shards
    replicas_per_node_group = 1 # Number of initial replicas within each shard
  }

  apply_immediately = true

  lifecycle {
    ignore_changes = [
      # Scaling in AWS will change cluster_mode.num_node_groups and cluster_mode.replicas_per_node_group;
      # disregard drift from the initial configuration.
      cluster_mode,
    ]
  }
}

Debug Output

I'm running Terraform Cloud, which doesn't allow debug output, but I get:

Error: error listing tags for resource (arn:aws:elasticache:ap-southeast-2::cluster:users-cache-0001-001): CacheClusterNotFound: users-cache-0001-001 is either not present or not available.
        status code: 404, request id: b6cfcff3-dfa7-41cf-b099-0eb0c9767990

Expected Behavior

When the cluster status is not 'available' (e.g. while shards are being added), terraform plan/apply should complete without error.

Actual Behavior

Whenever the cluster is unavailable due to online resizing, terraform plan/apply fails.

Steps to Reproduce

  1. terraform apply
  2. wait for operation to complete
  3. log into AWS UI
  4. find generated Elasticache cluster
  5. scale out the cluster (e.g. click "Add shard")
  6. note that the cluster goes into 'modifying' state
  7. run terraform plan
  8. observe failure

Important Factoids

References

  • #0000
@ghost ghost added the service/elasticache Issues and PRs that pertain to the elasticache service. label Mar 16, 2021
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Mar 16, 2021
@bill-rich bill-rich added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Mar 17, 2021
gdavison (Contributor) commented:

Implementation note: Based on the error message, this is likely related to how the resource manages tags on the individual cluster nodes. The attempt to read tags on the node has failed because the node has been removed or (possibly) is scaling.
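The tolerant handling this note points toward can be sketched roughly as follows. This is an illustrative Python/boto3 shape only, not the provider's actual implementation (the provider is written in Go); the helper names and the skip-on-not-found policy are assumptions:

```python
# Illustrative sketch: treat the transient not-found error seen in the report
# above as "tags unknown right now" rather than a hard plan failure.
# Helper names are hypothetical.

# Error code observed above while a node is mid-resize or removed.
SKIPPABLE_TAG_ERROR_CODES = {"CacheClusterNotFound"}

def is_transient_tag_error(error_code: str) -> bool:
    """Return True if a ListTagsForResource failure should be skipped
    instead of failing the plan."""
    return error_code in SKIPPABLE_TAG_ERROR_CODES

def list_tags_or_skip(client, arn: str):
    """Read tags for an ElastiCache node, returning None instead of raising
    while the node is being removed or resized."""
    try:
        return client.list_tags_for_resource(ResourceName=arn)["TagList"]
    except client.exceptions.CacheClusterNotFoundFault:
        return None  # node gone or scaling; leave tags unrefreshed
```

With this shape, a refresh would carry forward the previously known tags instead of aborting the whole plan.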

ktham (Contributor) commented Apr 16, 2021

Would it be advisable to catch this error and proceed, skipping any tag-based changes, while an ElastiCache scale-up is in progress? Otherwise we are effectively locked out of running Terraform for however long the scale-up takes 😢

rawrgulmuffins commented:

I've hit this a few times recently. I would also be interested in the answer to the catch question above.

github-actions bot commented Oct 8, 2021

This functionality has been released in v3.62.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

ktham (Contributor) commented Oct 21, 2021

Hi @gdavison, this issue is still not fixed. Terraform plans continue to fail during the "list tags" operation when the ElastiCache cluster is not "available" due to an in-progress cluster operation. Perhaps the provider could skip refreshing the tag state while ElastiCache is mid-operation.

 Error: error listing tags for resource (arn:aws:elasticache:us-east-1:xxx:xxx): timeout while waiting for state to become 'available' (last state: 'snapshotting', timeout: 40m0s)

ktham (Contributor) commented Oct 21, 2021

From https://docs.aws.amazon.com/cli/latest/reference/elasticache/list-tags-for-resource.html

If the cluster is not in the available state, ListTagsForResource returns an error.

Ideally, the AWS provider should handle this situation gracefully during the plan stage, so that plans can continue to run even while ElastiCache Redis is undergoing its routine nightly snapshot or scaling up.

@gdavison - I would propose reopening this ticket, as I don't think #21185 addresses this. (cc @ewbankkit, who reviewed the PR)
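The graceful handling proposed here could look roughly like the sketch below: gate the ListTagsForResource call on cluster status instead of failing the plan. This is a hedged illustration, not the provider's code; `should_refresh_tags` and `refresh_tags_if_available` are hypothetical names:

```python
# Sketch of the proposal above: only read tags when the cluster is
# 'available'; otherwise skip the refresh. Function names are hypothetical.

def should_refresh_tags(status: str) -> bool:
    """Per the AWS docs quoted above, ListTagsForResource only succeeds
    when the cluster is in the 'available' state."""
    return status == "available"

def refresh_tags_if_available(client, cluster_id: str, arn: str):
    """Return the tag list, or None (skip the refresh) while the cluster
    is snapshotting, modifying, etc."""
    resp = client.describe_cache_clusters(CacheClusterId=cluster_id)
    status = resp["CacheClusters"][0]["CacheClusterStatus"]
    if not should_refresh_tags(status):
        return None  # keep previously known tags in state
    return client.list_tags_for_resource(ResourceName=arn)["TagList"]
```

Skipping the refresh means tag drift goes undetected during that one plan, which seems a better trade-off than the plan failing outright.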

jeffery-jen commented:

Please reopen this issue; it is a problem for any ElastiCache cluster provisioned with Terraform that happens to be in a snapshotting state.

okelitse commented Apr 7, 2022

Hi,

I am facing a similar issue; is there a fix for this?

Error: error listing tags for ElastiCache Cluster (cache_instance_name_here-dev): CacheClusterNotFound: cache_instance_name_here-dev is either not present or not available.

Thanks.

github-actions bot commented May 8, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 8, 2022
7 participants