`make deploy` fails when ES domain is absent #1998

hannes-ucsc · 2020-07-10T03:14:26Z

Probably a regression from #1704

Manually deleted the ES domain and kicked off a build on GitLab.

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/4218

The job failed with

Error: Invalid index
  on modules.tf.json line 8, in module.chalice_indexer.es_endpoint:
   8:                 "${aws_elasticsearch_domain.elasticsearch[0].endpoint}",
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple
The given key does not identify an element in this collection value.
Error: Invalid index
  on modules.tf.json line 11, in module.chalice_indexer:
  11:             "es_instance_count": "${aws_elasticsearch_domain.elasticsearch[0].cluster_config[0].instance_count}"
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple
The given key does not identify an element in this collection value.
Error: Invalid index
  on modules.tf.json line 18, in module.chalice_service.es_endpoint:
  18:                 "${aws_elasticsearch_domain.elasticsearch[0].endpoint}",
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple
The given key does not identify an element in this collection value.
Error: Invalid index
  on modules.tf.json line 21, in module.chalice_service:
  21:             "es_instance_count": "${aws_elasticsearch_domain.elasticsearch[0].cluster_config[0].instance_count}"
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple

Retrying the job did not fix it:

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/4225

I had to run

cd terraform
make config
terraform apply -target aws_elasticsearch_domain.elasticsearch

locally to create the domain again.

Step one here is to determine if this only occurs after a manual deletion of the ES domain or if this also happens in new deployments or after Terraform deletes the domain.

hannes-ucsc · 2020-07-10T03:33:48Z

After the above workaround, a retry of the deploy job succeeds:

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/4226

achave11-ucsc · 2020-08-11T17:56:15Z

Reproducibility:
This issue is only reproducible when a user modifies Terraform-managed resources or configuration outside of Terraform (via the AWS console or CLI). The Terraform bug is not reproducible with Terraform 0.12.10, and perhaps not with older versions either. Versions 0.12.24 through the most recent, 0.12.29, have the bug; it has not been fixed yet. A Terraform issue describing a problem similar to the one Azul encountered is currently open.
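To make the version range above easier to check, here is a minimal sketch of a shell helper. The function name `in_reported_range` is hypothetical, and the sketch assumes a `sort` that supports `-V` (version sort), such as the one in GNU coreutils. It only tests whether a version falls within the range reported as affected in this comment (0.12.24 through 0.12.29). Versions between 0.12.10 and 0.12.24 were not tested, so a negative result does not prove a version is unaffected.

```shell
# Hypothetical helper: succeeds if version $1 lies within the range reported
# as affected in this comment (0.12.24 through 0.12.29, inclusive).
# Assumes `sort -V` (version sort) is available, e.g. GNU coreutils.
in_reported_range() {
    v="$1"
    lo="0.12.24"
    hi="0.12.29"
    # v >= lo holds if lo sorts first among (lo, v);
    # v <= hi holds if hi sorts last among (v, hi).
    [ "$(printf '%s\n%s\n' "$lo" "$v" | sort -V | head -n 1)" = "$lo" ] &&
    [ "$(printf '%s\n%s\n' "$v" "$hi" | sort -V | tail -n 1)" = "$hi" ]
}
```

For example, `in_reported_range "$(terraform version | head -n 1 | sed 's/^Terraform v//')"` would check the locally installed version, assuming the first line of the `terraform version` output has the form `Terraform v0.12.24`.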

Workaround/Solution:
If resources originally managed by Terraform are modified or reconfigured outside of Terraform, the remote state file falls out of sync. After an Elasticsearch domain is deleted manually, the remote state file must be updated as well, but a bug in Terraform currently prevents that from happening automatically. Running `terraform state rm aws_elasticsearch_domain.elasticsearch` removes the manually deleted domain from the Terraform state file. Running `make deploy` afterwards then allows Terraform to create and configure any new resources.
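The two steps of this workaround can be sketched as a small shell function. This is only an illustration: the `run` and `recover_es_domain` helpers and the `DRY_RUN` variable are hypothetical and not part of Azul. The sketch assumes it is invoked from the project root, that the `terraform` subdirectory is already configured, and that `terraform state rm` has to be run from that subdirectory, as with the commands in the original report. By default it only prints the commands it would run.

```shell
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the workaround: drop the stale state entry for
# the manually deleted ES domain, then redeploy so Terraform recreates it.
recover_es_domain() {
    # The subshell keeps the directory change from leaking to the caller.
    ( run cd terraform &&
      run terraform state rm aws_elasticsearch_domain.elasticsearch ) &&
    run make deploy
}
```

Calling `recover_es_domain` prints the three commands; `DRY_RUN=0 recover_es_domain` would actually execute them.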

Notes:
When attempting to reproduce this by deleting the resource through Terraform, Terraform would not delete only the targeted resource. Instead, it attempted to delete the targeted resource and all resources referencing it. This suggests that Terraform on its own is unlikely to get into a state in which the described failure occurs.
Similarly, when attempting to reproduce this by referencing the deleted resource's attributes outside of modules (writing the instance count to an S3 bucket), the bug persisted, indicating that it is not specific to modules.

github-actions bot added the orange [process] Done by the Azul team label Jul 10, 2020

theathorn added the labels bug ([type] A defect preventing use of the system as specified), infra ([subject] Project infrastructure like CI/CD, build and deployment scripts) and code ([subject] Production code), and removed the label infra ([subject] Project infrastructure like CI/CD, build and deployment scripts) Jul 10, 2020

theathorn assigned achave11-ucsc Jul 10, 2020

hannes-ucsc pushed a commit that referenced this issue Sep 4, 2020

Work around make deploy failing after manually deleting ES domain (#…

df5120f

…1998)

hannes-ucsc added a commit that referenced this issue Sep 4, 2020

Work around make deploy failing after manually deleting ES domain (#…

1beb517

…1998, PR #2141)

theathorn added the demoed [process] Successfully demonstrated to team label Sep 8, 2020

theathorn closed this as completed Sep 8, 2020
