make deploy fails when ES domain is absent #1998

Closed
hannes-ucsc opened this issue Jul 10, 2020 · 2 comments
Labels
bug [type] A defect preventing use of the system as specified code [subject] Production code demoed [process] Successfully demonstrated to team orange [process] Done by the Azul team

Comments


hannes-ucsc commented Jul 10, 2020

Probably a regression from #1704

I manually deleted the ES domain and kicked off a build on GitLab.

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/4218

The job fails with

Error: Invalid index
  on modules.tf.json line 8, in module.chalice_indexer.es_endpoint:
   8:                 "${aws_elasticsearch_domain.elasticsearch[0].endpoint}",
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple
The given key does not identify an element in this collection value.
Error: Invalid index
  on modules.tf.json line 11, in module.chalice_indexer:
  11:             "es_instance_count": "${aws_elasticsearch_domain.elasticsearch[0].cluster_config[0].instance_count}"
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple
The given key does not identify an element in this collection value.
Error: Invalid index
  on modules.tf.json line 18, in module.chalice_service.es_endpoint:
  18:                 "${aws_elasticsearch_domain.elasticsearch[0].endpoint}",
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple
The given key does not identify an element in this collection value.
Error: Invalid index
  on modules.tf.json line 21, in module.chalice_service:
  21:             "es_instance_count": "${aws_elasticsearch_domain.elasticsearch[0].cluster_config[0].instance_count}"
    |----------------
    | aws_elasticsearch_domain.elasticsearch is empty tuple

Retrying the job did not fix it:

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/4225

I had to run

cd terraform
make config
terraform apply -target aws_elasticsearch_domain.elasticsearch

locally to create the domain again.

Step one here is to determine if this only occurs after a manual deletion of the ES domain or if this also happens in new deployments or after Terraform deletes the domain.

@github-actions github-actions bot added the orange [process] Done by the Azul team label Jul 10, 2020
@hannes-ucsc

After the above workaround, a retry of the deploy job succeeds:

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/4226

@theathorn theathorn added bug [type] A defect preventing use of the system as specified infra [subject] Project infrastructure like CI/CD, build and deployment scripts code [subject] Production code and removed infra [subject] Project infrastructure like CI/CD, build and deployment scripts labels Jul 10, 2020
@achave11-ucsc

Reproducibility:
This issue is only reproducible when a user modifies Terraform-managed resources or configuration outside of Terraform (via the AWS console or CLI). The underlying Terraform bug is not reproducible with Terraform 0.12.10, and perhaps not with older versions either. Versions 0.12.24 through the most recent one, 0.12.29, have the bug; it has not been fixed yet. A Terraform issue describing a problem similar to the one Azul encountered is currently open.

Workaround/Solution:
If resources originally managed by Terraform are modified or configured outside of Terraform, the remote state file falls out of sync. After an Elasticsearch domain is deleted manually, the remote state file must be updated as well; a bug in Terraform currently prevents that from happening automatically. Running terraform state rm aws_elasticsearch_domain.elasticsearch removes the manually deleted domain from the Terraform state file. Running make deploy afterwards lets Terraform create and configure any new resources.
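The workaround can be sketched as a small shell helper. This is only an illustration under a few assumptions: forget_deleted_resource is a hypothetical name that does not exist in Azul, the helper is run from the directory containing the generated Terraform configuration, and the substring match on the output of terraform state list is good enough to tell whether the state still tracks the domain.

```shell
# Resource address taken from the error output in this issue.
ES_ADDRESS="aws_elasticsearch_domain.elasticsearch"

# forget_deleted_resource is a hypothetical helper, not part of Azul.
# It removes a resource address from the Terraform state, but only if the
# state still tracks it, so that it is safe to run more than once.
forget_deleted_resource() {
    address="$1"
    # `terraform state list` prints one tracked resource address per line;
    # resources created with `count` carry an index suffix such as [0].
    tracked="$(terraform state list)" || return 1
    # Fixed-string substring match, so the indexed address matches as well.
    if printf '%s\n' "$tracked" | grep -q -F -- "$address"; then
        terraform state rm "$address"
    else
        echo "$address is not tracked in the state; nothing to remove"
    fi
}

# Usage, following the steps described in this issue:
#   cd terraform && make config
#   forget_deleted_resource "$ES_ADDRESS"
#   ... then run `make deploy` as usual
```

Checking the state first is a design choice rather than a requirement: it keeps the helper from failing on a second run, when the stale entry is already gone.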

Notes:
When I attempted to reproduce the failure by deleting the resource through Terraform, Terraform would not delete only the targeted resource. Instead, it attempted to delete the targeted resource and all resources referencing it. This suggests that Terraform on its own is unlikely to get into the state in which the failure described above is observed.
Similarly, the bug persisted when I referenced attributes of the deleted resource outside of modules (by writing the instance count into an S3 bucket), which indicates that the bug is not specific to modules.

@theathorn theathorn added the demoed [process] Successfully demonstrated to team label Sep 8, 2020