resource with create_before_destroy is both created and destroyed before updating dependent resource when dependent resource is in a parent module #17735
Comments
Hi @samjgalbraith, thanks for the complete configuration example to replicate the issue here! If you run this and watch the logs, you can see that the order is correct and the new role is created before the old one is destroyed. This is an eventual consistency issue with the IAM role itself, and a workaround to help with it has already been merged into the aws provider. Thanks!
Thanks for looking at that @jbardin. It does seem like you're right about the order when looking at the logs. One thing that still bothers me about this is that it definitely destroys the old role even though the update fails. Surely the destruction of the dependency should only happen after the references to it have been successfully updated.
I'd have to think about whether it would be possible to move the destroy node for the deposed resource later in the graph without causing cycles in all cases. The problem, however, is that once one of the old dependencies fails and terraform errors out, it no longer references the old destroy node, and there would be no way to remove it in the same order in a subsequent run, so the benefit is minimal. Also, removing it later on wouldn't change the result here at all, because it's not the removal of the old resource that is causing the failure, it's the fact that the new one hasn't fully propagated across the remote API endpoint.
You're right that the IAM race condition is what's causing the Terraform apply failure here. That's not the problem I care about in this issue; it probably got lost in the wall of text, but I mentioned it above the terraform output block. I understand the main point of create_before_destroy is to transform infrastructure such that at every step your infrastructure is always valid and available. I regret using this example because that's a bit of a red herring. It's just an example of a failure which errors the apply partway through, revealing that the plan allows the infrastructure to be broken - and I mean the infrastructure is actually broken, such that the Lambda function won't execute anymore until you intervene and re-apply, not just that the apply failed to complete. This is a problem bigger than the IAM race condition. The plan is bad.

I've just output the graphs for the plan and they're definitely consistent with my complaint:

When the role is created in the same module as the Lambda, the role destruction waits for a successful Lambda update, exactly as I expect for zero downtime (even if a step fails) - there's a single branch through the plan graph that goes role create | lambda update | role destroy, and no failure in any step leaves the infrastructure in a broken state. Even though the apply still fails in this case because of the IAM race condition, crucially, the infrastructure is never broken at any point.

When the role is declared in a child module and its ARN given as a module output, destroying the old role does not block on a successful update of the Lambda - the role destroy is in a plan branch parallel to the lambda update. There's something about being in a different module that causes the plan graph to be different.

Everything in one module - works as expected, with role destroy waiting for successful lambda update:

provider "aws" {
  region = "ap-southeast-2"
}

data "archive_file" "lambda_content" {
  source_file = "${path.module}/my_test_function.py"
  output_path = "${path.module}/my_test_function.zip"
  type        = "zip"
}

resource "aws_lambda_function" "my_test_function" {
  function_name    = "my_test_function"
  handler          = "my_test_function.lambda_handler"
  role             = "${aws_iam_role.role.arn}"
  runtime          = "python2.7"
  filename         = "${data.archive_file.lambda_content.output_path}"
  source_code_hash = "${data.archive_file.lambda_content.output_base64sha256}"
  publish          = true
}

data "aws_iam_policy_document" "trust_relationship_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "role" {
  assume_role_policy = "${data.aws_iam_policy_document.trust_relationship_policy.json}"
  name               = "test1"

  lifecycle {
    create_before_destroy = true
  }
}

Plan for name change of role - Labels modified for readability
Using the original HCL at the top of this thread, a role name change gives this plan, which does NOT make destruction of the role wait for successful Lambda modification:
Thanks for the excellent analysis! I see exactly what you mean now, and since in most cases the effect is the same, it's gone unnoticed for quite a while. Just recording a minimal example here for reference:
After the following commands, there should still be a deposed resource instance left in the state.
Since the CBD graph does not take transitive dependencies through outputs and locals into account, the dependency inversion that CBD creates isn't extended up through modules.
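The minimal example itself isn't captured here. Purely as an illustration (the module path, variable name, and use of null_resource are assumed, not taken from the original example), a provider-free configuration that exercises the same code path could look like this:

# ./cbd_module/main.tf
variable "name" {}

# Replaced whenever var.name changes; with create_before_destroy the old
# instance becomes deposed and is destroyed after the replacement exists.
resource "null_resource" "a" {
  triggers = {
    name = "${var.name}"
  }

  lifecycle {
    create_before_destroy = true
  }
}

output "id" {
  value = "${null_resource.a.id}"
}

# ./main.tf
module "a" {
  source = "./cbd_module"
  name   = "test1"
}

# Depends on the module's resource only through the module output.
resource "null_resource" "b" {
  triggers = {
    upstream_id = "${module.a.id}"
  }
}

If the CBD inversion were carried through the module output, the deposed null_resource.a would only be destroyed after null_resource.b had been replaced to reference the new id; per the analysis above, the destroy is instead scheduled independently of b.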
@jbardin, any thoughts on this? Maybe something that'll be fixed in 0.12? This is the root cause of hashicorp/terraform-provider-google#1448. Happy to provide debug logs and the like, though it seems like the problem is pretty well-defined at this point. cc @brandoconnor since we were debugging this earlier today :)
Hi @danawillow, Do I have thoughts!
Are there any plans to fix this? Looks like it still exists in 0.12.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Consider the below configuration. When running terraform apply on a variant of this configuration that requires the IAM role to be replaced (a name change will suffice), it performs IAM role create, IAM role destroy, then Lambda update. However, if the below configuration is changed so that the IAM role is declared within the same module as the Lambda function, the actions carried out are IAM role creation, Lambda function update, then IAM role deletion, as expected. This leaves a period in which the role reference from the Lambda to IAM is dangling. This is made more obvious by the race conditions in IAM that can lead to the Lambda update failing, only to succeed later on retry. In the intervening period, the Lambda is defunct, as it still points to a role that has already been destroyed.
Terraform Version
Terraform v0.11.5
Terraform Configuration Files
/main.tf
/role_module/main.tf
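The contents of these two files aren't captured here. As an illustrative sketch only, reconstructed from the single-module example earlier in the thread (the module name, variable name, and output name are assumed), the split layout might look something like this:

# /main.tf (sketch)
provider "aws" {
  region = "ap-southeast-2"
}

data "archive_file" "lambda_content" {
  source_file = "${path.module}/my_test_function.py"
  output_path = "${path.module}/my_test_function.zip"
  type        = "zip"
}

module "role" {
  source    = "./role_module"
  role_name = "test2"
}

resource "aws_lambda_function" "my_test_function" {
  function_name    = "my_test_function"
  handler          = "my_test_function.lambda_handler"
  role             = "${module.role.role_arn}"
  runtime          = "python2.7"
  filename         = "${data.archive_file.lambda_content.output_path}"
  source_code_hash = "${data.archive_file.lambda_content.output_base64sha256}"
  publish          = true
}

# /role_module/main.tf (sketch)
variable "role_name" {}

data "aws_iam_policy_document" "trust_relationship_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "role" {
  assume_role_policy = "${data.aws_iam_policy_document.trust_relationship_policy.json}"
  name               = "${var.role_name}"

  lifecycle {
    create_before_destroy = true
  }
}

output "role_arn" {
  value = "${aws_iam_role.role.arn}"
}

Changing role_name (for example "test2" -> "test1") then forces replacement of the role behind the module output that the Lambda references.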
Terraform Apply Output - For role name change "test2" -> "test1"
Please note that the IAM race condition which causes the failure of the Lambda update is not the subject of this issue (I've raised it in #3972). It's only an example of an intermediate failure that halts the infrastructure modification at the point of the dangling dependency.
Crash Output
Expected Behavior
Using create_before_destroy = true on a dependency of a resource causes a new copy of the dependency to be created before the old one is destroyed. It's expected that the dependent resource's reference to it is updated between re-creation and destruction of the dependency.
Actual Behavior
The new dependency is created and the old one destroyed, both before the dependent resource's reference to it is updated.
Steps to Reproduce