Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider security groups with source security groups when hashing #2376

Merged
merged 13 commits into from
Jun 19, 2015

Conversation

catsby
Copy link
Contributor

@catsby catsby commented Jun 16, 2015

Supersedes #2318
Fixes #2294

Original message:

Previously they would conflict you had multiple security group rules
with the same ingress or egress ports but different source security
groups because only the CIDR blocks were considered (which are empty
when using source security groups).

Updates:

Remediation: If you encounter an error regarding duplicate rules, you will need to identify and manually remove the offending/missing security group rules from the Security Group, and re-plan and apply with Terraform.

Previously they would conflict you had multiple security group rules
with the same ingress or egress ports but different source security
groups because only the CIDR blocks were considered (which are empty
when using source security groups).

Updated to include migrations (from clint@ctshryock.com)

Signed-off-by: Clint Shryock <clint@ctshryock.com>
@catsby
Copy link
Contributor Author

catsby commented Jun 16, 2015

This will clean up #2294. It will migrate people when possible, but there may still exist scenarios (like in 2294) where some manual cleanup will be necessary. The reason being that in the config given there, there are 2 rules created, but without considering the source group id in the hash, the state file will mistakenly have the same source id for both rules (clearly not the correct case).

@catsby
Copy link
Contributor Author

catsby commented Jun 16, 2015

cc @phinze for some review

@catsby
Copy link
Contributor Author

catsby commented Jun 16, 2015

The reason being that in the config given there, there are 2 rules created, but without considering the source group id in the hash, the state file will mistakenly have the same source id for both rules (clearly not the correct case).

I should add this; remotely (on AWS) the rules are created correctly. They're simply stored locally in the state file incorrectly. This PR will migrate most cases ( I believe ) and prevent that in the future.

Users with these conflicts can hit "Duplicate" errors. Manual clean up is needed, unless we swallow that error in the create method for the rule... but that seemed unsafe

return *b[i].GroupName < *b[j].GroupName
}

panic("mismatched security group rules, may be a terraform bug")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a user get to this panic by specifying a config improperly? Or are we protected by earlier validations from that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we're protected, but I'm honestly not 100% sure. @jszwedko wrote that part, and I meant to ask you @phinze about it. The panic makes me uneasy, but I'm not sure what's the consequences of omitting it are. Is there a more sane "default" fall through behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's just return false here, which will just sort last anything that's wonky looking, allowing later code to fail in a likely more meaningful way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd worry that just returning false might mask an error (though theoretically that line should never be hit). If we don't want to panic, I'd suggest at least sorting the bad data in a deterministic way (otherwise you could end up in a semi-permadiff situation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user of terraform, I'd personally prefer that it blow up than silently swallow this sort of error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to keep in mind (a positive thing) is that Terraform will recover panics that pop out of resource calls (as long as they're not from a goroutine within that resource call) and still handle cleanup and partial state.

@zxjinn
Copy link

zxjinn commented Jun 17, 2015

@catsby after running unmodified code I posted in #2294 against this branch, I still get the same error on the second apply 😦

$ ~/code/go/bin/terraform --version
Terraform v0.6.0-dev (640836ee58d9d3e92a75f7c6e27953eb16e2f911)

@zxjinn
Copy link

zxjinn commented Jun 17, 2015

@catsby Sorry false alarm! I forgot to update the symlink to point to ~/code/go/bin/terraform, I was still using the latest release when I ran that. 👏 thank you very much!

* master: (23 commits)
  typo
  Update CHANGELOG.md
  provider/aws: Add docs for autoscaling_policy + cloudwatch_metric_alarm
  provider/aws: Add autoscaling_policy
  provider/aws: Add cloudwatch_metric_alarm
  Update CHANGELOG.md
  Update CHANGELOG.md
  provider/template: don't error when rendering fails in Exists
  Update CHANGELOG.md
  Added Azure SQL server and service support.
  Update CHANGELOG.md
  docs: clarify wording around destroy/apply args
  Getting Started: Added a Next Step upon finishing install.
  docs: add description of archive format to download page
  docs: snapshot plugin dependencies when releasing
  add v0.5.3 transitory deps
  Fixes support for changing just the read / write capacity of a GSI
  Change sleep time for DynamoDB table waits from 3 seconds to 5 seconds
  Remove request for attribute changes
  Fix AWS SDK imports
  ...
* master:
  provider/azure: Fix SQL client name to match upstream
@catsby
Copy link
Contributor Author

catsby commented Jun 19, 2015

Thanks for checking it out @zxjinn , much appreciated. Sorry for the mess 😄

return fmt.Errorf(`[WARN] A duplicate Security Group rule was found. This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/teraform/pull/2376 for more
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/teraform/terraform/ - probably my typo from earlier 😛

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IT'S ALL YOUR FAULT

@phinze
Copy link
Contributor

phinze commented Jun 19, 2015

LGTM

catsby added a commit that referenced this pull request Jun 19, 2015
…ng-with-source-sg

Consider security groups with source security groups when hashing
@catsby catsby merged commit f045e9e into master Jun 19, 2015
@catsby catsby deleted the fix-security-group-rule-hashing-with-source-sg branch June 19, 2015 16:50
@jszwedko
Copy link
Contributor

👏 thanks @catsby !

@SamClinckspoor
Copy link
Contributor

@jszwedko wrong mention 😉
@samprakos

@jszwedko
Copy link
Contributor

Oops! @samprakos ^^

@samprakos
Copy link

I don't believe I'd be able to reproduce it on demand...I fixed the 18 errors by manually deleting the rules from aws and re-running Terraform plan/apply. It's almost as though the security group ids in aws all of a sudden didn't align with the ones in the tfstate file.

@jszwedko
Copy link
Contributor

@samprakos that is probably different from this issue then.

@catsby
Copy link
Contributor Author

catsby commented Aug 20, 2015

I recently opened #3019 to do further refactoring of Security Group rules. If you're interested in trying it out, I would recommend a test setup, as there's a schema migration and I don't know how easy that would be to roll back.

@dgarlitt
Copy link

dgarlitt commented Oct 6, 2016

Is there a work-around for this issue in the meantime? I get the "duplicate Security Group rule" error when trying to apply the following resources:

resource "aws_security_group_rule" "all-master-to-master" {
  type = "ingress"
  security_group_id = "${aws_security_group.masters-k8s.id}"
  self = true
  from_port = 0
  to_port = 0
  protocol = "-1"
}

resource "aws_security_group_rule" "all-master-to-node" {
  type = "ingress"
  security_group_id = "${aws_security_group.nodes-k8s.id}"
  source_security_group_id = "${aws_security_group.masters-k8s.id}"
  from_port = 0
  to_port = 0
  protocol = "-1"
}

@gtmtech
Copy link

gtmtech commented Nov 1, 2016

Still getting this on terraform 0.7.5 >.<

Configuration was built and first deployed using terraform 0.7.5 too, so no legacy stuff - something is clearly still broken

@gtmtech
Copy link

gtmtech commented Nov 1, 2016

Ah I tracked it down - in my case I was rewiring two aws_subnets, subnet A, and subnet B, and swapping their CIDR blocks. On some terraform apply's it would say it couldnt create the subnet because the CIDR already existed (one bug - terraform isnt intelligent enough to handle this situation) - on another run it complained:

* aws_security_group_rule.egress8080loadbalancer: [WARN] A duplicate Security Group rule was found on (sg-9d5c8ffb). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error message: the specified rule "peer: 10.128.10.0/24, TCP, from port: 8080, to port: 8080, ALLOW" already exists
* aws_security_group_rule.ingress8080loadbalancer: [WARN] A duplicate Security Group rule was found on (sg-67439001). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error message: the specified rule "peer: 10.128.10.0/24, TCP, from port: 8080, to port: 8080, ALLOW" already exists

Which would be the case because the subnets are swapping around and order is important.

In such cases like these the only appropriate thing would be to delete all subnets first, and all SG first and then recreate. Its more advanced logic than terraform currently has, but this would cater for this suite of problems

@shailendersingh2
Copy link

I am facing same error where i already have SG with specific CIDR block but when my terrafrom aplly command gives same duplicate error and quits during terraform apply. I can't delete existing rule because it was already present and PRD data communication is happening and i need functionality to just ignore if it is already present and i used lifecycle {ignore_changed = "subnetname"} that not helped me so why it can't just ignore if security group CIDR already present.

@genevieve
Copy link

Is there a way to migrate from inline security group rules to aws_security_group_rule resources? We want the same rules but external from the security group so that we can conditionally create new rules for a particular source security group.

@korfuri
Copy link
Contributor

korfuri commented Jun 28, 2018

@genevieve better late than never, but: you can migrate, kind of.

  • Moving each rule out of the inline definition into its own aws_security_group_rule resource will create a plan that wants to destroy and re-create all rules.
  • Instead of applying that plan, you can use terraform state mv to move each resource from its old path (aws_security_group.mygroup.aws_security_group_rule.mygroup.0) to its new path (aws_security_group_rule.myrule1). You may need to use terraform state show to find out what "old path" corresponds to what "new path".

It's quite tedious if you have a lot of rules. At least it's easy to tell if you're done: you can plan as many times as you want while moving stuff around, and when the plan is empty, you know you've got it right. Also terraform state mv won't let you overwrite a resource by moving another resource on top of it, so you're relatively safe when using it.

@maliaoreta
Copy link

Hi @korfuri!

I'm currently attempting to use terraform state mv when refactoring inline resource types into separate resources/modules.
However I've been unable to find the valid source address syntax for these inline resources such as aws_security_group_rule or aws_network_interface_attachment.

For example, when using terraform state show on the aws_network_interface resource to get the attachment address, it looks something like this:

attachment.#                       = 1
attachment.xxxxxxxxx.attachment_id = eni-attach-xxxxxxxx
attachment.xxxxxxxxx.device_index  = 0
attachment.xxxxxxxxx.instance      = i-xxxxxxxxxxxxxx

Sorry if I'm just missing something obvious, but where can I find the correct addresses for these inline configs?
This would be really helpful since I'm currently importing a bunch of existing AWS resources into Terraform management, and all the resources seem to be returned with inline configurations.

Thanks in advance!

@korfuri
Copy link
Contributor

korfuri commented Jul 18, 2018

I think I was confused in my response above, and that's not possible.
You may be able to do something like: (1) back up your state (2) Write down the security group's ID and terraform state rm it (3) change your config to use non-inline rules (4) terraform import your security group again. The rules will be imported as their own resources, with arbitrary paths. (5) terraform state mv each rule into its proper location.

@maliaoreta
Copy link

YES that worked!!!
Thank you so much!!! Especially for such a quick response!

Have a good one!

@alkalinecoffee
Copy link

alkalinecoffee commented Aug 22, 2018

We're seeing this issue (Terraform v0.11.7, provider.aws v1.29.0) when modifying a security group rule with new IP ranges. The security group config has no in-line rules--all rules are defined in aws_security_group_rule resources.

We've commented out all of our aws_security_group_rule resources, applied (so the security group has no rules attached), removed the security group from state, re-imported, uncommented and re-applied all of the rules. This works fine, until we need to add a new rule, which results in the duplicate error.

Steps to reproduce (let's call the security group infra)...

  1. Clear out existing rules from security group and state:
# Comment out rules in config
tf apply (security group now has no rules)
tf state list
tf state show sg-123456
  1. Remove the SG from state and re-add:
tf state rm module.ci.aws_security_group.infra
tf import tf module.ci.aws_security_group.infra sg-123456
  1. Inspect state of resource to ensure rules are empty
tf state show module.ci.aws_security_group.infra
  1. Re-add security groups removed from Step 1
# Uncomment security group rules in config
tf apply
  1. Add a new IP range to one of the rules and tf apply ... which results in failure

This issue should be revisited as 1) it's still broken and 2) the URL provided in the error message points the user to this thread for a resolution, for which one does not clearly exist.

@alkalinecoffee
Copy link

Following up on the message above...

We had the following lifecycle in our security group rule configs:

lifecycle {
  create_before_destroy = true
}

It seems obvious now, but removing this lifecycle config avoids the duplicate warning.

We initially had this lifecycle policy in place as we ran into an issue where a bad terraform apply left our security groups in a bad state which resulted in blocked traffic. But it seems that overall, having this policy in place is inherently incompatible with the strict duplication rules applied to security group rules.

I'm not sure what the resolution is here, or if this issue belongs in this thread or a new one should be opened. We want to continue using the lifecycle for the reason above, but it also prevents us from easily managing our SG rules.

@imdhruva
Copy link

similar issue as the one posted by @alkalinecoffee ; we have configured "aws_security_groups" and "aws_security_group_rule" to work for external rule declaration; however on every update on the cider block we have a new security-group being spun up - one of the w/a opined was to write a lifecycle hook create_before_destroy and that won't work because the ingress/egress rules would remain the same.

@ndp-stc
Copy link

ndp-stc commented Dec 14, 2018

Using terraform 0.11.8 and 0.11.10, both give me this error. I am not using create_before_destroy. All rules are separate (not inline) resources.

This seems to occur for me every time I use TF to push a change to any security group rule.

I also attempted to use timestamp() in rule descriptions; hoping that this would force attributes to be unique for each application of each rule - but it still reports this error. Which means that the description attribute is NOT part of the hash.

I tried deleting all security group rules, (both manually, and using terraform apply) - and that did not fix this problem.

I tried using terraform state rm for the security group rule resources in state - but that didn't fix the problem, because terraform will no longer update those rules, so the state will never get refreshed with the rule state. terraform state list | grep aws_security_group_rule returns nothing after a re-apply.

The text of the error message makes no sense, because when it's applying to a "source security group id" - it then says that the duplicate rule is for a cidr range, not a source id. (you can not specify a source security group id, when you're specifying a cidr range).

The only eventual workaround was to delete all of the existing rules by hand in aws console, then re-apply the new terraform rules. (though I suppose this could probably be automated with aws cli).

edit: workaround with terraform taint:
terraform state list | grep "aws_security_group_rule" > rules.txt ; while IFS='' read -r line || [[ -n "$line" ]] ; do l="${line//[/.}" ; l2="${l//]}" ; arrLINE=(${l2//./ }) ; terraform taint -module ${arrLINE[1]} ${arrLINE[2]}.${arrLINE[3]}.${arrLINE[4]} ; done < rules.txt

. . . actually took a while to figure out because the taint syntax is different than all other terraform syntax.

For some reason, ONE of my rules can't be found by taint; so I continue to get one "duplicate security group errors".

@neechbear
Copy link

@ndp-stc's suggestion above seemed to be a viable workaround for me, but I needed to tweak the method very slightly. Maybe the taint syntax changed slightly or my nested modules complicate issues, .. not sure?

This seemed to give me all the terraform commands I needed to fix my problem:

while read -r addr
do
  if [[ "$addr" == "module."** ]]
  then
    module="${addr%.*.*}"
    addr="${addr#$module.}"
    echo terraform taint -module="${module//module./}" "$addr"
  else
    echo terraform taint "$addr"
  fi
done < <(terraform state list | grep "aws_security_group_rule")

@ghost
Copy link

ghost commented Jul 24, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jul 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS Provider in 0.5.3- Can not tie multiple aws_security_group_rule to one security group