Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filtering out IP exhausted Subnets during subnet selection [ec2, rds, VpcLookup] - SubnetSelection #8310

Closed
1 of 2 tasks
gradybarrett opened this issue Jun 1, 2020 · 9 comments
Assignees
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud feature-request A feature should be added or improved. in-progress This issue is being actively worked on. needs-triage This issue or PR still needs to be triaged.

Comments

@gradybarrett
Copy link

gradybarrett commented Jun 1, 2020

Expose a SubnetSelection option that allows filtering out Subnets with 0 available IPs (IP "exhausted" subnets).

Use Case

I work in a massive organization where we all share a set of accounts - managed by central resources and deployed with aging Terraform configs. We've reached the point where one of 3 of our private subnets has run out of IPs. The other two subnets still have quite a bit of IP space left.

It would be nice to be able to filter out IP-unavailable Subnets when creating resources to avoid failed deploys to those Subnets. For example, creating a DBCluster, I want to avoid including unavailable Subnets when the DbSubnetGroup is created.

Proposed Solution

I have implemented this change by exposing the availableIpAddressCount Subnet property source from the context provider and exposed a filterOutIpExhausted SubnetSelection option - defaulted to false. Exhausted subnets are still pulled back as part of the vpc lookup. This just allows consumers to filter them out during subnet selection.

I'm a CDK codebase noob so there are cases I'm not sure about, particularly regarding context management (e.g. stuff like #6929). If an existing (deployed) resource is ignoring IP exhausted Subnets, and one of the subnets it's referencing runs out of IPs, then what happens the next time the resource is deployed? If the context is version controlled, then will the subnet still show as available?

Edit: After re-reading the CDK context/environment docs and Rico's super helpful post in #6929, the questions I had above don't seem concerning anymore, provided people are managing context according to the supplied guidelines. Playing around with my own stack, which is deployed to 3 different accounts has also assuaged some fears I had. Now I'm just dying to know what I've missed and what the experts think. 😄

Other

The PR was really just a fun exercise to start getting familiar with CDK code. If it's useful, great. If not, no sweat.

I'm also aware of #5927 + #7720, but I don't see any major conflicts between the changes. There might however be impacts depending on what sort of other rework is under consideration.

Cheers!

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@gradybarrett gradybarrett added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jun 1, 2020
@gradybarrett gradybarrett changed the title Allow filtering out IP exhausted Subnets during subnet selection Allow filtering out IP exhausted Subnets during subnet selection [ec2, rds, VpcLookup] Jun 1, 2020
@gradybarrett gradybarrett changed the title Allow filtering out IP exhausted Subnets during subnet selection [ec2, rds, VpcLookup] Allow filtering out IP exhausted Subnets during subnet selection [ec2, rds, VpcLookup] - SubnetSelection Jun 1, 2020
@SomayaB SomayaB added @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud in-progress This issue is being actively worked on. labels Jun 2, 2020
@gradybarrett
Copy link
Author

Added an Edit addendum to the description above after re-reading some docs and playing around a bit more with CDK context.

@rix0rrr
Copy link
Contributor

rix0rrr commented Jun 3, 2020

I'm happy you've been able to work out your use case for your particular organization.

Having said that, I think functionality like this is great for organizations that need it, but it's not necessarily something that should be ported into the base CDK. Instead, the CDK should probably be providing mechanisms for you to be able to plug your own context providers into.

As for the specific use case: I'm a little concerned that you would need to keep on refreshing the context which says "yes IP space available/no IP space not available". Also seems there needs to be a way to be able to say "no, IP space not available, HOWEVER one of the IPs in there was mine already so that's fine". I guess by fact of having recorded a subnet with space=true in the cdk.context.json, that counts as space having been allocated for this application at the same time, right?

(Obligatory note that this will break if you synth 2 new applications at the same time before deploying either and there's only 1 IP address left... all the more reason that I think this probably doesn't belong in CDK proper)

@gradybarrett
Copy link
Author

gradybarrett commented Jun 3, 2020

All valid points. Perhaps another underlying issue worth pointing out that leads to this need is "well, everyone picked our us-east-1a subnet, and now it's full while our other two subnets are basically empty." That's such a psychologically predictable pattern that AWS saw it coming with AZ selection.

BYO context provider is interesting. It feels potentially heavy-handed, but I suppose that depends entirely on the implementation. You probably have a much better vision of it than I.

I don't actually imagine myself refreshing that often. Generally, once a subnet fills up, it stays full, with the exception of 1 or 2 IPs that fluctuate from stack deletes and redeploys. There's definitely a chance that I steal an IP while someone is deleting a stack, or that I simul-synth alongside someone else. In both situations, someone "loses" and has to reset their context. Those are the situations under which I'd have to refresh context again.

But in theory, I shouldn't have to refresh again until I encounter a conflict again. My thought is that just the presence of a subnet in context alongside a successful deployment is enough information to say that one of those subnet IPs is mine. Even if there are 0 IPs available in reality, my context hasn't changed so my deployment should proceed without issue. Managing your context according to guidelines would be critical in this case to avoid fluctuating subnet selections.

It just feels bad to make devs start defining specific subnets for a group after only 1 Subnet runs out of IPs. Especially if you decide to expand the existing group with new subnets, rather than adding a whole new group. 1 subnet running out of IPs effectively means I can no longer select subnets using subnetGroupName or subnetType. This is even more painful in account configurations like ours where we have non-prod and prod accounts. If I have to start hardcoding subnets, then all of a sudden I now have to do it for all accounts. I'd love to just be able to rely on there being standard subnet groups available to me. A context reset every once in a while to resolve a conflict is totally worth the tradeoff in my mind. Even being able to set an IP limit threshold that's used to filter out subnets would mean new stacks would be less likely to swipe IP addresses during existing stack delete's and redeploys.

@gradybarrett
Copy link
Author

Perhaps I'm talking myself further into the idea of a custom context provider where rules like this could be defined. 😄 Out of curiosity, where do you think an appropriate entry point would be for that custom context provider?

@rix0rrr
Copy link
Contributor

rix0rrr commented Jun 4, 2020

It would need a lot of validation around it, and it would be a pretty big undertaking too.

The simplest mechanics I could think of is "a context provider is an NPM module", so you would write one, and your Construct that uses it would write the name of the NPM module into the manifest, so that the CLI can go and fetch the module and have it run the query. I'm going to make a call and say it needs to include a version specifier too, to be safe. (*)

But I'm just spitballing off the top of my head here, and we'd need to create team buy-in first. Seems like a chunky enough feature that it'd probably be a good idea for someone to write an RFC first. I'd do it myself but I'm a little swamped at the moment...

(*) first inkling would probably be to say "just use the version from package.json" but that would only help people with NPM projects, would not work for people who have CDK installed globally for their Java projects, for example (or--which I recently learned about--through brew!)

@gradybarrett
Copy link
Author

gradybarrett commented Jun 4, 2020

Oof I realized I totally misspoke above. I was describing the situation as though the subnets had been filtered out during the lookup, which means they'd never end up in your context or would be removed during a context reset, which is obviously not what the implementation I have does. 🤦This would make for more stable context. I didn't go that route because I didn't really understand CDK context at the time.

@gradybarrett
Copy link
Author

gradybarrett commented Jun 4, 2020

Regarding the context provider RFC, I think RFC 167 might actually cover this - from RFC #18.

@gradybarrett
Copy link
Author

Does this being part of the Lookup layer change your mind at all about the feature? It still seems like putting resources into an unavailable subnet is a universal concern.

@rix0rrr
Copy link
Contributor

rix0rrr commented Jun 8, 2020

The CDK's goal is to be a platform for building these kinds of features as much (or more) as delivering them out of the box.

Especially if the feature is going to have caveats and there are multiple reasonable implementations imaginable, this is something that should be solved in the broader CDK ecosystem rather than a single "one-size-fits-all" solution that we are supposed to vend and maintain.

(Especially since doing so would more likely than not translate into a slew of "my use case is slightly different than what you support can you add a single flag just for me pretty please?" feature requests).

For now, I would recommend you build your own context-provider-like system outside the CDK regular build. Store some state in JSON files, read them in your application and configure your app accordingly. How you push around state is then completely under your control and as flexible as you need it to be.

When the open provider framework materializes, the same code can then be more nicely integrated and re-used, but fundamentally it doesn't change much about how this would work.

Thanks for the energy behind the idea! Closing this issue as not really appropriate for CDK itself, but I wish you the best of luck implementing this and releasing it to the broader community if you feel it fills an important need.

@rix0rrr rix0rrr closed this as completed Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud feature-request A feature should be added or improved. in-progress This issue is being actively worked on. needs-triage This issue or PR still needs to be triaged.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants