This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Add support for customization of network topologies #284

Merged

Conversation

mumoshu
Contributor

@mumoshu mumoshu commented Jan 26, 2017

This change allows us to define private and public subnets at the top level of cluster.yaml, to be chosen for worker/controller/etcd nodes and a controller load balancer.

Thanks to @neoandroid and @Sasso for submitting pull requests #169 and #227 respectively, which formed the basis of this feature.

Let me also add that several resources, including subnets, NAT gateways, and route tables, can now be reused by specifying id or idFromStackOutput.
Thanks to @icereval for PR #195, which first introduced the idea of the Identifier type to add support for existing AWS resources in a universal way.

A minimal config utilizing this feature would look like:

clusterName: mycluster
externalDNSName: mycluster.example.com
hostedZoneId: yourhostedzoneid
keyName: yourkeyname
kmsKeyArn: "arn:aws:kms:<region>:<account>:key/<id>"
region: <region>
createRecordSet: true
experimental:
  waitSignal:
    enabled: true
subnets:
- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
- name: private2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
- name: public1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.3.0/24"
- name: public2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.4.0/24"
controller:
  subnets:
  - name: private1
  - name: private2
  loadBalancer:
    private: false
etcd:
  subnets:
  - name: private1
  - name: private2
worker:
  subnets:
  - name: public1
  - name: public2

This will create 2 private subnets and 2 public subnets. The private ones are used by etcd and controller nodes, and the public ones by worker nodes and the controller load balancer.

It is flexible enough to use different private subnets for etcd and controllers:

subnets:
- name: etcdPrivate1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
- name: etcdPrivate2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
- name: controllerPrivate1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.3.0/24"
  private: true
- name: controllerPrivate2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.4.0/24"
  private: true
- name: public1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.5.0/24"
- name: public2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.6.0/24"
controller:
  subnets:
  - name: controllerPrivate1
  - name: controllerPrivate2
  loadBalancer:
    private: false
etcd:
  subnets:
  - name: etcdPrivate1
  - name: etcdPrivate2
worker:
  subnets:
  - name: public1
  - name: public2

It even supports using existing subnets by specifying subnet IDs:

subnets:
- name: private1
  id: subnet-12345a
  availabilityZone: ap-northeast-1a
  private: true
- name: private2
  id: subnet-12345b
  availabilityZone: ap-northeast-1c
  private: true
- name: public1
  id: subnet-12345c
  availabilityZone: ap-northeast-1a
- name: public2
  id: subnet-12345d
  availabilityZone: ap-northeast-1c
controller:
  subnets:
  - name: private1
  - name: private2
  loadBalancer:
    private: false
etcd:
  subnets:
  - name: private1
  - name: private2
worker:
  subnets:
  - name: public1
  - name: public2

Or importing subnet IDs from other CloudFormation stack(s):

subnets:
- name: private1
  idFromStackOutput: myinfrastack-privateSubnet1
  availabilityZone: ap-northeast-1a
  private: true
- name: private2
  idFromStackOutput: myinfrastack-privateSubnet2
  availabilityZone: ap-northeast-1c
  private: true
- name: public1
  idFromStackOutput: myinfrastack-publicSubnet2
  availabilityZone: ap-northeast-1a
- name: public2
  idFromStackoutput: myinfrastack-publicSubnet3
  availabilityZone: ap-northeast-1c
controller:
  subnets:
  - name: private1
  - name: private2
  loadBalancer:
    private: false
etcd:
  subnets:
  - name: private1
  - name: private2
worker:
  subnets:
  - name: public1
  - name: public2

…rker/controller/etcd nodes and a controller loadbalancer
@mumoshu
Contributor Author

mumoshu commented Jan 26, 2017

Let me also add that several resources, including subnets, NAT gateways, and route tables, can now be reused by specifying id or idFromStackOutput.
Thanks to @icereval for PR #195, which first introduced the idea of the Identifier type to add support for existing AWS resources in a universal way.

@codecov-io

codecov-io commented Jan 26, 2017

Codecov Report

Merging #284 into master will decrease coverage by 2.45%.

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
- Coverage   57.53%   55.08%   -2.45%     
==========================================
  Files           6        6              
  Lines        1288     1387      +99     
==========================================
+ Hits          741      764      +23     
- Misses        449      505      +56     
- Partials       98      118      +20
Impacted Files Coverage Δ
config/config.go 64.63% <35.24%> (-8.06%)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cefe47...1b75cbe.

@redbaron
Contributor

redbaron commented Jan 26, 2017

May I propose a slightly different structure, which matches the underlying CF structure more closely?

  1. As subnet names must be unique, the top-level subnets: key should be turned into a dictionary.
  2. Any reference to a subnet in stack-template.json should be done by its name: { "Ref": "Subnet0" }.
  3. Subnets which have no id field are created by kube-aws.
  4. Subnets WITH an id field are passed as typed default params in the CF template. You'll get free validation as a bonus:
"Parameters": {
  "Subnet1": {
     "Type": "AWS::EC2::SubnetId", 
     "Default": "subnet-xxyyzz"
  }
}

Therefore cluster.yaml would look something like this:

subnets:
  Subnet0:
    availabilityZone: us-west-2a
    CIDR: 10.10.10.0/24
  Subnet1:
    id: subnet-xxyyzz
  1. To keep backwards compatibility, the old 'subnets' key in cluster.yaml can be converted to the new one internally based on its type: if it is a list, convert it into a map with 'Subnet0', 'Subnet1', 'Subnet2' keys.

@mumoshu
Contributor Author

mumoshu commented Jan 26, 2017

@redbaron thanks as always for your comments!

  • IMHO 1. should be an optional structure for defining subnets, because name isn't a required param. I'm considering extending the feature to provide a less verbose way of doing the same setup as above, with something like:
subnets:
- availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
- availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
- availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.3.0/24"
- availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.4.0/24"
controller:
  private: true
  loadBalancer:
    private: false
etcd:
  private: true
worker:
  private: false

i.e. we can omit names if we're not interested in them. This extension doesn't seem to match well with the subnets-as-a-hash syntax, does it?

  • I believe 2. and 3. are already implemented as you've proposed, and therefore 4. could be implemented on top of this PR easily.
    1. definitely seems to be the way to go for me too. However, so as not to make this PR too large, would you mind creating another issue for it?
  • I believe 1. and 6. should be implemented both/at once if 1. is necessary.

@redbaron
Contributor

redbaron commented Jan 26, 2017

  1. should be an optional structure for defining subnets because name isn't a required param.

That is what I am proposing to change. You always refer to a subnet by name, so it is explicit what is created where, and it also perfectly matches what happens in CF; this simplifies the kube-aws code because it doesn't need to translate one representation into another.

By listing subnets explicitly you can have controller and etcd, in a setup like the following, pick subnets which might even be different from each other but both private. Or some subnets may already exist and some may not.

controller:
  private: true 
etcd:
  private: true

If we mandate a name for every subnet, whether it is created by kube-aws or already exists and is referenced by its ID, then the rest of stack-template.json can unconditionally use { "Ref": "<subnet_name>" }. It will work for both cases, because the CF Ref syntax for referencing resources within a stack and for referencing input params is conveniently the same.

@mumoshu
Contributor Author

mumoshu commented Jan 26, 2017

Thanks again!

Regarding the latter part of your comment, I guess I already know the convenience and efficiency of utilizing cfn parameters after reading through @icereval's great work. I already introduced a few essences of his work via this PR. Now I'm open to a future PR to achieve what you suggested, hence I've suggested opening another issue dedicated to that 👍

I'll try to sort out all the issues eventually, but it is definitely impossible to manage them alone. Merging everyone's work and wishes so that they work together, while adding tests and refactoring to keep things maintainable (at least for me), already took several days!
Gradual changes made by not only the maintainer but multiple contributors are the only way to sustainably keep improving kube-aws, IMHO.
Raising separate issues for problems which can be addressed independently is the first step toward that.
I know contributors like you are already helping kube-aws and me a lot, but I have to ask for more.
Fire and motion with me!

@mumoshu
Contributor Author

mumoshu commented Jan 26, 2017

By listing subnets explicitly you can have controller and etcd, in a setup like the following, pick subnets which might even be different from each other but both private. Or some subnets may already exist and some may not.

Yes, that is why I made it possible to reference subnets by name.
However, I'd still like to support both cases. The one with subnets whose names are omitted is definitely for demo purposes. The one with explicitly named subnets is for production use (hence implemented first in this PR). Thoughts?

@redbaron
Contributor

The one with explicitly named subnets is for production use (hence implemented first in this PR). Thoughts?

My concern about introducing names without making them keys in a hashmap is that it can naturally lead to duplicates, and you need yet another piece of Go code to validate and report an error if there are dupes. If you make all subnet names keys in a hashmap, then you get uniqueness for free; the data structure itself enforces the desired property. Apart from that it all looks fine to me given the current state of the code.
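To illustrate the difference with a quick sketch (names and CIDRs below are arbitrary):

# list syntax: a duplicated name is still syntactically valid YAML,
# so kube-aws needs extra Go validation to reject it
subnets:
- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
- name: private1   # duplicate, only caught at validation time
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
---
# map syntax: the name is the key, so a duplicate simply cannot be expressed
subnets:
  private1:
    availabilityZone: ap-northeast-1a
    instanceCIDR: "10.0.1.0/24"
  private2:
    availabilityZone: ap-northeast-1c
    instanceCIDR: "10.0.2.0/24"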

In the background I am preparing a kube-aws overhaul which takes a different approach to the way it translates cluster.yaml into a CF template. I'll present it as a separate branch for discussion and collaboration once it achieves feature parity.

@mumoshu
Contributor Author

mumoshu commented Jan 27, 2017

@redbaron

My concern about introducing names without making them keys in a hashmap is that it can naturally lead to duplicates, and you need yet another piece of Go code to validate and report an error if there are dupes.

Certainly.

If you make all subnet names keys in a hashmap, then you get uniqueness for free; the data structure itself enforces the desired property.

Definitely. Hmm, this is a bit of a hard decision for me to make alone at this stage. Please let me leave comments requesting confirmation in the related github issues.
Basically if no one needs:

controller:
  private: true 
etcd:
  private: true

for demo purposes, we can safely switch to utilizing hashes as you've suggested.
Personally I'd also prefer that if possible.

Btw,

Apart from that it all looks fine to me given the current state of the code.

Thanks for the kind words!

I'm also looking forward to seeing your overhaul work!
Will it include support for node pools defined inside the top-level cluster.yaml, which is what I'm going to tackle next?

@mumoshu
Contributor Author

mumoshu commented Jan 27, 2017

I've updated the description of this PR to give an overview of all the changes and improvements made.

cc @neoandroid @icereval @c-knowles @Sasso

@whereisaaron
Contributor

This all looks cool and flexible @mumoshu, thanks for your work! For my use cases it is considerably more flexibility than I need. The one capability here I'd definitely find useful though, and will use if this goes ahead, is the ability to deploy private clusters with only the API and Ingress/Service load balancers public.

controller:
  loadBalancer:
    private: false

The cluster.yaml file is small enough that I remake it for every minor release. I don't want to miss some change or new feature, so I init every time and re-do the customization of cluster.yaml. Therefore I don't mind what cluster.yaml syntax gets changed between minor (x.y) releases.

@neoandroid
Contributor

@whereisaaron you can already achieve that scenario since #169 was merged

@mumoshu
Contributor Author

mumoshu commented Jan 29, 2017

Hi @whereisaaron, thanks for the request!
If I understand correctly, I believe your case mentioned in #284 (comment) can already be achieved after this PR with:

subnets:
- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
- name: private2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
- name: public1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.3.0/24"
- name: public2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.4.0/24"
controller:
  subnets:
  - name: private1
  - name: private2
  loadBalancer:
    private: false
   # Setting `loadBalancer.private` to false leads kube-aws to make it an `internet-facing` lb while choosing public subnets for the lb like
   # subnets:
   # - name: public1
   # - name: public2
etcd:
  subnets:
  - name: private1
  - name: private2
worker:
  subnets:
  - name: private1
  - name: private2

As can be seen in the above example, explicitly choosing private subnets in the subnets keys for worker, controller, and etcd while setting loadBalancer.private to false results in your desired setup.

@mumoshu
Contributor Author

mumoshu commented Jan 29, 2017

@whereisaaron Regarding your comment #169 (comment), would it be enough for your use-case to add an option to disable creation of a NAT gateway and a route to it like:

- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
  natGateway:
    create: false
- name: private2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
  natGateway:
    create: false

?

@mumoshu
Contributor Author

mumoshu commented Jan 29, 2017

@whereisaaron Ah, or you may even need to do something like:

- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
  natGateway:
    create: false
  # this, in combination with `natGateway.create=false`, implies that the route table already has a route to an existing NAT gateway
  routeTable:
    id: <id of your route table mentioned in https://github.com/coreos/kube-aws/pull/169#issuecomment-275895730>
- name: private2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
  natGateway:
    create: false
  # this, in combination with `natGateway.create=false`, implies that the route table already has a route to an existing NAT gateway
  routeTable:
    id: <id of your route table mentioned in https://github.com/coreos/kube-aws/pull/169#issuecomment-275895730>
# public subnets used for the external elb for api
- name: public1
  # snip
- name: public2
  # snip

natGateway.create is not yet implemented but it can be added easily if necessary.

routeTable.id for a subnet is already implemented in this PR but would you prefer the top-level routeTableId to be used as the default value for all the subnets (like before)?

Update:

Instead of:

natGateway:
    create: false

I made it:

natGateway:
    preconfigured: true

so that the zero-value(=false) can be the default value for the setting (which is now preconfigured) and therefore the implementation is a bit cleaner.

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

Replying to my own comment above.

would you prefer the top-level routeTableId to be used as the default value for all the subnets (like before)?

This should not be allowed, because then we wouldn't have a clear way to differentiate public/private route tables:

  • public route table(s) should be provided to/managed by kube-aws to place the external kubernetes api load balancer in
  • private route table(s) should be provided to/managed by kube-aws to place the private worker/etcd/controller nodes in

as implied by what @whereisaaron mentioned in #284 (comment)

@cknowles
Contributor

@mumoshu as per your comment, I think the above looks fine. My understanding of this change is that it adds more flexibility to subnet customisation. Currently, using a slightly older 0.9.2 release, it is possible to change the route table associated with subnets generated by kube-aws and therefore possible to make them private/public. I'm not sure whether #169 removed that and it's re-added here, or whether this just adds more flexibility; either way is fine. I'm still a bit concerned about struggling to ensure all these different options are well tested, but I don't know how we can deal with that other than cutting the options and perhaps providing some network CF examples out of the box.

@whereisaaron
Contributor

Thanks for looking into this @mumoshu. It is no problem to specify the route table per subnet, so if routeTable.id is already in place for subnets then I can still ensure our region-level or VPC-level NAT instances are used, rather than these new per-deployment ones. That's great 🎉

So I think the only missing piece to retain the previous capabilities is to have some way to not have any NAT instances created, since they serve no purpose if you have specified routeTable.id and workerSecurityGroupIds for VPC-level or region-level NAT instances. If the proposed natGateway.create option would do that, then we should be sorted.

One question about the proposed natGateway.create: I think controller, etcd, and worker subnets all use the same NAT instances. So perhaps natGateway.create is effectively only a global/top-level yes/no choice rather than per-subnet?

@c-knowles creating private subnets with a specified route table is still possible in v0.9.3-rc.5 and it looks like this new way of doing things still retains or re-adds that capability via routeTable.id. The change, as I understand it, is that enabling private subnets now also triggers the creation of NAT instances. That's a great default for people creating a VPC and a first cluster. But if you have a few clusters you can't afford to create VPCs and/or NAT instances for every one, since the AWS limit is only 5 (in both cases).

@c-knowles as soon as there is an alpha/beta/rc release I'll get on with testing private subnet deployments and report back.

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@c-knowles @whereisaaron Thanks again!

The intention of this PR is to provide enough flexibility to, e.g., create worker/etcd/controller nodes in private/public subnet(s) and create an api load balancer in private/public subnet(s), plus reuse existing AWS resources including:

  • NAT gateways
  • subnets
  • route tables

where necessary, but I've begun to believe that I had unintentionally broken your use-cases (=everything in private subnets with a smaller config?), not in functionality but in configuration syntax.
That isn't intended, at least for now.

You had been using a configuration like the one below to put all the nodes and LBs into private subnets, i.e. nothing, including nodes and LBs, in public subnets, right?

routeTableId: rtb-for-privatesubnet-withpreconfigurednat
mapPublicIPs: false
controllerLoadBalancerPrivate: true

If so, I'm considering improving this PR so that the above is translated into something like:

subnets:
- private: true
  natGateway:
    preconfigured: true
  routeTable:
    id: rtb-for-privatesubnet-withpreconfigurednat
controller:
  loadBalancer:
    private: true

// However, to be honest, such a translation could soon be deprecated and removed if it turns out to be too hard to maintain. Sorry!

Does it make sense to you two?

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@whereisaaron I've implemented natGateway.preconfigured instead of natGateway.create with the bool value inverted. Please see my updated comment at #284 (comment) for more explanation!

So I think the only missing piece to retain the previous capabilities is to have some way to not have any NAT instances created, since they serve no purpose if you have specified routeTable.id and workerSecurityGroupIds for VPC-level or region-level NAT instances. If the proposed natGateway.create option would do that, then we should be sorted.

Yes, I think so too, and it is addressed in my last commit 242783d, which adds subnets[].natGateway.preconfigured: true and requires subnets[].routeTable.id/idFromStackOutput to be specified.
Did I follow you correctly?

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@c-knowles @whereisaaron I've updated my comment #284 (comment) several times. Please re-read it if you came here via email notifications from github 😃

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@whereisaaron

One question about the proposed natGateway.create: I think controller, etcd, and worker subnets all use the same NAT instances. So perhaps natGateway.create is effectively only a global/top-level yes/no choice rather than per-subnet?

My assumption is that one or more subnets may rely on an NGW created by kube-aws while others may rely on an NGW preconfigured for existing subnets. Providing the flexibility to customize the NGW per subnet supports that as well.

However, I believe it is true in some respects that a setting like natGateway.preconfigured should optionally be settable at the top level of cluster.yaml, so that one doesn't need to write a verbose configuration like #284 (comment) in some cases; hence my previous comment #284 (comment).

@cknowles
Contributor

cknowles commented Jan 30, 2017

@whereisaaron @mumoshu thanks, it seems my use case is still accommodated. Specifically, the use case is detailed in #44. Summary:

  1. Existing VPC to share NATs and also connect through to some other components inside the same VPC, such as RDS deployments. Shared NATs are mainly for cost reasons, as there's not much benefit to having separate managed NATs for the different clusters we have.
  2. Private subnets for workers as best practice (none of the services need direct exposure). Ideally generated by kube-aws, as we have a few k8s clusters and adding/removing them from the VPC separately seems to be the wrong place.
  3. Controller nodes and ELBs are all still in public subnets (but later I'd like the controllers in private subnets too).

So in turn we end up with the solution of kube-aws creating the subnets and linking them to existing route tables, which are easily defined in a shared VPC (some private with NAT and some public with IGW).

YAML setup looks a bit like this:

subnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.1.0/24"
    # private route table to NAT AZ a
    routeTableId: rtb-ID1
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.2.0/24"
    # private route table to NAT AZ b
    routeTableId: rtb-ID2
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.3.0/24"
    # private route table to NAT AZ c
    routeTableId: rtb-ID3
controllerIP: 10.0.4.50
controllerSubnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.4.0/24"
    # public route table
    routeTableId: rtb-ID4
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.5.0/24"
    # public route table
    routeTableId: rtb-ID4
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.6.0/24"
    # public route table
    routeTableId: rtb-ID4

We have a similar setup with node pools as well; the config is essentially the same. @mumoshu I'm happy for you to break config compatibility to keep kube-aws simple; it's trivial to move config around, and we could even write a small version migration script.

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@c-knowles @whereisaaron To sync up, let me confirm that you've never tried to do something like:

routeTableId: rtb-for-privatesubnet-withpreconfigurednat
mapPublicIPs: false
# !!!!!
controllerLoadBalancerPrivate: false

which is IMHO not recommended because it shouldn't work if you configured your existing AWS resources properly.

My reasoning here is that:

  • mapPublicIPs: false means you wanted kube-aws to create private subnets for all the nodes, while
  • controllerLoadBalancerPrivate: false means you also wanted it to create a k8s api ELB in public subnets.
  • However public subnets required by an external ELB aren't created by kube-aws because you've set mapPublicIPs: false. Am I correct?

Anyway, FYI, this use-case is intended to be newly supported via this PR with configuration like that explained in #284 (comment).

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@c-knowles Thanks as always!
Just a quick response to #284 (comment), but what is controllerIP: 10.0.4.50 and what is its intention?

@mumoshu
Contributor Author

mumoshu commented Jan 30, 2017

@c-knowles @whereisaaron And thanks for the kind words about being willing to follow breaking changes in the configuration syntax! It will definitely accelerate the development of kube-aws.

@cknowles
Contributor

@mumoshu for your first comment, I'm not using private controller ELBs at all right now, although if that's the preferred way once I add in a bastion host then I will use them (happy to follow best practice). For your second comment, I believe there was previously a bug in kube-aws which meant controllerIP had to be specified if we customised the subnets, so in turn I preferred to be a bit more verbose in the config to avoid further issues; I think that goes away with HA controllers.

@whereisaaron
Contributor

Wow, thanks for the quick work to add natGateway.preconfigured! As soon as there is a test release I'll recreate some test clusters with the new syntax. It sounds like the existing use case will be covered again now 🎉

This is my current successful use case under 0.9.3. Same settings for the cluster and all node-pools.

vpcId: vpc-12345678
routeTableId: rtb-12345678
mapPublicIPs: false
workerSecurityGroupIds:
  - sg-12345678

I deploy everything with private subnets and then get k8s to create public ELBs for Services and Ingress Controllers that should be exposed. Kubernetes specifically supports private subnet clusters with public ELBs with the kubernetes.io/role/elb subnet tag, which you apply to the (possibly empty) public subnets you want to use for public ELBs. You tag the public subnet you want to use for ELBs in each relevant AZ with:

Tag                      Value
KubernetesCluster        your-cluster-name
kubernetes.io/role/elb   (empty)

When Kubernetes is creating a public ELB, it collects all the public subnets associated with the cluster (tagged KubernetesCluster: cluster-name) and picks the first one in each AZ tagged kubernetes.io/role/elb, or if none are tagged kubernetes.io/role/elb it just picks the first public subnet. (It determines 'public' subnets by checking the route table for an 'igw*' entry.) The relevant code is here.

I keep the controller API ELB private, but you can certainly have controllers in private subnets with a public API ELB. You (or kube-aws) just need to create one public subnet (or one per AZ) for the ELB.

For mapPublicIPs: false deployments with private subnets, kube-aws could leverage the built-in k8s support for this by creating and tagging an empty public subnet (per AZ) for public LoadBalancer Services.
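For reference, the kind of Service that triggers such a public ELB is just a standard Kubernetes LoadBalancer Service (nothing specific to this PR; the name and selector below are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: public-ingress        # placeholder name
spec:
  type: LoadBalancer          # k8s provisions an internet-facing ELB in the tagged public subnets
  selector:
    app: nginx-ingress        # placeholder label selector
  ports:
  - port: 443
    targetPort: 443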

…ublicIPs, routeTableId to let kube-aws create all the nodes and the api load-balancer inside either private or public subnets
@mumoshu
Contributor Author

mumoshu commented Jan 31, 2017

@whereisaaron @c-knowles To make your migration path extra clear, I've added an additional commit with more validations and backward-compatibility with the older config syntax based on my idea in #284 (comment)

Specifically, I guess your (potential) use-case is covered by this test 80885cb#diff-4fd4a4a9a3755708c6909f7f824f5754R207
Would you mind taking a look into it?
Is the migration path clear enough for you? Do I understand your use-case correctly?

// Please forgive me if I'm being too nervous here, but I really don't want to break your existing use-cases!

@whereisaaron
Contributor

whereisaaron commented Jan 31, 2017

Thanks @mumoshu, as I read it that does cover the two use cases and seems pretty clean.

  1. If you specify mapPublicIPs: false and natGateway.preconfigured: true, you must also specify routeTableId: rtb-12345678, and it is expected, and the user's responsibility, that that route table will provide a NAT gateway. kube-aws will apply your route table to subnets but create no NAT gateways.
  2. If you specify mapPublicIPs: true and natGateway.preconfigured: true, you must also specify routeTableId: rtb-12345678, and it is expected, and the user's responsibility, that that route table will provide an Internet gateway. kube-aws will apply your route table to subnets but create no Internet gateways.

Thank you very much for keeping these use cases in play, even though you want to deprecate them 😢. I would have thought the group of kube-aws users who already use AWS and already have a working VPC with NAT would be fairly large, and I'm surprised you plan to drop them.

In the use case where the user has no VPC and kube-aws is creating one, creating the NAT instances for mapPublicIPs: false is a great new feature for the first install. However, mandating that every subsequent install create yet more NAT gateways seems onerous to me.

So I hope you'll consider keeping a use case where the user has an existing AWS VPC environment to deploy clusters into! 🙏 🤞

@mumoshu
Contributor Author

mumoshu commented Jan 31, 2017

Thanks as always @whereisaaron 🙇
Sorry for the confusion if I wasn't very clear about it, but I intend to deprecate confusing configuration syntax(es), not your use-case(s)!

For example, I'm wondering if we could deprecate mapPublicIPs: true in favor of an ideal allSubnetsPublic: true, and mapPublicIPs: false in favor of an ideal allSubnetsPrivate: true.
Then, when both of those flags are set to false, we can be sure that a user does want to mix private/public subnets, hence more validation(s) to save the user from making mistakes.

mapPublicIPs as of today can't be used like that, as its default value is true. What I coded in this PR is that mapPublicIPs: true (which is the default) combined with subnets[].private: true results in private subnets! That's just confusing for anyone, not only me, to explain, isn't it?
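To spell that combination out as a snippet (the subnet name and CIDR are arbitrary):

mapPublicIPs: true   # the default
subnets:
- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true      # this subnet still ends up private despite mapPublicIPs: true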

Once again,

I would have thought the group of kube-aws users who already use AWS and already have a working VPC with NAT would be fairly large, and I'm surprised you plan to drop them.

I don't intend to break that use-case!

@whereisaaron
Contributor

whereisaaron commented Jan 31, 2017

Oh yeah, I totally misunderstood! Sorry @mumoshu! 💩
// DEPRECATED AND REMOVED IN THE FUTURE refers to the syntax rather than the use case. I am certainly happy to adjust to any new syntax that makes the code or the configuration easier, as often as necessary.
Thanks for your efforts!!

@mumoshu
Contributor Author

mumoshu commented Feb 1, 2017

@whereisaaron FYI, for simplicity of cluster.yaml, I've dropped natGateway.preconfigured while retaining support for your use-case!
Please read the newly added comments for the subnets: key in the updated cluster.yaml to find your use-case and the appropriate configuration.

@mumoshu
Contributor Author

mumoshu commented Feb 1, 2017

Although the important parts are covered by test/integration/maincluster_test.go and tested automatically against the real AWS service (=cfn stack validations), it's impossible for me to test all the use-cases myself.
I'd appreciate it if you could test yours on your own and report back if there's any issue 🙇

@mumoshu mumoshu changed the title WIP: Add support for customization of network topologies Add support for customization of network topologies Feb 1, 2017
@mumoshu mumoshu added this to the v0.9.4-rc.1 milestone Feb 1, 2017
@mumoshu mumoshu merged commit 9556f3d into kubernetes-retired:master Feb 1, 2017
@mumoshu mumoshu deleted the flexible-network-topology branch February 1, 2017 06:52
mumoshu added a commit to mumoshu/kube-aws that referenced this pull request Feb 15, 2017
This is an implementation of kubernetes-retired#238 from @redbaron, especially what I've described in my comment there (kubernetes-retired#238 (comment)), and an answer to the request "**3. Node pools should be more tightly integrated**" of kubernetes-retired#271 from @Sasso.
I believe this also achieves what was requested by @andrejvanderzee in kubernetes-retired#176 (comment).

After applying this change:

1. All the `kube-aws node-pools` sub-commands are dropped
2. You can now bring up a main cluster and one or more node pools at once with `kube-aws up`
3. You can now update all the sub-clusters including a main cluster and node pool(s) by running  `kube-aws update`
4. You can now destroy all the AWS resources spanning main and node pools at once with `kube-aws destroy`
5. You can configure node pools by defining a `worker.nodePools` array in `cluster.yaml`
6. `workerCount` is dropped. Please migrate to `worker.nodePools[].count`
7. `node-pools/` and hence `node-pools/<node pool name>` directories, `cluster.yaml`, `stack-template.json`, `user-data/cloud-config-worker` for each node pool are dropped.
8. A typical local file tree would now look like:
  - `cluster.yaml`
  - `stack-templates/` (generated on `kube-aws render`)
     - `root.json.tmpl`
     - `control-plane.json.tmpl`
     - `node-pool.json.tmpl`
  - `userdata/`
     - `cloud-config-worker`
     - `cloud-config-controller`
     - `cloud-config-etcd`
  - `credentials/`
     - *.pem(generated on `kube-aws render`)
     - *.pem.enc(generated on `kube-aws validate` or `kube-aws up`)
  - `exported/` (generated on `kube-aws up --export --s3-uri <s3uri>`)
     - `stacks/`
       - `control-plane/`
         - `stack.json`
         - `user-data-controller`
       - `<node pool name = stack name>/`
         - `stack.json`
         - `user-data-worker`
9. A typical object tree in S3 would now look like:
  - `<bucket and directory from s3URI>`/
    - kube-aws/
      - clusters/
        - `<cluster name>`/
          - `exported`/
            - `stacks`/
              - `control-plane/`
                - `stack.json`
                - `cloud-config-controller`
              - `<node pool name = stack name>`/
                - `stack.json`

Implementation details:

Under the hood, kube-aws utilizes CloudFormation nested stacks to delegate management of multiple stacks as a whole.
kube-aws now creates 1 root stack and nested stacks including 1 main(or currently named "control plane") stack and 0 or more node pool stacks.
kube-aws operates on S3 to upload all the assets required by all the stacks(root, main, node pools) and then on CloudFormation to create/update/destroy a root stack.

An example `cluster.yaml` I've been using to test this looks like:

```yaml
clusterName: <your cluster name>
externalDNSName: <your external dns name>
hostedZoneId: <your hosted zone id>
keyName: <your key name>
kmsKeyArn: <your kms key arn>
region: ap-northeast-1
createRecordSet: true
experimental:
  waitSignal:
    enabled: true
subnets:
- name: private1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.1.0/24"
  private: true
- name: private2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.2.0/24"
  private: true
- name: public1
  availabilityZone: ap-northeast-1a
  instanceCIDR: "10.0.3.0/24"
- name: public2
  availabilityZone: ap-northeast-1c
  instanceCIDR: "10.0.4.0/24"
controller:
  subnets:
  - name: public1
  - name: public2
  loadBalancer:
    private: false
etcd:
  subnets:
  - name: public1
  - name: public2
worker:
  nodePools:
  - name: pool1
    subnets:
    - name: asgPublic1a
  - name: pool2
    subnets: # former `worker.subnets` introduced in v0.9.4-rc.1 via kubernetes-retired#284
    - name: asgPublic1c
    instanceType: "c4.large" # former `workerInstanceType` in the top-level
    count: 2 # former `workerCount` in the top-level
    rootVolumeSize: ...
    rootVolumeType: ...
    rootVolumeIOPs: ...
    autoScalingGroup:
      minSize: 0
      maxSize: 10
    waitSignal:
      enabled: true
      maxBatchSize: 2
  - name: spotFleetPublic1a
    subnets:
    - name: public1
    spotFleet:
      targetCapacity: 1
      unitRootVolumeSize: 50
      unitRootvolumeIOPs: 100
      rootVolumeType: gp2
      spotPrice: 0.06
      launchSpecifications:
      - spotPrice: 0.12
        weightedCapacity: 2
        instanceType: m4.xlarge
        rootVolumeType: io1
        rootVolumeIOPs: 200
        rootVolumeSize: 100
```
mumoshu added a commit to mumoshu/kube-aws that referenced this pull request Feb 16, 2017
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
…rk-topology

Add support for customization of network topologies
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018