AWS VPC limits curb scalability of aws-vpc backend #164

Closed
rohansingh opened this issue Apr 30, 2015 · 9 comments
@rohansingh
Contributor

At Spotify we've actually determined the aws-vpc backend isn't usable for us, partly due to AWS VPC limits:
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html

Specifically, the "Entries per route table" limit is 50. While the documentation states that the number of entries per route table can be increased on request, here's Spotify's experience as related by one of our devops folks:

yeah, we raised the limit (200?) but apparently the routes over 100 are just skipped...
which is slightly better because before if we went over 100 it completely crashed

So yeah, unfortunately it looks like aws-vpc is unusable for more than a couple dozen nodes.
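
For anyone hitting this, a quick way to check how close an existing table is to that limit is to count its entries directly. A minimal sketch using aws-sdk-go (the region and route table ID are placeholders, not values from this thread):

```go
// Count the entries in a route table to see how close a cluster is to the
// default 50-entry VPC limit discussed above.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ec2.New(sess)

	out, err := svc.DescribeRouteTables(&ec2.DescribeRouteTablesInput{
		RouteTableIds: []*string{aws.String("rtb-0123456789abcdef0")}, // placeholder route table ID
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, rt := range out.RouteTables {
		// Each flannel host consumes one entry, so the count here effectively
		// caps the cluster size under the aws-vpc backend.
		fmt.Printf("%s: %d entries (default VPC limit: 50)\n",
			aws.StringValue(rt.RouteTableId), len(rt.Routes))
	}
}
```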

@rynbrd

rynbrd commented May 3, 2015

The VPC backend also has the limitation that all hosts must be in subnets which share the same route table.

Our cluster layout includes a subnet to which we deploy our proxy containers. This uses a different route table as these are the only hosts with external IPs and direct internet access.

A solution to both problems could involve adding support for multiple route tables. The implementation would likely include a mapping of host subnets to route table IDs.

The above solution would allow us to bypass the 100-entry limit in a single route table by assigning a new route table to each subnet.
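
For illustration, a subnet-to-route-table mapping like the one proposed above could be expressed in the backend config roughly like this. The RouteTableMappings field is purely hypothetical, not an existing flannel option:

```go
// Sketch of a hypothetical aws-vpc backend config carrying a mapping of
// host subnets to route table IDs, as proposed in this comment.
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

type awsVPCBackendConfig struct {
	Type string `json:"Type"`
	// Hypothetical field: host subnet CIDR -> route table to update for
	// hosts that live in that subnet.
	RouteTableMappings map[string]string `json:"RouteTableMappings"`
}

func main() {
	raw := `{
		"Type": "aws-vpc",
		"RouteTableMappings": {
			"10.0.1.0/24": "rtb-aaaaaaaa",
			"10.0.2.0/24": "rtb-bbbbbbbb"
		}
	}`
	var cfg awsVPCBackendConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		log.Fatal(err)
	}
	// A host would look up its own subnet here and write its flannel route
	// only into the matching table, keeping each table under the entry limit.
	fmt.Println(cfg.RouteTableMappings["10.0.1.0/24"])
}
```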

@eyakubovich
Contributor

Thank you for this feedback. Having good support for aws-vpc would be very valuable, since it avoids encapsulation overhead. We're currently working on adding a client/server option: the idea is that all the flannel daemons talk to a flannel server instead of to etcd directly. This is needed for deployments where there's a desire to restrict the set of machines that can access etcd, but it will also allow the flannel server to be the one modifying the route tables, so that only the server needs access to the tokens/IAM role required to modify routes.

As part of that work, we'll try to see how we can support multiple route tables. And PRs are always welcome.
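
Purely as an illustration of that stateless-server idea (not flannel's actual client/server API), the route-modifying piece could be as small as an HTTP endpoint that is the only process holding EC2 credentials:

```go
// Hypothetical sketch: daemons report their flannel lease to a small service,
// and only that service has the IAM permissions needed to edit route tables.
// The endpoint, parameters, and route table ID are all placeholders.
package main

import (
	"log"
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	http.HandleFunc("/route", func(w http.ResponseWriter, r *http.Request) {
		// Hypothetical parameters: the subnet flannel leased to a host and
		// that host's EC2 instance ID.
		subnet := r.URL.Query().Get("subnet")
		instance := r.URL.Query().Get("instance")

		_, err := svc.CreateRoute(&ec2.CreateRouteInput{
			RouteTableId:         aws.String("rtb-0123456789abcdef0"), // placeholder
			DestinationCidrBlock: aws.String(subnet),
			InstanceId:           aws.String(instance),
		})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})

	// All state lives in etcd / EC2 itself, so any replica of this process
	// can take over if one fails.
	log.Fatal(http.ListenAndServe(":8888", nil))
}
```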

@rohansingh
Contributor Author

Interesting, good to know. Are you considering the availability implications? It would be a no-go for us if the flannel server/master became a single point of failure.

@eyakubovich
Contributor

@rohansingh The client/server will be an opt-in option so you'll be able to keep things as is. But more importantly, the server will be stateless -- the data is still stored in etcd. If a server fails, a new one can be brought up, hopefully automatically by a cluster scheduler such as fleet or Kubernetes.

@rohansingh
Contributor Author

@eyakubovich Sounds great! Thanks for the clarification :)

@Grindizer

Hi everyone,

We are also investigating the use of aws-vpc for our cluster. The most annoying limitation for us is the need for all subnets to share the same route table: in our case we use different subnets (one per AZ), and each one points to a NAT instance in its own AZ, so the route tables differ per subnet.

I was wondering whether updating more than one route table would be a possible solution. The list of route tables to alter could be selected with an AWS tag: the backend would update any route table tagged a certain way, with the tag name and value given as parameters (see the sketch below).
Or we could also imagine an Auto Scaling group feature, where the backend introspects the instance's Auto Scaling group, fetches the list of subnets involved, and then updates every associated route table.

For now the backend inspects the instance's subnet and updates the associated route table.
That wouldn't solve the entry-limit problem, but it would certainly make the backend more usable in many cases.
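
To make the tag-based idea concrete, here is a rough sketch (not an existing flannel feature) that finds every route table carrying an agreed-upon tag and adds the host's flannel route to each. The tag key/value, subnet, and instance ID are placeholders:

```go
// Update every route table tagged flannel=true with this host's route,
// so hosts behind different route tables (e.g. one per AZ) can reach it.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	// Select route tables by tag, as suggested above.
	out, err := svc.DescribeRouteTables(&ec2.DescribeRouteTablesInput{
		Filters: []*ec2.Filter{{
			Name:   aws.String("tag:flannel"),
			Values: []*string{aws.String("true")},
		}},
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, rt := range out.RouteTables {
		// Add this host's flannel subnet to every tagged table.
		_, err := svc.CreateRoute(&ec2.CreateRouteInput{
			RouteTableId:         rt.RouteTableId,
			DestinationCidrBlock: aws.String("10.5.34.0/24"),       // this host's flannel subnet
			InstanceId:           aws.String("i-0123456789abcdef0"), // this host's instance ID
		})
		if err != nil {
			log.Printf("failed to update %s: %v", aws.StringValue(rt.RouteTableId), err)
		}
	}
}
```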

@bernielomax

@Grindizer did you have any luck with this? I would like to do the same.

@anubhavmishra

@rohansingh What did you guys end up using? VXLAN option?

@rohansingh
Contributor Author

@anubhavmishra We went with this (copied from a presentation, sorry):

Alternative: BGP

  • Border Gateway Protocol, the routing protocol of
    the internet.
  • BGP peers connect to each other and exchange routes.

OMG. What did you do?

  • We configured our top-of-rack switches to accept
    routes from our docker hosts.
  • And then installed a bgp daemon, bird, on each
    docker host.

So how does it work now?

  1. We install flannel on docker hosts, just to do
    IP allocation.
  2. Docker uses the machine subnet allocated by flannel.
  3. bird takes a look at what the machine subnet is,
    and advertises this to the top-of-rack switch.
  4. That top-of-rack switch exchanges routes with its
    peers, so packets get routed correctly.

Is this better?

  • More complexity at startup since we need both
    bird and BGP to work.
  • But after initialization, the system is resilient.
  • Also, packets are "normal":
    traceroute makes sense, etc.
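
For reference, step 3 above can be approximated by reading the subnet flannel leased to the host and emitting a bird static-route stanza that a separate `protocol bgp` block would export to the top-of-rack switch. This is a sketch assuming flannel's default subnet file location and bird 1.x config syntax, not Spotify's exact setup:

```go
// Read the host's flannel subnet from /run/flannel/subnet.env and print a
// bird static-route stanza for it; a "protocol bgp" block in bird.conf
// would then export this route to the BGP peer.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/run/flannel/subnet.env")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var subnet string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Lines look like FLANNEL_SUBNET=10.5.34.1/24
		if strings.HasPrefix(sc.Text(), "FLANNEL_SUBNET=") {
			subnet = strings.TrimPrefix(sc.Text(), "FLANNEL_SUBNET=")
		}
	}
	if subnet == "" {
		log.Fatal("FLANNEL_SUBNET not found")
	}

	// FLANNEL_SUBNET holds the gateway address; convert it to the network
	// prefix (10.5.34.1/24 -> 10.5.34.0/24) before advertising it.
	_, ipnet, err := net.ParseCIDR(subnet)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("protocol static flannel_routes {\n\troute %s reject;\n}\n", ipnet)
}
```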

tomdee closed this as completed Mar 13, 2017