Create clusters with HA masters by default #90
👍 This is the one thing that is stopping us from switching to this setup already. It looks like podmaster is already configured, so it's just a case of dropping in an ELB...
See #147.
I believe multimaster is now supported in this repo for k8s 1.2.
@tomdee I have recently started looking into this, too. It can be supported if you modify the cfn templates. Let me share my incomplete thoughts so as not to stall this discussion. AFAIK, we have to think about HA for the apiserver, for the scheduler/proxy/controller-manager, and for etcd separately. apiservers seem to be stateless, so you may just want to run two or more of them (so that a single apiserver is not your SPOF). Then, at least, you need to tell the workers where the live apiservers are. @eliaslevy seems to have done this in his PR #147 through an internal load balancer with a well-known DNS name (btw, thanks for sharing the great PR, @eliaslevy!). scheduler/proxy/controller-manager should have … For etcd, I guess you need to form an HA etcd cluster consisting of at least 3 members. Each member should be located in a different availability zone (btw, how does everyone do this? Is there an AWS region that has 3 AZs open to its users?) so that a single member's failure does not break quorum. So, how is everyone doing it? :)
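The "at least 3 members" rule follows from etcd's majority quorum (it uses Raft). For an $n$-member cluster, the quorum size $q$ and the number of member failures $f$ it can tolerate are:

```latex
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1,
\qquad
f(n) = n - q(n) = \left\lfloor \frac{n-1}{2} \right\rfloor
```

So 3 members tolerate 1 failure, 5 tolerate 2, and an even count such as 4 tolerates only 1, no better than 3.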
@mumoshu here is a list. Most regions have at least three, but there are a few with only two. Personally, we design around a two-AZ-per-region model, so I would prefer the option to have five etcd servers across two AZs.
@brandonweeks operating across only two AZs leaves you at risk of failure if a single AZ fails (the one holding the majority of the etcd nodes), as you won't have a quorum.
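To make the two-AZ argument concrete, here is a small Python check. It is a sketch that assumes etcd needs a strict majority (⌊n/2⌋ + 1) of its members to keep quorum; the function names and AZ labels are hypothetical. It asks whether losing any single AZ still leaves a majority of members running.

```python
from typing import Mapping


def quorum(n: int) -> int:
    """Minimum number of live members an n-member etcd cluster needs (strict majority)."""
    return n // 2 + 1


def survives_any_single_az_loss(members_per_az: Mapping[str, int]) -> bool:
    """True if losing any one AZ still leaves at least a quorum of members."""
    total = sum(members_per_az.values())
    needed = quorum(total)
    return all(total - count >= needed for count in members_per_az.values())


# 5 members over 2 AZs (3 + 2): losing the 3-member AZ leaves 2 < quorum of 3.
print(survives_any_single_az_loss({"az-a": 3, "az-b": 2}))
# 3 members over 3 AZs (1 + 1 + 1): any single loss leaves 2 = quorum of 2.
print(survives_any_single_az_loss({"az-a": 1, "az-b": 1, "az-c": 1}))
```

With only two AZs the check always fails: one AZ must hold at least half of the members, and losing it leaves at most ⌊n/2⌋, which is below the quorum. Surviving an AZ outage therefore needs a third AZ, however many members you add.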
Ideally, Kubernetes clusters should have highly available masters. Currently the k8s nodes are autoscaled, but the master is not. This can be achieved with the combination of 1) an ELB and 2) either the podmaster (whose spec is already included in the public artifacts) or the use of fleet to guarantee that only one copy each of the controller manager and scheduler is running at once.
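The "only one copy at once" guarantee is usually built on a lease with a TTL. Podmaster reportedly does this with a key in etcd updated via compare-and-swap; the following is only a toy, in-memory model of the idea, not podmaster's code. `LeaseStore`, `try_acquire`, and the master names are hypothetical, and the store is a plain Python object, so it is not safe across processes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None  # master currently allowed to run the singletons
    expires_at: float = 0.0       # time after which another master may take over


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared TTL key (e.g. in etcd).

    A real store must perform the check-and-set in try_acquire atomically
    (compare-and-swap); this toy version does not.
    """

    def __init__(self) -> None:
        self._lease = Lease()

    def try_acquire(self, candidate: str, ttl: float, now: float) -> bool:
        """Acquire or renew the lease; return True if `candidate` now holds it."""
        lease = self._lease
        free = lease.holder is None or now >= lease.expires_at
        if free or lease.holder == candidate:
            # Take over a free/expired lease, or extend our own.
            self._lease = Lease(holder=candidate, expires_at=now + ttl)
            return True
        # Someone else holds a live lease: stay on standby.
        return False
```

Each master would call `try_acquire` periodically and run the controller manager and scheduler only while it returns True. For example, with `ttl=10`, master-a acquires at t=0 and renews at t=5; master-b is refused at t=12 but takes over at t=16, once master-a has stopped renewing. The TTL trades failover speed against the risk of flapping on a slow renewal.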