
Create clusters with HA masters by default #90

Closed
jimmycuadra opened this issue Oct 2, 2015 · 7 comments

Comments

@jimmycuadra

Ideally Kubernetes clusters should have highly available masters. Currently k8s nodes are auto-scaled, but the master is not. This can be achieved with the combination of 1) an ELB and 2) either the podmaster (whose spec is already included in the public artifacts) or the use of fleet to guarantee that only one copy each of the controller manager and scheduler is running at once.

@errm

errm commented Oct 5, 2015

👍 This is the one thing that is stopping us from switching to this setup already. It looks like podmaster is already configured, so it's just a case of dropping in an ELB...

@ghost

ghost commented Oct 6, 2015

@eliaslevy
Contributor

See #147.

@tomdee
Member

tomdee commented Apr 19, 2016

I believe multi-master is now supported in this repo for k8s 1.2

@mumoshu
Contributor

mumoshu commented Apr 29, 2016

@tomdee I have recently started looking into this, too. It can be supported if you modify the cfn templates kube-aws writes. Not out of the box, though.

Let me share my incomplete thoughts, just so this discussion doesn't stall.

AFAIK, we have to think about HA for the apiserver, the scheduler/proxy/controller-manager, and etcd separately.

apiservers seem to be stateless, so you may just want to run 2 or more of them (so that a single apiserver isn't your SPOF). Then, at a minimum, you need to tell workers where the live apiservers are. @eliaslevy seems to have done this in his PR #147 through an internal load balancer with a well-known DNS name (btw, thanks for sharing the great PR @eliaslevy!)
This can't be done out of the box with coreos-kubernetes yet.
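
For illustration only (the resource and parameter names here are made up, not taken from kube-aws or from #147), the internal load balancer plus a stable DNS name for it might look roughly like this as a CloudFormation fragment:

```yaml
Resources:
  ApiServerELB:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      Scheme: internal                 # only reachable from inside the VPC
      Subnets:
        - !Ref ControllerSubnetA       # hypothetical subnet parameters
        - !Ref ControllerSubnetB
      Listeners:
        - LoadBalancerPort: "443"
          InstancePort: "443"
          Protocol: TCP                # pass TLS straight through to the apiservers
      HealthCheck:
        Target: "TCP:443"
        Interval: "10"
        Timeout: "5"
        HealthyThreshold: "3"
        UnhealthyThreshold: "3"
  ApiServerDNS:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.internal.          # placeholder private zone
      Name: kubernetes.example.internal.         # the well-known name workers use
      Type: CNAME
      TTL: "60"
      ResourceRecords:
        - !GetAtt ApiServerELB.DNSName
```

Workers would then point their kubelet/proxy at https://kubernetes.example.internal instead of a single controller's IP.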

scheduler/proxy/controller-manager should be started with --leader-elect=true. This seems to have been done already.
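
For reference, a minimal static-pod sketch for the scheduler (the image tag and flags other than --leader-elect are my assumptions, not copied from this repo); the same flag applies to the controller-manager:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
    - name: kube-scheduler
      image: quay.io/coreos/hyperkube:v1.2.4_coreos.1   # assumed image/tag
      command:
        - /hyperkube
        - scheduler
        - --master=http://127.0.0.1:8080   # assumed local insecure apiserver port
        - --leader-elect=true              # standby replicas stay idle until the leader fails
```

With this on every controller node, you can run a scheduler per master and still be sure only one is actively scheduling at a time.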

For etcd, I guess you need to form an H/A etcd cluster consisting of at least 3 members. Each member should be located in a different availability zone, so that a single member's failure doesn't break quorum (btw, how does everyone do this? Is there an AWS region with 3 AZs open to its users?).
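
As a rough sketch (member names, IPs, and subnets are placeholders, one per AZ), each member's cloud-config for a three-member etcd2 cluster could look something like:

```yaml
#cloud-config
# Placeholder values throughout; the point is one member per AZ so that
# losing any single AZ still leaves a two-of-three quorum.
coreos:
  etcd2:
    name: etcd-a                                   # etcd-a / etcd-b / etcd-c per node
    advertise-client-urls: http://10.0.1.10:2379
    initial-advertise-peer-urls: http://10.0.1.10:2380
    listen-client-urls: http://0.0.0.0:2379
    listen-peer-urls: http://10.0.1.10:2380
    initial-cluster: etcd-a=http://10.0.1.10:2380,etcd-b=http://10.0.2.10:2380,etcd-c=http://10.0.3.10:2380
    initial-cluster-state: new
  units:
    - name: etcd2.service
      command: start
```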

Well, so how is everyone doing it? :)

@brandonweeks

@mumoshu here is a list. Most regions have at least three, but there are a few with only two.

Personally we design around a two-AZ-per-region model, so I would prefer the option to have five etcd servers across two AZs.

@eliaslevy
Contributor

@brandonweeks operating across only two AZs leaves you at risk of failure if the single AZ holding the majority of etcd nodes fails, as you won't have a quorum. With five members split 3/2, losing the AZ with three members leaves only two, below the quorum of three.

@colhom closed this as completed Jan 3, 2017