etcd management #27
I should add that I've been running a test cluster with my monkeypatch branch and it seems to perform well. I've been seeing a lot of warnings like these though:
I think it's because the controller is configured to talk to the internal ELB and we might be hitting an etcd node that's lagging on replication, but I'm not sure. It doesn't seem to break anything though.
@pieterlange made some progress on
Yes, this is obviously where this is all going eventually, but I think it might be many moons before all components in that deployment scenario reach a stable state. It would be nice to get some clarity from the CoreOS crew on deployment plans for this in case I'm completely wrong 😬 Edit: they have, I missed their latest blogs:
Hey, thanks for bringing up the discussion! @pieterlange I agree with every missing feature you've listed in the first comment. I believe we should eventually provide standard way(s) to cover those. To begin with, as I am looking forward to introducing @crewjam's etcd-aws (I was very impressed when I first read his blog post describing it, btw) as a viable option, I'd like to see your branch https://github.com/pieterlange/kube-aws/tree/feature/external-etcd merged into master. Would you mind pull-requesting it so that everyone can start to experiment and give feedback (I believe this is what we'd like to have 😄) more easily using rc binaries of kube-aws?
@camilb Hi, I've just read through your great work in your etcd-asg branch! Though I'm not sure whether we can include the etcd-aws part entirely in kube-aws soon enough or not, I'd like to merge the significant parts of your work supporting SSL for external etcd in a calico-enabled setup (camilb/coreos-kubernetes@a7a14a2...29d538b) into kube-aws. (Have I missed something? Let me know!) That way, combined with @pieterlange's work, we can also encourage everyone to try out a calico + etcd-aws + SSL setup more easily than now, using rc binaries of kube-aws. Would you mind pull-requesting that part of your work and collaborating more with us? 😃
I'm not sure we'd want to merge my branch in yet. It's fairly intrusive. I'm also still digesting the etcd-operator news. Can we work on a
@pieterlange Sure. Anyway, I've created that branch, hoping it becomes a possible place to merge our efforts 👍 https://github.com/coreos/kube-aws/tree/experimental/external-etcd
@mumoshu I'll be glad to. The
@pieterlange I know you're already working on this, but would you let me explicitly assign this issue to you just for clarity? 🙇
@camilb Any chance you could take a look at this recently?
FYI I've written #298 (comment) about the general concerns around H/A of an etcd cluster and why we don't collocate etcd on controller nodes.
#332, for the ASG-EBS-EIP/ENI-per-etcd-node strategy of achieving H/A and rolling-updates of an etcd cluster, is almost finished 😃
FYI: @hjacobs kindly shared with me that his company uses https://github.com/zalando-incubator/stups-etcd-cluster for running dedicated etcd clusters, separate from k8s clusters.
@gianrubio Thanks for the info!
FYI, my question is, briefly:
This change is basically for achieving a "Managed HA etcd cluster" with private IPs resolved via public EC2 hostnames, stabilized with a pool of EBS and EIP pairs for etcd nodes. After this change, EC2 instances backing "virtual" etcd nodes are managed by an ASG.

Supported use-cases:
* Automatic recovery from temporary etcd node failures
  * Even if all the nodes went down, the cluster recovers eventually as long as the EBS volumes aren't corrupted
* Rolling-update of the instance type for etcd nodes without downtime
* Scaling-out of etcd nodes, NOT by modifying the ASG directly BUT indirectly via CloudFormation stack updates
* Other use-cases implied by the fact that the nodes are managed by ASGs

You can choose "eip" or "eni" for etcd node (= etcd member) identity via the `etcd.memberIdentityProvider` key in cluster.yaml:
* `"eip"`, which is the default setting, is recommended
* If you want, choose `"eni"`
  * If you choose `"eni"` and your region has fewer than 3 AZs, setting `etcd.internalDomainName` to something other than the default is HIGHLY RECOMMENDED to prepare for disaster recovery
  * As an advanced option, DNS other than Amazon DNS can be used (when `memberIdentityProvider` is `"eni"`, `internalDomainName` is set, `manageRecordSets` is `false`, and every EC2 instance has a custom DNS capable of resolving FQDNs under `internalDomainName`)

Unsupported use-cases:
* Automatic recovery from more than `(N-1)/2` permanent etcd node failures
  * Requires etcd backups and automatic determination of whether a new etcd cluster should be created or not via `ETCD_INITIAL_CLUSTER_STATE`
* Scaling-in of etcd nodes
  * It just remains untested because it isn't my primary focus in this area. Contributions are welcome

Relevant issues to be (partly) resolved via this PR:
* Part(s) of kubernetes-retired#27
* Wait signal for etcd nodes. See kubernetes-retired#49
* Probably kubernetes-retired#189 and kubernetes-retired#260, as this relies on stable EC2 public hostnames and AWS DNS for peer communication and discovery, regardless of whether an EC2 instance relies on a custom domain/hostname or not

The general idea is to make etcd nodes "virtual" by retaining the state and the identity of an etcd node in an EBS volume and an EIP or ENI, respectively. This way, we can recover/recreate/rolling-update the EC2 instances backing etcd nodes without additional moving parts like external apps, ASG lifecycle hooks, SQS queues, SNS topics, etc. Unlike well-known etcd HA solutions such as crewjam/etcd-aws and MonsantoCo/etcd-aws-cluster, this is intended to be a less flexible but simpler alternative, or the basis for introducing similar solutions.

Caveats:
* If you rely on Route 53 record sets, don't modify the ones initially created by CloudFormation
  * Doing so breaks CloudFormation stack deletions, because CloudFormation has no way to know about the modified record sets and therefore can't cleanly remove them
* To prepare for disaster recovery of a single-AZ etcd cluster (possible when the user relies on an AWS region with 2 or fewer AZs), use Route 53 record sets or EIPs to retain network identities across AZs
  * ENIs and EBS volumes can't be moved to another AZ
  * An EBS volume can, however, be transferred via a snapshot

Strategies considered for stable etcd member identity:
* Static private IPs via a pool of ENIs dynamically assigned to EC2 instances under control of a single ASG
  * ENIs can't move across AZs. What happens when you have 2 ENIs in one AZ and 1 ENI in another, and the former AZ goes down? Nothing, until the AZ comes back up! That isn't the degree of H/A I wish to have at all!
* Dynamic private IPs via stable hostnames, using a pool of EIP+EBS pairs and a single ASG
  * EBS is required in order to achieve "locking" of the pair associated with an etcd instance
  * First of all, identify a "free" pair by filtering available EBS volumes and try to associate it with the EC2 instance
  * Successful association of an EBS volume means that the paired EIP can also be associated with the instance without race conditions
  * EBS volumes can't move across AZs either. What happens when you have 2 pairs in AZ 1 and 1 pair in AZ 2? Once AZ 1 goes down, the options you can take are (1) manually altering AZ 2 to have 3 etcd nodes and then manually electing a new leader, or (2) recreating the etcd cluster within AZ 2 by modifying `etcd.subnets[]` to point to AZ 2 in cluster.yaml, running `kube-aws update`, ssh-ing into one of the nodes, and restoring the etcd state from a backup. Neither is automatic.
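For reference, a minimal sketch of how the relevant `cluster.yaml` section could look under this scheme. Only `memberIdentityProvider`, `internalDomainName`, `manageRecordSets`, and `subnets[]` are named above; the remaining keys and all values (e.g. `count`, the subnet names, the domain) are illustrative assumptions rather than confirmed kube-aws settings.

```yaml
# Hypothetical cluster.yaml fragment for EIP/EBS- or ENI-backed etcd nodes.
etcd:
  count: 3                      # assumed key: number of etcd members
  memberIdentityProvider: eip   # "eip" (default, recommended) or "eni"
  # The two keys below matter mainly for the "eni" provider:
  internalDomainName: internal.example.com  # override the default, especially in regions with < 3 AZs
  manageRecordSets: false       # false = bring your own DNS that resolves FQDNs under internalDomainName
  subnets:                      # assumed subnet references, per etcd.subnets[]
    - name: ManagedPrivateSubnet1
    - name: ManagedPrivateSubnet2
    - name: ManagedPrivateSubnet3
```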
FYI, my question was answered thanks to @hongchaodeng 😄
In a nutshell, I'd like to start with a POC which supports:
And then extend it to also support:
Finally extend it to also support:
It turns out that, when recreating an etcd cluster from a backup, we need a single founding member, and all subsequent etcd members need to be added one by one. This (i.e. dynamic reconfiguration of the etcd cluster) diverges greatly from the static configuration of the etcd cluster we're currently relying on. Supporting both static and dynamic configuration of etcd members does complicate the implementation.
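To make the divergence concrete, the dynamic (member-by-member) recovery flow looks roughly like the sketch below. This is not what kube-aws generates; hostnames, IPs, and data-dir paths are illustrative assumptions.

```sh
# 1. On the founding member: start a brand-new single-member cluster
#    from the restored data directory.
etcd --name etcd0 \
  --data-dir /var/lib/etcd-restored \
  --force-new-cluster \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://10.0.0.10:2379

# 2. Register the next member with the running cluster; etcd prints the
#    settings the new member must be started with.
etcdctl --endpoints http://10.0.0.10:2379 member add etcd1 http://10.0.0.11:2380

# 3. On the new member: join as part of the *existing* cluster.
#    Repeat steps 2-3 for each additional member, one at a time.
etcd --name etcd1 \
  --data-dir /var/lib/etcd \
  --initial-cluster "etcd0=http://10.0.0.10:2380,etcd1=http://10.0.0.11:2380" \
  --initial-cluster-state existing \
  --initial-advertise-peer-urls http://10.0.0.11:2380 \
  --listen-peer-urls http://10.0.0.11:2380 \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://10.0.0.11:2379
```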
Etcd v3, AFAIK supported in k8s 1.6, seems to provide a way of restoring an etcd cluster that is much closer to the static configuration, according to etcd-io/etcd#2366 (comment).
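For comparison, a sketch of the v3-style flow, where every member can be restored up front with the full static member list (endpoints, names, the cluster token, and paths here are assumptions, not kube-aws defaults):

```sh
# Take a snapshot from a live member (etcd v3 API).
ETCDCTL_API=3 etcdctl --endpoints http://10.0.0.10:2379 snapshot save /tmp/etcd-snapshot.db

# On each member, pre-populate a fresh data dir from the snapshot, passing
# the complete (static) member list; then start etcd pointing at that data dir.
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-snapshot.db \
  --name etcd0 \
  --data-dir /var/lib/etcd \
  --initial-cluster "etcd0=http://10.0.0.10:2380,etcd1=http://10.0.0.11:2380,etcd2=http://10.0.0.12:2380" \
  --initial-cluster-token etcd-cluster-restored \
  --initial-advertise-peer-urls http://10.0.0.10:2380
```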
Submitted #417 for this.
Hello. I am trying to restore a k8s cluster from an etcd data backup. I am able to restore the etcd cluster to its original state (it has all the k8s info). However, when I get the k8s cluster's rc, services, deployments, etc., they are all gone. The k8s cluster is not like it was before the restore.
Hi @ReneSaenz, thanks for trying kube-aws 👍 Several questions came to my mind at this point:
@ReneSaenz did you finish the etcd backup restore before any controller-managers were started? And you're certainly not doing the restore while controller-managers are running?
@mumoshu I am not using AWS. Sorry for the confusion. The etcd backup is done like this. @redbaron Before making a backup, I stopped the etcd process, made the backup, and then restarted the etcd process.
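In other words, something along these lines (a sketch only; the systemd unit name and paths depend on the OS and setup, so treat them as assumptions):

```sh
# Stop etcd, copy the data directory while nothing is writing to it,
# then restart etcd. "etcd-member.service" is the Container Linux unit
# name; adjust it (and the paths) for your environment.
sudo systemctl stop etcd-member.service
sudo cp -a /var/lib/etcd "/var/backups/etcd-$(date +%Y%m%d%H%M%S)"
sudo systemctl start etcd-member.service
```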
There are currently some people forking because we're not sure about the current etcd solution. Let's discuss the issues in this topic. A lot of us seem to be centering around @crewjam's etcd solution, but there are also others:
I have a personal preference for https://crewjam.com/etcd-aws/ (https://github.com/crewjam/etcd-aws), but we should definitely have this conversation as a community (as I tried in the old repo coreos/coreos-kubernetes#629).
Let's combine our efforts, @colhom @camilb @dzavalkinolx.
Branches for inspiration:
Currently missing features:
As noted in the overall production-readiness issue #9, there's also work being done on hosting etcd inside Kubernetes itself, which is probably where all of this is going in the end.