Evaluate kops vs EKS #28

Comments
AWS is aware of problem (1) and has a roadmap for pod density fixes. Realistically, I don't expect that for another 5-6 months at the earliest.
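(For context, the pod density issue comes from the AWS VPC CNI: each pod gets a secondary IP on an ENI, so max pods per node = ENIs × (IPv4 addresses per ENI − 1) + 2. An m5.xlarge gets 4 × 14 + 2 = 58 pods, while a t3.medium gets only 3 × 5 + 2 = 17 — versus the vanilla kubelet default of 110 pods per node with an overlay CNI like Calico.)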
My current setup is:

Things I've had to patch so far:
So kubernetes/enhancements#1144 ended up needing some work: it needed a couple of feature gates enabled, and thankfully kops makes this easy. I now have a cluster working almost the way I want with the following kops config:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2021-02-13T23:25:11Z"
  name: farallon-2i2c.k8s.local
spec:
  clusterAutoscaler:
    enabled: true
  api:
    loadBalancer:
      class: Classic
      type: Public
  authorization:
    rbac: {}
  dns:
    kubeDNS:
      provider: CoreDNS
  channel: stable
  cloudProvider: aws
  configBase: s3://2i2c-farallon-pangeo-kops/farallon-2i2c.k8s.local
  containerRuntime: docker
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
    featureGates:
      LegacyNodeRoleBehavior: "false"
      ServiceNodeExclusion: "false"
  kubeControllerManager:
    featureGates:
      LegacyNodeRoleBehavior: "false"
      ServiceNodeExclusion: "false"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.19.7
  masterPublicName: api.farallon-2i2c.k8s.local
  networkCIDR: 172.20.0.0/16
  networking:
    calico:
      majorVersion: v3
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-2a
    type: Public
    zone: us-east-2a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-02-13T23:25:12Z"
  labels:
    kops.k8s.io/cluster: farallon-2i2c.k8s.local
  name: master-us-east-2a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  machineType: t3.medium
  maxSize: 1
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2a
  role: Master
  subnets:
  - us-east-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: farallon-2i2c.k8s.local
    hub.jupyter.org/pool-name: notebook-m5-xlarge
  name: notebook-m5-xlarge-2021-02-15
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  cloudLabels:
    k8s.io/cluster-autoscaler/node-template/label/hub.jupyter.org/pool-name: notebook-pool-m5-xlarge
    k8s.io/cluster-autoscaler/node-template/taint/hub.jupyter.org_dedicated: user:NoSchedule
    k8s.io/cluster-autoscaler/node-template/taint/hub.jupyter.org/dedicated: user:NoSchedule
  taints:
  - hub.jupyter.org_dedicated=user:NoSchedule
  - hub.jupyter.org/dedicated=user:NoSchedule
  nodeLabels:
    hub.jupyter.org/pool-name: notebook-m5-xlarge
  machineType: m5.xlarge
  maxSize: 20
  minSize: 0
  role: Node
  subnets:
  - us-east-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: farallon-2i2c.k8s.local
    hub.jupyter.org/pool-name: notebook-m5-2xlarge
  name: notebook-m5-2xlarge-2021-02-15
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  cloudLabels:
    k8s.io/cluster-autoscaler/node-template/label/hub.jupyter.org/pool-name: notebook-pool-m5-2xlarge
    k8s.io/cluster-autoscaler/node-template/taint/hub.jupyter.org_dedicated: user:NoSchedule
    k8s.io/cluster-autoscaler/node-template/taint/hub.jupyter.org/dedicated: user:NoSchedule
  taints:
  - hub.jupyter.org_dedicated=user:NoSchedule
  - hub.jupyter.org/dedicated=user:NoSchedule
  nodeLabels:
    hub.jupyter.org/pool-name: notebook-m5-2xlarge
  machineType: m5.2xlarge
  maxSize: 20
  minSize: 0
  role: Node
  subnets:
  - us-east-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: farallon-2i2c.k8s.local
    hub.jupyter.org/pool-name: dask-worker
  name: dask-worker-2021-02-15
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210119.1
  cloudLabels:
    k8s.io/cluster-autoscaler/node-template/label/hub.jupyter.org/pool-name: dask-worker
    k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org_dedicated: worker:NoSchedule
    k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: worker:NoSchedule
  taints:
  - k8s.dask.org_dedicated=worker:NoSchedule
  - k8s.dask.org/dedicated=worker:NoSchedule
  nodeLabels:
    hub.jupyter.org/pool-name: dask-worker
  machineType: m5.2xlarge
  maxSize: 50
  minSize: 0
  role: Node
  subnets:
  - us-east-2a
```

I have so far absolutely liked this experience much, much more than EKS! There is more control than with EKS, which is actually a good thing: since EKS is only 'semi-managed', you are often stuck in places where they have made a constraining decision but haven't provided enough support to make it work easily. Maybe I'll reconsider once managed node groups reach feature parity with other cloud providers. Something else I really, really like is that there's one master node running on a t3.medium, and it contains our JupyterHubs too! So the total base cost comes to a little over $30 a month. With EKS, you have to pay the $72 a month master fee, and then enough for a node to run the hub infrastructure. And that extra node needs to be much bigger too, due to low pod density. I still need to do spot instances, and test dask out.
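(For the spot piece, kops InstanceGroups can drive an ASG mixed instances policy. A minimal sketch of what a spot-backed dask worker pool might look like, assuming a recent kops (1.19+) — the instance type list and names here are illustrative, not tested:)

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: farallon-2i2c.k8s.local
    hub.jupyter.org/pool-name: dask-worker-spot
  name: dask-worker-spot-2021-02-15
spec:
  # Run entirely on spot, spread across comparable instance types
  mixedInstancesPolicy:
    instances:
    - m5.2xlarge
    - m5a.2xlarge
    - m4.2xlarge
    onDemandBase: 0
    onDemandAboveBase: 0
    spotAllocationStrategy: capacity-optimized
  machineType: m5.2xlarge
  maxSize: 50
  minSize: 0
  role: Node
  subnets:
  - us-east-2a
```

With `onDemandBase: 0` and `onDemandAboveBase: 0` everything in the group runs on spot, and `capacity-optimized` favors the pools least likely to be interrupted; in practice you'd also carry over the dask taints and cloudLabels from the on-demand worker group above.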
With the Calico CNI by default, you get NetworkPolicy enforcement. So gotta make sure your dask setup works with that.
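(To illustrate what enforcement means in practice, a minimal sketch of the kind of policy dask would need — the `app` labels here are made-up assumptions, not what dask-gateway or z2jh actually ship:)

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-scheduler-to-workers
spec:
  # Selects the dask worker pods; label is illustrative
  podSelector:
    matchLabels:
      app: dask-worker
  policyTypes:
  - Ingress
  ingress:
  # Only the scheduler may initiate connections to workers
  - from:
    - podSelector:
        matchLabels:
          app: dask-scheduler
```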
Thanks for the example config and write-up here. From my understanding of what you're trying to do, here are some suggestions I would probably investigate if I were in your situation. You may already have some of these things done; this is the research I've done and some ideas.

**Reducing cost**

Kops is great and removes a lot of the operational burden from you. If you want to stick with it, I would look into using spot worker nodes with mixed ASG instance types. You can get a list of comparable instance types using ec2-instance-selector, and you'll be able to reduce your cost by a lot. One thing to consider with spot is whether the pods students connect to are stateful: if an instance gets shut down in the middle of a class it could be very disruptive, especially at the pod density you're running.

The second thing I would look at is k3s. If you're already running a single-node master, k3s could reduce the overhead quite a bit by using SQLite instead of etcd. If it were me, I would run k3s via Terraform, mount an EBS volume, and store SQLite there. If you make the API server node an ASG with min 1 / max 1 and store your state in EBS, it'll be easy to replace the instance if/when things go bad. There's a slightly older Terraform project that you can use as a starting point. Even with k3s you can still create spot-based ASG worker nodes. You can also look at what kops is doing with […]. I suspect you'd be able to reduce your control plane node size if you use k3s, and reduce your worker node cost by more than 50% with spot.

**Faster scaling**

This problem was a bit more interesting to look at, and I have 3 main recommendations.

**Build a custom AMI with images pre-pulled**

Since your container images are so large, you can solve this problem a couple of different ways. You can build a custom AMI with the images pulled and then the instance snapshotted. You'd want to build this AMI often to make sure you don't have to pull too much when nodes join the cluster. If you don't want to build a custom AMI, you can also use EBS mounted at /var/lib/docker, and use snapshots to clone the disk for each worker node. I forget the exact syntax, but you can do it in the launch template for the ASG. It would be great if you had a breakdown of how long it takes to start a pod currently, with each stage broken down.

**Control ASG scale-up outside of Kubernetes**

The cluster autoscaler works for gradual increases, but it's not great for sudden bursts and is naturally reactive rather than proactive. I think you mentioned that a teacher has a web portal to create the student notebooks. I'm assuming behind the web app is something that scales up your deployment, and then you let the autoscaler add instances. If you can change the web app to scale up the ASG before it scales up the k8s deployment, you can probably get ahead of the scaling needs by 30-60 seconds. You could have the deployment button talk directly to the AWS API and ASG, but that means you need to hard-code region/ASG information. It would probably be easier to use EventBridge, which can send a small amount of data like region and how many pods are going to be added. You can then do some basic math in a Lambda to know how many instances to add to the ASG based on your desired pod density (a k8s-native variant of this idea is sketched after this comment). You can still use the cluster autoscaler to scale down the ASG (eventually to 0), but scaling up fast will always be slower if you're trying to react to scheduled pods or running metrics vs pre-scaling the instances.
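(A k8s-native variant of that pre-scaling idea — not the ASG-direct approach suggested above — is the cluster-autoscaler "overprovisioning" pattern: a deployment of low-priority pause pods reserves headroom, real student pods evict them on arrival, and the autoscaler replaces the evicted placeholders with new nodes. A rough sketch; names and sizes are illustrative:)

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10            # below the default (0), so any real pod preempts these
globalDefault: false
description: Placeholder pods that reserve capacity ahead of real workloads
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2         # scale this up shortly before class starts
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.2
        resources:
          requests:   # sized to roughly match one student pod
            cpu: "1"
            memory: 4Gi
```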
**Optimize container image**

This option is a bit more experimental, but would be interesting to look at as a possibility. It would also require you to switch to containerd for your container runtime in Kubernetes, but you should probably do that within the next year anyway, as docker support will be removed at some point. Looking at containerd plugins like stargz, you can have large images that pull data on demand. This means you could probably have most of the initial UI files pulled, but then pull the rest of the libraries after a student tries to run their notebook. This option may not work for you, but if the other two options don't reduce your startup speed, this would probably be the next thing I would look at.

I would love to hear back if you try any of these suggestions, and would be curious to know how well they work. Feel free to reach out if you have other questions.
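(For what it's worth, the runtime switch itself is a one-line change in the kops Cluster spec shown earlier — stargz would additionally need the snapshotter plugin configured on the nodes, which as far as I know kops doesn't do for you:)

```yaml
spec:
  containerRuntime: containerd  # was: docker
```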
Thanks for the suggestions, @rothgar! Unfortunately, student pods depend on in-memory state, so we can't use spot instances for them. That's also why we can't aggressively scale down. We do use spot instances with dask though, since those are much more resilient to terminations.

Building AMIs with container images pre-pulled is one of the things I'm most excited about with the move to kops. I just realized that this is possible with eksctl as well. Right now, we run student workflows mostly on Google Cloud - only research flows are on AWS. Will definitely put effort towards this when that changes.

I've never really considered k3s for anything more than single-node workflows. We try very hard to be as uniform across cloud providers as possible, so k3s on AWS doesn't seem worth the trouble for the differential. I also don't think the etcd resource cost makes a lot of difference in our case, but will consider it! kops just seems a lot more supported...

stargz is definitely on our radar! Will report back when we start working on it.

Thanks for responding here, @rothgar - we appreciate it. I think pod density is really our biggest blocker with EKS for many use cases, so I'll look forward to that getting better so we can re-evaluate it vs kops.
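(On the eksctl point — from my reading of the eksctl docs, a managed nodegroup with a custom pre-pulled AMI would look roughly like the sketch below. The cluster name and AMI ID are placeholders, and custom AMIs require supplying the bootstrap command yourself:)

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # placeholder
  region: us-east-2
managedNodeGroups:
- name: notebook-prepulled
  instanceType: m5.xlarge
  minSize: 1
  maxSize: 20
  ami: ami-0123456789abcdef0  # custom AMI with container images pre-pulled
  overrideBootstrapCommand: |
    #!/bin/bash
    /etc/eks/bootstrap.sh my-cluster
```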
I totally understand with spot and in-memory state. Custom AMI support for managed node groups came out late last year, so it should work for you if you start building an AMI with containers pre-pulled. We're working on VPC CNI pod density, but it's not quite ready yet. I'm a big fan of cilium too. They have a walkthrough on using it with EKS that might be helpful if you want to try EKS again: https://docs.cilium.io/en/v1.9/gettingstarted/k8s-install-eks/
@rothgar ah, thanks! I'm just wary of doing a custom CNI when I've no control over the master. Hope that makes sense.
I'm playing with kops for managing AWS Kubernetes clusters instead of EKS. I've been frustrated with EKS for a while:

1. Low pod density: the VPC CNI caps how many pods can run on each node, based on ENI limits.
2. It's only 'semi-managed': you still end up managing a lot of the infrastructure yourself.

(1) was the motivating case. (2) makes me feel like I'm already managing a lot of infrastructure intimately - might as well embrace the extra control provided by kops, no?
Going to test it out and keep this issue updated.