
get kubernetes node pool overhead? #248

Open
CapCap opened this issue Jun 26, 2020 · 6 comments

@CapCap

CapCap commented Jun 26, 2020

When looking at an instance type for a node pool, it shows "available" memory. Is there a way to get this from the API?
[screenshot: node pool instance sizes showing available memory]

@andrewsomething
Copy link
Member

Hi @CapCap,

That information is not currently exposed via the API, though there is some documentation covering it here:

https://www.digitalocean.com/docs/kubernetes/#allocatable-memory

I'm curious to know a bit more about your use case here.

If you're looking to ensure that enough memory is available for your workload, you might want to look at the autoscaling option.
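In the meantime, one way to get the effective overhead is to read it from the cluster itself: every Kubernetes node reports `status.capacity` and `status.allocatable`, and the difference is the reserved overhead. A minimal sketch of parsing those quantities (the node status fragment and its numbers below are illustrative, not DO-specific):

```python
def parse_k8s_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity (e.g. '2007228Ki', '3929Mi') to bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


# Example node status fragment, shaped like `kubectl get node <name> -o json`
# output; the specific values here are made up for illustration.
node_status = {
    "capacity": {"memory": "2041752Ki"},
    "allocatable": {"memory": "1535896Ki"},
}

capacity = parse_k8s_memory(node_status["capacity"]["memory"])
allocatable = parse_k8s_memory(node_status["allocatable"]["memory"])
overhead = capacity - allocatable
print(f"overhead: {overhead // 1024**2} MiB")
```

This only tells you the overhead for node sizes you already run, so it complements rather than replaces an API field.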

@CapCap
Author

CapCap commented Jun 29, 2020

Hey @andrewsomething!

I love the autoscaling feature, but my understanding is that it requires fixed node pool instance sizes. I'm trying to handle the case where (in a really simplified example) we have a node pool of instances with 2GB usable memory each, and one of our teams wants to deploy something that requests 3GB of memory: that workload would sit around as unschedulable forever. So, in the event that no existing node pool can handle the deployment, I'm trying to automatically create one that can.
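The decision described above can be sketched as a small selection routine: pick the smallest existing pool whose usable memory fits the request, and fall back to creating a new pool when none does. Pool names and sizes here are hypothetical; a real version would size the new pool from the pending pod's resource requests:

```python
from typing import Optional


def pick_pool(pools: dict[str, int], request_gb: int) -> Optional[str]:
    """Return the name of the smallest existing pool whose usable memory (GB)
    fits the request, or None if a new pool must be created."""
    fitting = {name: mem for name, mem in pools.items() if mem >= request_gb}
    if not fitting:
        return None  # caller should create a new pool sized for the request
    return min(fitting, key=fitting.get)


# Hypothetical pools keyed by name, with usable memory in GB.
pools = {"pool-small": 2, "pool-medium": 4}
print(pick_pool(pools, 3))  # fits in pool-medium
print(pick_pool(pools, 8))  # None: no existing pool fits, create a new one
```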

@timoreimann

Hi @CapCap 👋 I work on the DOKS team. First of all, thanks for your feedback!

You are correct that, with a single 2 GB node pool, the workload in question would be stuck forever. However, if you had another auto-scaled node pool of 3 GB (or larger) nodes available, the cluster autoscaler should already be smart enough today to scale that pool up to make room for the pending workload.

Of course, the bummer here is that you may not want to keep one node per pool standing by just in case you need to scale up, as that could lead to fairly wasteful usage (and, likewise, higher cost). We do have plans, though, to support scale-to-zero node pools, allowing pools to have no worker nodes at all until they are really needed. This would also handle the case where the resources are no longer needed at some point, by automatically scaling down as resource usage drops below a threshold.

Would that serve your use case?

@CapCap
Author

CapCap commented Jun 29, 2020

Hi @timoreimann!
For our multi-tenant k8s cluster, we don't necessarily know in advance what the requirements for a workload will be (and in the long run we want to do bin-packing to optimize costs further), so it's possible we won't need a given instance type for a very long time, e.g., if we have users with one-off (or rarely run) jobs, we'd end up with potentially large machines sitting around doing nothing for the large majority of the time.

So I think scaling down to zero could definitely do it for us, but I'm thinking our approach would be to create many node pools (potentially one per instance type?) which are scaled to zero, so that if a user needs to run a workload that doesn't fit, it'll spin up a node for them that will. It feels a little hacky, but I think it ultimately does achieve what I'm looking for: does that work for you guys?
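One way to frame the one-pool-per-instance-type idea: keep a catalog of zero-scaled pools keyed by instance slug, and route a pending workload to the smallest type that fits. The slugs and usable-memory figures below are hypothetical; real values would come from the DO sizes listing minus the documented allocatable-memory overhead:

```python
def smallest_fitting_slug(catalog: dict[str, float], request_gb: float):
    """Pick the smallest-by-usable-memory instance type that can host the
    request. Returns None when nothing in the catalog fits."""
    fitting = [(mem, slug) for slug, mem in catalog.items() if mem >= request_gb]
    return min(fitting)[1] if fitting else None


# Hypothetical slug -> usable-memory-GB map for zero-scaled pools.
catalog = {"s-1vcpu-2gb": 1.5, "s-2vcpu-4gb": 3.0, "s-4vcpu-8gb": 6.5}
print(smallest_fitting_slug(catalog, 2.5))  # the 4gb slug is the smallest fit
print(smallest_fitting_slug(catalog, 10.0))  # None: no type is big enough
```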

@timoreimann

@CapCap I agree that having to provide zero-scaled "ready" node pools is a bit inconvenient. Luckily, the autoscaler framework also provides the ability to create entirely new node pools on demand: we could add support for that in a future version of the DigitalOcean autoscaler provider, so that it detects when capacity in any of the existing node pools wouldn't suffice even after scaling out, and subsequently triggers the creation of a brand-new pool with just enough capacity to host the pending workload.

To be honest, this extension of the autoscaler is an area we haven't looked at too deeply yet, so it'd probably only come after we have provided the "hacky" solution. At one point though, it could become a more convenient way to manage unpredictable workload demands very flexibly.

@CapCap
Author

CapCap commented Jul 6, 2020

@timoreimann if you're okay with it, I'm okay with it :-) thank you!
