Add Pod Condition and unblock cluster autoscaler #526
Comments
I have not checked the code yet. I think the pod status needs to be updated in the scheduling process; correct me if I am wrong. If any code change work is needed, I can take it. Thanks!
A follow-up question: assume this is addressed and we have a job consisting of 5 pods, each requiring 1 CPU, but only 4 nodes (1 CPU, 1 GB memory each). Ideally the pending job only needs 1 more CPU, yet the autoscaler will detect 5 pending pods and end up scaling up 5 nodes. Do you think we can make some improvement, either on the autoscaler side or the kube-batch side?
Yes, kube-batch should also update the Pod's status in this case; a code change/PR is necessary :)
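For illustration, a minimal sketch of such an update, assuming client-go and the standard corev1 types; `markUnschedulable` is a hypothetical helper, not the code that eventually landed in the PR:

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markUnschedulable records a PodScheduled=False / Unschedulable condition
// on a pending pod so that the cluster autoscaler notices it.
func markUnschedulable(client kubernetes.Interface, pod *v1.Pod, msg string) error {
	podCopy := pod.DeepCopy()
	podCopy.Status.Conditions = append(podCopy.Status.Conditions, v1.PodCondition{
		Type:               v1.PodScheduled,
		Status:             v1.ConditionFalse,
		Reason:             v1.PodReasonUnschedulable, // "Unschedulable"
		Message:            msg,
		LastTransitionTime: metav1.Now(),
	})
	// Real code should update an existing PodScheduled condition in place
	// instead of blindly appending a duplicate.
	_, err := client.CoreV1().Pods(podCopy.Namespace).UpdateStatus(podCopy)
	return err
}
```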
Oh, for this case, maybe PodGroup's status can help; we need a detailed solution for that :)
@k82cn Thanks! I will submit a PR to reflect pod condition changes.
Marking a pod as unschedulable will make CA notice the pending pod, but it's by no means enough to make CA work with kube-batch. CA has been built specifically for the Kubernetes default scheduler, and the whole logic is built around scheduler code imported from the kubernetes/kubernetes repo. More details in #533 (review).
I think CA is built only on predicates, the same as kube-batch; but after …
This is why I initially commented on kubernetes/enhancements#639. CA is part of Kubernetes; any new feature in the default scheduler should either be compatible with CA or have a design, agreed between sig-scheduling and sig-autoscaling, covering how the feature would be added to CA.
Can we decouple that? For the scheduler, we cannot include all algorithms upstream; instead, I'd suggest users build customized algorithms on the scheduler framework or an HTTP extender. In that case, CA cannot work :(
Sorry, I'm not sure I understand your comment. If the user builds a customized algorithm using an extender, they can no longer use autoscaling; that's how it's always been. If there is a plan to include a feature in the default scheduler (i.e. it will run if you use the default scheduler that comes in the Kubernetes tarball, without installing any custom schedulers or extenders and/or recompiling anything), then I think working with the autoscaler should be a prerequisite.
@k82cn @MaciekPytel It took some time to finish PR #535 to add the PodCondition, but right now this won't trigger a ScaleUp action based on my test. CA doesn't trigger a scale-up directly from the resources of pending pods; it first runs FilterOutSchedulable on them, one by one (https://github.com/kubernetes/autoscaler/blob/4002559a4c69c5624ee685dbb2f9dd2e6240b896/cluster-autoscaler/core/utils.go#L112-L147). So in this case I have two nodes, NodeA with 3.7 CPUs and NodeB with 2.3 CPUs, and 4 pods in the PodGroup, each requiring 2 CPUs.
There is no resource calculation at all in CA. It uses the default scheduler logic for binpacking pods (it puts them one by one onto in-memory fake node objects to see how many nodes would be needed). And, as you noticed, it will not scale up before all the space in the cluster is used up.
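A much-simplified sketch of that behaviour, with CPU millicores standing in for the full set of scheduler predicates (an assumption for brevity; the real CA reuses scheduler predicate code rather than doing its own arithmetic):

```go
package sketch

// filterOutSchedulable tentatively places each pending pod on an in-memory
// copy of the nodes and keeps only the pods that fit nowhere as
// scale-up candidates.
func filterOutSchedulable(pendingCPUs, nodeFreeCPUs []int64) (unschedulable []int64) {
	free := append([]int64(nil), nodeFreeCPUs...) // fake nodes; real ones untouched
	for _, cpu := range pendingCPUs {
		placed := false
		for i := range free {
			if free[i] >= cpu {
				free[i] -= cpu // reserve the pod on the fake node
				placed = true
				break
			}
		}
		if !placed {
			unschedulable = append(unschedulable, cpu)
		}
	}
	return unschedulable
}
```

In this illustration, with free capacities of 3700 and 2300 millicores and four pending 2000-millicore pods, the first two pods are filtered out as schedulable, so only part of the gang's demand ever reaches the scale-up logic, even though kube-batch's gang scheduling will not place any of the pods until all four fit.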
Right now, we add generic support for the pod condition and close this issue. For the cluster autoscaling part, we definitely have to take more things into consideration for the kube-batch use case. I will try to figure them all out and open a separate issue/doc for you to review.
+1
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
The cluster autoscaler cannot scale up nodes if pending pods are scheduled by kube-batch.
After some investigation, I noticed the cluster autoscaler uses the following logic (sketched below) to filter pending pods. In this case, pending pods scheduled by kube-batch won't trigger autoscaling; they have to wait for other pods to release resources. The root cause is that the pending pods don't have a PodCondition, so the autoscaler skips them.
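A hedged Go paraphrase of that filter (this is my reading of its behaviour, not CA's verbatim code):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// isUnschedulable paraphrases the autoscaler's filter: only a pending,
// unbound pod stamped with PodScheduled=False / Unschedulable is treated
// as a scale-up candidate.
func isUnschedulable(pod *v1.Pod) bool {
	if pod.Spec.NodeName != "" || pod.Status.Phase != v1.PodPending {
		return false
	}
	for _, c := range pod.Status.Conditions {
		if c.Type == v1.PodScheduled &&
			c.Status == v1.ConditionFalse &&
			c.Reason == v1.PodReasonUnschedulable {
			return true
		}
	}
	// Pods left pending by kube-batch land here: no condition, so CA skips them.
	return false
}
```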
Pods left pending by kube-batch have a status like the first sketch below. Compare the normal pending unschedulable pod produced by the default Kubernetes scheduler in the second sketch.
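Sketched with corev1 types; the condition values are the standard ones the default scheduler sets, while the message string is an illustrative assumption:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// A pod left pending by kube-batch: no scheduling condition at all, so the
// autoscaler's filter skips it.
var kubeBatchPending = v1.PodStatus{
	Phase:      v1.PodPending,
	Conditions: nil,
}

// The same pod left pending by the default kube-scheduler: the PodScheduled
// condition is exactly what the autoscaler keys on.
var defaultSchedulerPending = v1.PodStatus{
	Phase: v1.PodPending,
	Conditions: []v1.PodCondition{{
		Type:    v1.PodScheduled,
		Status:  v1.ConditionFalse,
		Reason:  v1.PodReasonUnschedulable, // "Unschedulable"
		Message: "0/2 nodes are available: 2 Insufficient cpu.", // illustrative
	}},
}
```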
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Try to schedule a job using kube-batch and use the autoscaler for node scaling.
Anything else we need to know?:
I notice #521 will add PodGroupStatus, but I think this won't work with the autoscaler either.
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-eks-6bad6d", GitCommit:"6bad6d9c768dc0864dab48a11653aa53b5a47043", GitTreeState:"clean", BuildDate:"2018-12-06T23:13:14Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: aws
OS (e.g. from /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Kernel (e.g. uname -a): 4.14.77-81.59.amzn2.x86_64 #1 SMP Mon Nov 12 21:32:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Install tools: eksctl
Others: