Auto-scaling controller does not scale-down the Job #450
I submitted 10 auto-scaling jobs with min-instance=2 and max-instance=20, but job-0 has 20 trainers while job-9 has only 2 trainers. There are too many PENDING trainer pods. In the controller logs, CPURequestMilli < CPUTotalMilli, but I think it should in fact be CPURequestMilli > CPUTotalMilli.
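The expectation in the description is that, with this many PENDING trainers, the aggregate CPU request should exceed the cluster's capacity and trigger a scale-down. Below is a minimal sketch of that comparison, assuming the controller works this way; the field names CPURequestMilli and CPUTotalMilli come from the log values quoted above, while the ClusterResource struct and the needScaleDown helper are hypothetical names for illustration, not the controller's actual code.

```go
package main

import "fmt"

// ClusterResource is a hypothetical snapshot of cluster CPU accounting.
// The field names mirror the values mentioned in the controller log.
type ClusterResource struct {
	CPURequestMilli int64 // sum of trainer pods' CPU requests, in millicores
	CPUTotalMilli   int64 // total allocatable CPU of the cluster, in millicores
}

// needScaleDown sketches the decision the reporter expects: when the
// aggregate CPU request exceeds what the cluster can actually provide,
// jobs should give trainers back instead of leaving pods PENDING.
func needScaleDown(c ClusterResource) bool {
	return c.CPURequestMilli > c.CPUTotalMilli
}

func main() {
	// Requests exceed capacity, so the scale-down branch should be taken.
	c := ClusterResource{CPURequestMilli: 180000, CPUTotalMilli: 96000}
	fmt.Println("scale down:", needScaleDown(c)) // true
}
```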
Comments

This is a bug due to: (merged PR)

Reopening this issue because the problem remains.

I checked the log; it seems that the total CPU request is less than the real value in the cluster (see the sketch after this thread).

I have added this PR as an improvement: #456

I think this is fixed; please reopen if otherwise.
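The comment about the total CPU request being lower than the real value suggests the controller's sum may be leaving out pods that are not yet scheduled. The sketch below contrasts the two ways of counting under that assumption; Pod, requestScheduledOnly, and requestAllPods are hypothetical names for illustration and are not taken from PR #456.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical view of a trainer pod.
type Pod struct {
	Phase           string // "Running", "Pending", ...
	CPURequestMilli int64
}

// requestScheduledOnly reproduces the suspected undercount: PENDING pods
// are skipped, so the reported CPURequestMilli stays below cluster capacity
// and the scale-down branch is never taken.
func requestScheduledOnly(pods []Pod) int64 {
	var total int64
	for _, p := range pods {
		if p.Phase == "Running" {
			total += p.CPURequestMilli
		}
	}
	return total
}

// requestAllPods counts PENDING pods as demand as well, which is what the
// comment above suggests the total should reflect.
func requestAllPods(pods []Pod) int64 {
	var total int64
	for _, p := range pods {
		total += p.CPURequestMilli
	}
	return total
}

func main() {
	pods := []Pod{
		{Phase: "Running", CPURequestMilli: 1000},
		{Phase: "Running", CPURequestMilli: 1000},
		{Phase: "Pending", CPURequestMilli: 1000},
		{Phase: "Pending", CPURequestMilli: 1000},
	}
	fmt.Println("scheduled only:", requestScheduledOnly(pods)) // 2000
	fmt.Println("all pods:      ", requestAllPods(pods))       // 4000
}
```

With PENDING pods included, the total request in the scenario from the description would exceed CPUTotalMilli and the scale-down check sketched earlier would fire.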