When gpu lost, scheduler will assign pod to wrong node #1782
Comments
I am a little confused: since the first pod (4 GPUs) occupies the node first, why can the second pod (1 GPU) still be scheduled to that node? Did I miss something?
The problem is here: when scheduling starts, the scheduler cache is synced, which triggers volcano/pkg/scheduler/api/node_info.go Lines 336 to 343 in b119114
The first pod's Resreq is 4 GPUs, but the node has only 3 GPUs it can allocate, so an error is raised. volcano/pkg/scheduler/cache/event_handlers.go Lines 187 to 205 in a7ecd08
but in |
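To make the scenario above concrete, here is a minimal Go sketch of the accounting problem. It is not the Volcano code: `Node`, `IdleGPU`, and `AddTask` are hypothetical names, and the sketch assumes that when adding the 4-GPU pod fails during cache sync, the pod is simply left out of the node's accounting.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a node in the scheduler cache.
// It is NOT the real Volcano NodeInfo type.
type Node struct {
	Name    string
	IdleGPU int      // GPUs the cache believes are still free on this node
	Tasks   []string // tasks the cache has accounted to this node
}

// AddTask accounts a task's GPU request against the node.
// If the request exceeds the idle GPUs, it returns an error and records nothing,
// which is the assumed behavior that leads to the bug described in this issue.
func (n *Node) AddTask(name string, gpuReq int) error {
	if gpuReq > n.IdleGPU {
		return fmt.Errorf("task %s requests %d GPU(s), node %s has only %d idle",
			name, gpuReq, n.Name, n.IdleGPU)
	}
	n.IdleGPU -= gpuReq
	n.Tasks = append(n.Tasks, name)
	return nil
}
```

With a node that now reports 3 GPUs after losing one, `AddTask("pod-4gpu", 4)` fails and leaves `IdleGPU` at 3, so a later `AddTask("pod-1gpu", 1)` succeeds in the cache even though the real node's GPUs are already held by the first pod.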
What happened:
After the pod is scheduled, its creation fails.
What you expected to happen:
The pod should not be scheduled to this node.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
kubectl version:
uname -a: