[Umbrella] Support Suspend in volcano #3875
Comments
cc @GhangZh
+1, this will be a very useful feature 👍
Hi, thanks for your contribution! I'm a little confused here: I'd like to know why kueue needs vcjob suspension. kueue is an independent queue management project, while Volcano has its own queue, and Volcano's scheduling and queueing are integrated. Why do we need to adapt to a separate queue project?
Thanks @Monokaix for the input, and sorry for the incomplete context; I'll explain more clearly here. First of all, I think we can come to a consensus that And the second question, about why kueue needs this, comes from our users and community partners: they use both volcano and other schedulers in their clusters, and they hope they can have a global job queueing system in front. They already run a forked volcano in their clusters to accomplish this. Based on this, I do think it is reasonable to support
FYI: I tried updating the vcjob with status.state.phase = Aborted, but it doesn't work; the phase gets rolled back to Running.
Another problem is that, once the job is resumed, the completed ones are ignored, which means we restart a completely new task.
I think This way you don’t need to change any API of volcano, you just need to create an external controller.
Thanks @hwdef. Is there any documentation about jobTemplate? What is it used for in volcano? Will the vcjob controller watch this resource?
Please check this: It is the template of vcjob and can be referenced by jobflow
Can jobtemplate be rolled back? Suspend means a
What is the problem you're trying to solve
We would like to make vcjob part of the Kueue ecosystem, with Kueue working as a high-level job queueing component. Meanwhile, in preemption scenarios, suspending the job (or whatever the equivalent term is in volcano) is a foundational capability.
Describe the solution you'd like
Make it possible to suspend the vcjob and reclaim the Pods it owns. This can be achieved in two ways:
job.spec.suspend
I would like to leverage the existing function rather than reinventing the wheel if possible.
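To make the requested behaviour concrete, here is a minimal sketch in Go of what a `suspend` flag could mean for a job controller: while the flag is set, the job's Pods are reclaimed, and once it is cleared they are created again. Everything in it is hypothetical and for illustration only: `Job`, `Cluster`, `Reconcile`, and the `Suspended` phase name are not the Volcano API, and the in-memory map merely stands in for a real cluster.

```go
package main

import "fmt"

// Job is a hypothetical, simplified stand-in for a vcjob (not the Volcano API).
type Job struct {
	Name     string
	Replicas int
	Suspend  bool // mirrors the proposed job.spec.suspend field
}

// Cluster is a toy in-memory record of which Pods exist for each job.
type Cluster struct {
	pods map[string][]string
}

// NewCluster returns an empty toy cluster.
func NewCluster() *Cluster {
	return &Cluster{pods: map[string][]string{}}
}

// Reconcile drives the job's Pods toward the state implied by Suspend and
// returns the phase a controller might report. The phase names are assumptions.
func (c *Cluster) Reconcile(j Job) string {
	if j.Suspend {
		// Suspended: reclaim every Pod owned by the job.
		delete(c.pods, j.Name)
		return "Suspended"
	}
	// Not suspended: create Pods until the desired replica count is reached.
	for i := len(c.pods[j.Name]); i < j.Replicas; i++ {
		c.pods[j.Name] = append(c.pods[j.Name], fmt.Sprintf("%s-%d", j.Name, i))
	}
	return "Running"
}

// PodCount reports how many Pods the job currently owns.
func (c *Cluster) PodCount(name string) int {
	return len(c.pods[name])
}

func main() {
	c := NewCluster()
	job := Job{Name: "demo", Replicas: 3}

	fmt.Println(c.Reconcile(job), c.PodCount(job.Name)) // Running 3

	job.Suspend = true
	fmt.Println(c.Reconcile(job), c.PodCount(job.Name)) // Suspended 0

	job.Suspend = false
	fmt.Println(c.Reconcile(job), c.PodCount(job.Name)) // Running 3
}
```

Note that in this toy version a resumed job simply starts over with fresh Pods. That is the same concern raised in the comments about completed work not being preserved across a resume, so a real implementation would need to decide how to handle it.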
Additional context
related issues: