add task status named ReleasingFailed #2943
Signed-off-by: chenfengyu <yuxubst@126.com>
Thanks for your contributions!
Thanks, I'll add unit tests next week.
Can you help review it again?
/assign @qiankunli
It is clear to me now. Just one small comment: why not pass the node name "n3" as the node-name argument of util.BuildPod instead of setting it afterwards? Thanks.
case2Pod3 := util.BuildPod("c2", "p3", "", v1.PodRunning, util.BuildResourceListWithGPU("2", "4G", "3"), "pg3", make(map[string]string), make(map[string]string))
case2Pod3.CreationTimestamp = metav1.Time{Time: time.Now().Add(-20 * time.Minute)}
case2Pod3.Spec.NodeName = "n3"
Hi @wangyang0616, please help review this PR. It is related to the enhancement in PR #2815.
/assign @wangyang0616 @william-wang @huone1
Can PR #2815 fix your problem?
Thanks, it looks better.
@ycfnana Thanks for your report. Personally, I think the root cause is the unexpected pod status rather than the task's status. There are not enough resources to start new pods that belong to the pipelined jobs. So I don't see why a new task status should be added or how it would help users.
Thanks. Sometimes a pod stays Pending even when there are enough resources to start a new pod, because of an unexpected pod status; see the comments in my unit test code. But PR #2815 may solve my problem. I'll close this PR if it tests OK.
The related issue is #2922.
I ran into a problem: a pod stays Pending forever when one node has a zombie pod and another node has enough resources, because of the Releasing status. I think that once the waiting time exceeds TerminationGracePeriodSeconds, the task should no longer be Releasing; some error has probably occurred, so it should move to another status such as ReleasingFailed.