Bug Or Feature - Experiment enters failed state but trials left running #855

jlewi · 2019-10-04T00:28:39Z

/kind bug

What steps did you take and what happened:
I'm not sure if this is a bug or a feature.

I had an experiment enter a failed state because too many trials failed.

All of the existing trials that were in the running state continued to run.

I can see advantages and disadvantages to not terminating existing trials when an experiment fails.

johnugeorge · 2019-10-05T13:35:08Z

Currently, the behavior is the same as you described. Do you have any particular use case?

jlewi · 2019-10-08T15:24:52Z

I don't have a specific use case to warrant one behavior or the other.

Training jobs can run for a long time, so if they have already been running for 10 hours you don't necessarily want to abort them; you might want to let them continue and finish.

I guess if the user wants to kill them, they can always do so explicitly, whereas if they are killed automatically, resuming the jobs may not be possible.

gaocegege · 2019-10-09T06:23:33Z

@johnugeorge Did we fix it in #861?

johnugeorge · 2019-10-09T06:52:51Z

Yes. Currently, we let existing running trials continue and complete even after the experiment has entered a terminal state.
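
For readers following along, the behavior described in this thread can be illustrated with a toy model. This is only a sketch and not the project's controller code: the `Trial` and `Experiment` classes, the `max_failed_trials` threshold, the `reconcile` method, and the "more than the threshold" comparison are all hypothetical names and assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class Trial:
    name: str
    state: State = State.RUNNING


@dataclass
class Experiment:
    # Hypothetical threshold: the experiment is marked failed once more
    # than this many trials have failed (an assumption of this sketch).
    max_failed_trials: int
    trials: list = field(default_factory=list)
    state: State = State.RUNNING

    def reconcile(self) -> State:
        failed = sum(t.state is State.FAILED for t in self.trials)
        if failed > self.max_failed_trials:
            self.state = State.FAILED
        # Deliberately no cleanup here: trials that are still running are
        # left alone, mirroring the behavior discussed in this issue.
        return self.state


exp = Experiment(
    max_failed_trials=1,
    trials=[Trial("t1", State.FAILED), Trial("t2", State.FAILED), Trial("t3")],
)
exp.reconcile()
still_running = [t.name for t in exp.trials if t.state is State.RUNNING]
```

In this model the experiment ends up in the failed state while `t3` stays in `still_running`; stopping it would be a separate, explicit action by the user.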

johnugeorge · 2019-10-09T06:53:13Z

Closing the issue as it works as expected.

k8s-ci-robot added the kind/bug label Oct 4, 2019

jlewi added the priority/p2 label Oct 8, 2019

johnugeorge closed this as completed Oct 9, 2019
