I don't have a specific use case to warrant one behavior or the other.
Training jobs can run for a long time, so if they have already been running for 10 hours you don't necessarily want to abort them; you might want to let them run to completion.
I guess if the user wants to kill them, they can always do so explicitly, whereas if they are killed automatically, resuming the jobs may not be possible.
/kind bug
What steps did you take and what happened:
I'm not sure if this is a bug or a feature.
I had an experiment enter a failed state because too many trials failed.
All of the existing trials that were in the running state continued to run.
I can see advantages and disadvantages to not terminating existing trials when an experiment fails.
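To make the two possible behaviors concrete, here is a minimal Python sketch of a toy experiment that fails once too many trials have failed. It is not the project's actual controller code: the `Experiment` class, the `max_failed_trials` threshold, and the `kill_running_on_failure` switch are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    KILLED = "killed"


@dataclass
class Experiment:
    # Hypothetical threshold: the experiment fails once MORE than this many trials fail.
    max_failed_trials: int
    # Hypothetical policy switch: kill trials that are still running when the experiment fails.
    kill_running_on_failure: bool
    trials: dict = field(default_factory=dict)
    failed: bool = False

    def report(self, name: str, state: State) -> None:
        """Record a trial's state and apply the failure policy if the threshold is crossed."""
        self.trials[name] = state
        n_failed = sum(s is State.FAILED for s in self.trials.values())
        if n_failed > self.max_failed_trials and not self.failed:
            self.failed = True
            if self.kill_running_on_failure:
                # Only values of existing keys change, so iterating is safe here.
                for trial, s in self.trials.items():
                    if s is State.RUNNING:
                        self.trials[trial] = State.KILLED


def run(kill_running_on_failure: bool) -> Experiment:
    exp = Experiment(max_failed_trials=1, kill_running_on_failure=kill_running_on_failure)
    exp.report("long-trial", State.RUNNING)  # e.g. a job that is 10 hours in
    exp.report("trial-a", State.FAILED)
    exp.report("trial-b", State.FAILED)      # second failure crosses the threshold
    return exp


keep = run(kill_running_on_failure=False)
kill = run(kill_running_on_failure=True)
print(keep.failed, keep.trials["long-trial"].value)
print(kill.failed, kill.trials["long-trial"].value)
```

With the switch off, the long-running trial stays in the running state after the experiment fails, which matches what I observed. With it on, the trial is killed at the moment the experiment fails, which is the case where resuming might not be possible.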