-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resume experiment with extra trials from last checkpoint #952
Conversation
/hold |
/cc @richardsliu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Should we add a test case for it?
@gaocegege added a test to reconfigure max trails and parallel trial count |
@@ -123,6 +124,14 @@ func (k *KatibClient) CreateExperiment(experiment *experimentsv1alpha3.Experimen | |||
return nil | |||
} | |||
|
|||
func (k *KatibClient) UpdateExperiment(experiment *experimentsv1alpha3.Experiment, namespace ...string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this method used for UI? and it seems namespace
param isn't used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about it. Shall we do this separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is not useful in your PR topic, I think we'd better remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. this method is currently used in resume_e2e_experiment.go script in this PR to update the experiment.
// Experiment is restartable only if it is in succeeded state by reaching max trials | ||
if util.IsCompletedExperimentRestartable(instance) { | ||
// Check if max trials is reconfigured | ||
if (instance.Spec.MaxTrialCount != nil) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if a user changes MaxTrialCount
to nil (infinity), here we should also allow to restart it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
(*instance.Spec.MaxTrialCount != instance.Status.Trials) { | ||
if (instance.Spec.MaxTrialCount != nil && | ||
*instance.Spec.MaxTrialCount != instance.Status.Trials) || | ||
(instance.Spec.MaxTrialCount == nil && instance.Status.Trials != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we consider the case: instance.Spec.MaxTrialCount == 0 and the experiment will be marked completed with instance.Status.Trials == 0; then update instance.Spec.MaxTrialCount to nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think, this is a invalid case. It doesn't make sense to set MaxTrialCount to be zero if set. We should add a validation for this separately.
Related: #768
/lgtm |
/hold cancel |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes: #891
Resuming experiment feature is added. If user wants to try the same experiment with more number of trials, max trials can be reconfigured to restart the experiment from the last checkpoint.
Experiment is restarted only when experiment is in succeeded state with max trials reached previously.
This change is