-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Optuna based suggestion service #1613
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
Hi @g-votte. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@googlebot I signed it! |
/ok-to-test |
/assign @c-bata |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this awesome contribution @g-votte!
I left few comments.
Please can you also add one example, for instance of using multivariate TPE ?
@gaocegege @johnugeorge Since we are planing to cut the release on the next week, do we want to include this feature in Katib 0.12 ?
else: | ||
# The trial has not been suggested by the Optuna study. | ||
# A new trial object is created and reported using study.add_trial() with the assignments and the search space. | ||
optuna_trial = optuna.create_trial( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you tell me the use-case when "the Trial has not been suggested by the Optuna study"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the question. This block is to deal with the change of suggestion logic during an experiment, e.g. Optuna-based search after a certain number of bayesianoptimization trials, but does it happen in Katib?
If it is guaranteed that the suggestion service is unchangeable during an experiment, I will change the logic so that an error is raised instead of creating and adding Optuna trials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is guaranteed that the suggestion service is unchangeable during an experiment
Yeah, currently we don't provide a functionality to change Suggestion logic during Experiment run. For example, changing Suggestion algorithm.
User can modify only Experiment budget. Check this: https://www.kubeflow.org/docs/components/katib/resume-experiment/#modify-running-experiment.
cc @gaocegege @johnugeorge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, should we remove this part for now?
Agree. Goptuna suggestion service doesn't also support such a use case now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich @c-bata
Thanks for your comments. I changed the logic so that it raises an error for unknown assignments. (I also changed the test case because the previous test passes the assignments externally created as the initial trials.)
Commit: 1c0e15f
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Thanks for your contribution! 🎉 👍 https://github.com/g-votte/katib-optuna-example is missing |
This PR will not break existing features, thus I think we can have it. Maybe mark it alpha or beta, WDYT @andreyvelich |
@gaocegege |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
In this PR, additionalMetricNames is used to provide multiple metrics to katib, I think we should have a new CRD design to support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@g-votte Thank you for your pull request. Overall looks good 💯 I put some review comments.
name = algorithm_spec.algorithm_name | ||
settings = {s.name:s.value for s in algorithm_spec.algorithm_settings} | ||
|
||
if name == "tpe" or name == "multivariate-tpe": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to update the following files to use multivariate-tpe
on New Katib UI.
- https://github.com/kubeflow/katib/blob/master/pkg/new-ui/v1beta1/frontend/src/app/constants/algorithms-types.const.ts
- https://github.com/kubeflow/katib/blob/master/pkg/new-ui/v1beta1/frontend/src/app/constants/algorithms-settings.const.ts
- https://github.com/kubeflow/katib/blob/master/pkg/new-ui/v1beta1/frontend/src/app/enumerations/algorithms.enum.ts
But it might be enough to work on this in a separated pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your information. Since changing those components is user-facing, I'd like to work on that in a separated PR.
@@ -0,0 +1,31 @@ | |||
FROM python:3.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits] How about using Python 3.9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I updated the Python version. Some dependencies in requirements.txt are also updated for that purpose.
Commit: a4ae6e2
def _get_assignments_key(self, assignments): | ||
assignments = sorted(assignments, key=lambda a: a.name) | ||
assignments_str = [str(a) for a in assignments] | ||
return ",".join(assignments_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Question] I have two questions.
- Does the string representation of Assignment object hold both parameter names and parameter values?
- You defined
assignments_to_optuna_number
asdefaultdict(list)
. I guess the reason why you use list is for duplicated hyperparameters, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Yes; this should contain both the assignment's name and value, using the string representation of the Assignment class.
katib/pkg/suggestion/v1beta1/internal/trial.py
Lines 88 to 89 in abbc9c9
def __str__(self): return "Assignment(name={}, value={})".format(self.name, self.value) -
As you mention, this is to handle duplications of assignments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Sounds like it works 👍
This is a very nit-picking comment but I'd say the following code is more clear that contains both the parameter name and the value. And a bit memory efficient.
assignments_str = [f"{a.name}:{a.value}" for a in assignments]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I changed the line.
b2085a6
@gaocegege our definition of Since this PR doesn't affect anything else, this can go in this release also. |
Can we copy this example: https://github.com/g-votte/katib-optuna-example/blob/main/experiment.yaml under https://github.com/kubeflow/katib/tree/master/examples/v1beta1 with name I think in the future PRs we can update our WDYT @g-votte @c-bata @gaocegege ? |
SGTM |
@andreyvelich Please let me know if we should not include this until CC: @c-bata @gaocegege |
|
||
kwargs["multivariate"] = name == "multivariate-tpe" | ||
|
||
sampler = optuna.samplers.TPESampler(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@g-votte I think it's reasonable to set kwargs["constant_liar"] = True
by default. Because you know, Katib allows us to run distributed optimization easily. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can provide an algorithm setting to set constant_liar=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. There is less reason to disable constant liar especially with parallel optimization.
Added a line to turn on the option by default.
1d81885
It's fine that we have this example, once we push |
@andreyvelich @c-bata @gaocegege @johnugeorge |
Thank you for implementing this @g-votte! |
LGTM 🎉 |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: g-votte, johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
This PR proposes to add the suggestion service based on Optuna. Optuna provides several sampling algorithms that have not been implemented in other suggestion services in Katib, such as multi-variate TPE and constant liar.
In addition, as discussed in #1549, Optuna can offer the extension of multi-objective optimization when Katib supports that interface.
I've written the example of invoking multi-variate TPE in a separated repository, so that we can test and discuss the new algorithm based on the Optuna service. Multi-variate TPE captures the dependencies among multiple inputs and shows better performances than normal TPE in many benchmark tasks. If the example looks fine, I'd also like to add it in another PR.
Checklist: