Add Optuna based suggestion service #1613

Merged
merged 12 commits into from
Aug 16, 2021

g-votte
Contributor

@g-votte g-votte commented Aug 11, 2021

What this PR does / why we need it:
This PR adds a suggestion service based on Optuna. Optuna provides several sampling algorithms that are not implemented in Katib's other suggestion services, such as multivariate TPE and constant liar.

In addition, as discussed in #1549, Optuna could support multi-objective optimization once Katib provides that interface.

I've written an example of invoking multivariate TPE in a separate repository, so that we can test and discuss the new algorithm based on the Optuna service. Multivariate TPE captures the dependencies among multiple inputs and outperforms standard TPE on many benchmark tasks. If the example looks fine, I'd also like to add it in another PR.

Checklist:

  • Docs included if any changes are user facing

@google-cla

google-cla bot commented Aug 11, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


@aws-kf-ci-bot
Contributor

Hi @g-votte. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@g-votte
Contributor Author

g-votte commented Aug 11, 2021

@googlebot I signed it!

@c-bata
Member

c-bata commented Aug 11, 2021

/ok-to-test

@c-bata
Member

c-bata commented Aug 11, 2021

/assign @c-bata

@johnugeorge
Member

/ok-to-test

Member

@andreyvelich andreyvelich left a comment

Thank you for this awesome contribution @g-votte!
I left a few comments.
Could you also add an example, for instance one that uses multivariate TPE?

@gaocegege @johnugeorge Since we are planning to cut the release next week, do we want to include this feature in Katib 0.12?

pkg/suggestion/v1beta1/optuna/service.py
Comment on lines 146 to 149
    else:
        # The trial has not been suggested by the Optuna study.
        # A new trial object is created and reported using study.add_trial() with the assignments and the search space.
        optuna_trial = optuna.create_trial(
Member

Could you describe the use case in which "the Trial has not been suggested by the Optuna study"?

Contributor Author

Thanks for the question. This block handles a change of suggestion logic during an experiment, e.g. switching to Optuna-based search after a certain number of bayesianoptimization trials. Does that ever happen in Katib?

If it is guaranteed that the suggestion service is unchangeable during an experiment, I will change the logic so that an error is raised instead of creating and adding Optuna trials.

Member

If it is guaranteed that the suggestion service is unchangeable during an experiment

Yes, currently we don't provide a way to change the Suggestion logic, for example the Suggestion algorithm, while an Experiment is running.
Users can modify only the Experiment budget. See: https://www.kubeflow.org/docs/components/katib/resume-experiment/#modify-running-experiment.
cc @gaocegege @johnugeorge

Member

In that case, should we remove this part for now?
In the future if we decide to add this feature in the controller, we can extend the Suggestion logic.
What do you think @g-votte @c-bata ?

Member

In that case, should we remove this part for now?

Agreed. The Goptuna suggestion service doesn't support such a use case at the moment either.

Contributor Author

@andreyvelich @c-bata
Thanks for your comments. I changed the logic so that it raises an error for unknown assignments. (I also changed the test case, because the previous test passed externally created assignments as the initial trials.)
Commit: 1c0e15f

g-votte and others added 5 commits August 12, 2021 09:04
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@gaocegege
Member

Thanks for your contribution! 🎉 👍

https://github.com/g-votte/katib-optuna-example is missing

@gaocegege
Member

Since we are planning to cut the release next week, do we want to include this feature in Katib 0.12?

This PR will not break existing features, so I think we can include it. Maybe mark it alpha or beta. WDYT @andreyvelich

@g-votte
Contributor Author

g-votte commented Aug 12, 2021

Thanks for your contribution! 🎉 👍

https://github.com/g-votte/katib-optuna-example is missing

@gaocegege
Oops. I've just made this public.

Member

@gaocegege gaocegege left a comment

LGTM.

In this PR, additionalMetricNames is used to provide multiple metrics to Katib. I think we should have a new CRD design to support it.

WDYT @andreyvelich @johnugeorge

Member

@c-bata c-bata left a comment

@g-votte Thank you for your pull request. Overall looks good 💯 I put some review comments.

    name = algorithm_spec.algorithm_name
    settings = {s.name: s.value for s in algorithm_spec.algorithm_settings}

    if name == "tpe" or name == "multivariate-tpe":
Member

@c-bata c-bata Aug 12, 2021

Contributor Author

Thanks for the information. Since changing those components is user-facing, I'd like to work on that in a separate PR.

@@ -0,0 +1,31 @@
FROM python:3.6
Member

[nits] How about using Python 3.9?

Contributor Author

Sounds good. I updated the Python version. Some dependencies in requirements.txt are also updated for that purpose.

Commit: a4ae6e2

Comment on lines 151 to 154
    def _get_assignments_key(self, assignments):
        assignments = sorted(assignments, key=lambda a: a.name)
        assignments_str = [str(a) for a in assignments]
        return ",".join(assignments_str)
Member

@c-bata c-bata Aug 12, 2021

[Question] I have two questions.

  1. Does the string representation of Assignment object hold both parameter names and parameter values?
  2. You defined assignments_to_optuna_number as defaultdict(list). I guess the reason why you use list is for duplicated hyperparameters, right?

Contributor Author

  1. Yes; this should contain both the assignment's name and value, using the string representation of the Assignment class.

    def __str__(self):
        return "Assignment(name={}, value={})".format(self.name, self.value)

  2. As you mention, this is to handle duplications of assignments.

Member

@c-bata c-bata Aug 13, 2021

Great! Sounds like it works 👍

This is a very nit-picky comment, but I'd say the following code makes it clearer that the key contains both the parameter name and the value. It is also a bit more memory efficient.

assignments_str = [f"{a.name}:{a.value}" for a in assignments]

Contributor Author

Agreed. I changed the line.
b2085a6
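
The bookkeeping discussed in this thread can be sketched as follows: an order-independent key built from `name:value` pairs, a list per key so that duplicated assignments are handled, and an error for assignments the service never suggested (the behavior agreed on in the earlier thread). The `Assignment` tuple and the `TrialNumberIndex` class are hypothetical stand-ins for illustration, not the code merged in this PR.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Katib's internal Assignment class.
Assignment = namedtuple("Assignment", ["name", "value"])


def assignments_key(assignments):
    """Build an order-independent string key from name/value pairs."""
    ordered = sorted(assignments, key=lambda a: a.name)
    return ",".join(f"{a.name}:{a.value}" for a in ordered)


class TrialNumberIndex:
    """Maps suggested assignments to the trial numbers that produced them."""

    def __init__(self):
        # One list per key, because the sampler may suggest the same
        # assignments more than once.
        self._numbers = defaultdict(list)

    def register(self, assignments, trial_number):
        self._numbers[assignments_key(assignments)].append(trial_number)

    def pop(self, assignments):
        key = assignments_key(assignments)
        numbers = self._numbers.get(key)
        if not numbers:
            # Assignments that were never suggested here are rejected.
            raise ValueError(f"unknown assignments: {key}")
        return numbers.pop(0)


index = TrialNumberIndex()
a = [Assignment("lr", "0.01"), Assignment("layers", "3")]
b = [Assignment("layers", "3"), Assignment("lr", "0.01")]  # same values, different order
index.register(a, 0)
index.register(b, 1)

first = index.pop(a)   # oldest trial number registered under this key
second = index.pop(b)  # the duplicate resolves to the next trial number
```

Sorting by name before joining is what makes the key independent of the order in which assignments arrive.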

@johnugeorge
Member

@gaocegege our definition of additionalMetricName is different. This seems to be a new CRD design.

Since this PR doesn't affect anything else, this can go in this release also.

@andreyvelich
Member

Can we copy this example: https://github.com/g-votte/katib-optuna-example/blob/main/experiment.yaml under https://github.com/kubeflow/katib/tree/master/examples/v1beta1 with name multivariate-tpe-example.yaml ?

I think in future PRs we can update our katib-config as you did here: https://github.com/g-votte/katib-optuna-example/blob/main/katib-config.yaml#L47-L50.

WDYT @g-votte @c-bata @gaocegege ?

@gaocegege
Member

SGTM

@g-votte
Contributor Author

g-votte commented Aug 13, 2021

@andreyvelich
Thanks for your comment. I copied the example to examples/v1beta1/.
Commit: a119fb2

Please let me know if we should not include this until katib-config.yaml is updated. I can separate the PR in that case.

CC: @c-bata @gaocegege


kwargs["multivariate"] = name == "multivariate-tpe"

sampler = optuna.samplers.TPESampler(**kwargs)
Member

@g-votte I think it's reasonable to set kwargs["constant_liar"] = True by default, because Katib makes it easy to run distributed optimization. What do you think?

Member

Or we can provide an algorithm setting to set constant_liar=True.

Contributor Author

Good catch. There is little reason to disable constant liar, especially with parallel optimization.
I added a line to turn the option on by default.
1d81885
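
As a rough sketch of the outcome of this thread, the function below translates an algorithm name and its string-valued settings into keyword arguments for `optuna.samplers.TPESampler`, with `constant_liar` switched on by default. The function name and the accepted setting names (`n_startup_trials`, `n_ei_candidates`, `random_state`) are assumptions for illustration and may differ from the merged code. The function only builds the dictionary, so it runs without Optuna installed.

```python
def build_tpe_kwargs(name, settings):
    """Translate a Katib-style algorithm name and string settings into
    keyword arguments for optuna.samplers.TPESampler."""
    if name not in ("tpe", "multivariate-tpe"):
        raise ValueError(f"unsupported algorithm: {name}")

    kwargs = {}
    for key, value in settings.items():
        # Setting values are treated as strings and converted explicitly.
        # The accepted setting names are assumptions for illustration.
        if key in ("n_startup_trials", "n_ei_candidates"):
            kwargs[key] = int(value)
        elif key == "random_state":
            kwargs["seed"] = int(value)
        else:
            raise ValueError(f"unknown setting: {key}")

    kwargs["multivariate"] = name == "multivariate-tpe"
    # Enabled by default, as agreed in this thread.
    kwargs["constant_liar"] = True
    return kwargs


kwargs = build_tpe_kwargs(
    "multivariate-tpe", {"n_startup_trials": "5", "random_state": "42"}
)
# With an Optuna release that supports constant_liar installed:
#   sampler = optuna.samplers.TPESampler(**kwargs)
```

Rejecting unknown settings, rather than ignoring them, is a design choice made for this sketch; it surfaces typos in the experiment spec early.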

@andreyvelich
Member

Please let me know if we should not include this until katib-config.yaml is updated. I can separate the PR in that case.

It's fine to have this example. Once we push docker.io/kubeflowkatib/suggestion-optuna to Docker Hub, we can update the Katib config.

@g-votte
Contributor Author

g-votte commented Aug 16, 2021

@andreyvelich @c-bata @gaocegege @johnugeorge
Thanks for your swift and detailed reviews!
I think all comments have been addressed. PTAL.

@andreyvelich
Member

Thank you for implementing this @g-votte!
/lgtm
Others please take a look @c-bata @gaocegege @johnugeorge.

@c-bata
Member

c-bata commented Aug 16, 2021

LGTM 🎉

@johnugeorge
Member

/lgtm

@johnugeorge
Member

/approve

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: g-votte, johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

7 participants