Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TPE Suggestion #1063

Merged
merged 5 commits into from
Feb 25, 2020
Merged

Conversation

andreyvelich
Copy link
Member

@andreyvelich andreyvelich commented Feb 21, 2020

This PR fixes some problems in hyperopt Suggestion.

  1. I tested tpe Algorithm on large-amount of Trials. After 20 Succeeded Trial I always got this error:
INFO:hyperopt.tpe:tpe_transform took 0.001436 seconds
INFO:hyperopt.tpe:TPE using 20/20 trials with best loss -0.955490
ERROR:grpc._server:Exception calling application: '>' not supported between instances of 'int' and 'str'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/hyperopt_service.py", line 27, in GetSuggestions
search_space, trials, request.request_number)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/hyperopt/base_hyperopt_service.py", line 129, in getSuggestions
new_ids, rval.domain, completed_hyperopt_trials, random_state)
File "/usr/local/lib/python3.6/site-packages/hyperopt/tpe.py", line 876, in suggest
fake_id_0 = max(max(tids), new_id) + 2
TypeError: '>' not supported between instances of 'int' and 'str'

I investigated hyperopt and found that here: https://github.com/hyperopt/hyperopt/blob/master/hyperopt/tpe.py#L906, we run just a simple random suggestion for n_startup_jobs times, and only after 20 Trials we start to run tpe. I changed this number to request_number. Only in the first GetSuggestion call we must run random algorithm.

  1. To solve above error, I change structure of Doc. I believe, we can't use Trial name as tid in the Doc. For tid I use id generated by self.fmin object. As well, I save Trial name to the spec section.

  2. I insert only new Trials in the Doc. I check which Trials have been already inserted in recorded_trials_names.

  3. To generate new ids every GetSuggestions call, I create FMinIter instance in the first call. And I update information about succeeded Trials in self.fmin object.

  4. To follow this PR Create Optimizer in BO Suggestion only for the first run #1057, Search Space is transformed to Domain only in the first GetSuggestions call.

  5. When we analyse information from Succeeded Trials we must cast values to int or float or they will be strings and we will have errors in tpe.suggest

  6. After first GetSuggestion call, tpe.suggest will generate only one Trial. I create a loop to run this method, in case that we need to create more than 1 Trial during Experiment.

I tested tpe and random on 150 Trial Experiment, I didn't get any errors.

Please take a look.

/assign @gaocegege @johnugeorge


This change is Reviewable

@andreyvelich
Copy link
Member Author

/retest

@johnugeorge
Copy link
Member

/cc @gaocegege

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for the PR!

@johnugeorge
Copy link
Member

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich
Copy link
Member Author

/retest

2 similar comments
@andreyvelich
Copy link
Member Author

/retest

@andreyvelich
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 09da67c into kubeflow:master Feb 25, 2020
@andreyvelich andreyvelich deleted the fix-tpe-suggestion branch October 6, 2021 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants