Optimize Chocolate Suggestion #1116

Merged

andreyvelich
Member

@andreyvelich andreyvelich commented Mar 31, 2020

I changed the Chocolate Suggestion to be consistent with the other Suggestions (Hyperopt, skopt).

  1. The Chocolate suggestion now analyses the search space only on the first run.
  2. I changed the mechanism for saving information to the sqlite3 DB. We don't need to overwrite all Trial data on every GetSuggestions call. Instead, we update the existing DB data with the new Trial loss and Trial name for the matching _chocolate_id. For this, I keep a list of recorded dictionaries in the self.created_trials variable and update this list whenever new information about Trials arrives.
  3. As a result, the Chocolate sampler generates the correct _chocolate_id for each new assignment, so the Chocolate algorithms should work correctly. Users can check the data in the sqlite3 DB.
  4. Added a few logging steps.
  5. Removed the SQLAlchemy warnings about different threads by adding the check_same_thread=False flag.
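
To illustrate items 2 and 5, here is a minimal sketch of updating a single row by its _chocolate_id instead of rewriting every Trial. It uses Python's standard sqlite3 module directly rather than Chocolate's own connection layer, and the table name `results`, the column names `trial_name` and `_loss`, and the helper functions are all hypothetical, so treat it as an illustration of the idea rather than the code in this PR.

```python
import sqlite3


def open_db(path=":memory:"):
    # check_same_thread=False lets the connection be used from a thread other
    # than the one that created it (the caller must still serialize writes).
    conn = sqlite3.connect(path, check_same_thread=False)
    # Hypothetical schema; Chocolate's real tables may look different.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "_chocolate_id INTEGER PRIMARY KEY, trial_name TEXT, _loss REAL)"
    )
    return conn


def record_assignment(conn, chocolate_id):
    # A new assignment is stored once, before its Trial has any result.
    conn.execute(
        "INSERT INTO results (_chocolate_id) VALUES (?)", (chocolate_id,)
    )
    conn.commit()


def update_trial(conn, chocolate_id, trial_name, loss):
    # Only the row that matches _chocolate_id is touched; other rows stay as-is.
    cur = conn.execute(
        "UPDATE results SET trial_name = ?, _loss = ? WHERE _chocolate_id = ?",
        (trial_name, loss, chocolate_id),
    )
    conn.commit()
    return cur.rowcount  # number of rows updated (0 if the id is unknown)
```

With this shape, each GetSuggestions call only needs to issue one `update_trial` per Trial whose result has changed, which is the behaviour item 2 describes.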

I tested this Suggestion on grid, chocolate-quasirandom and chocolate-mocmaes. All Experiments succeeded. I also tested grid on 150 Trials and the return Experiment functionality.

/assign @gaocegege
/cc @johnugeorge

While testing, I found one issue. In very rare cases, the Katib controller calls GetSuggestion but doesn't create the corresponding jobs.

It might be caused by this sequence of actions in the controller:

  1. Katib controller calls GetSuggestion with request_number = 1.
  2. Katib controller gets the response from GetSuggestion.
  3. Katib controller gets these errors (grid-lg9jzkgw is the previously created Trial):
     "logger":"kubebuilder.controller","msg":"Reconciler error","controller":"suggestion-controller","request":"kubeflow/grid","error":"Operation cannot be fulfilled on suggestions.kubeflow.org \"grid\": the object has been modified; please apply your changes to the latest version and try again"
     "logger": "trial-controller","msg":"Reconcile trial error","Trial":"kubeflow/grid-lg9jzkgw","error":"jobs.batch \"grid-lg9jzkgw\" not found"
  4. At the same time, Katib controller calls GetSuggestion with request_number = 2, because activeCount < parallelCount, and gets a new suggestion with 2 Trials.
  5. Katib controller creates new Trials from the latest GetSuggestion response.

So we lose the first suggested Trial.
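
The "object has been modified" error in the log is an optimistic-concurrency conflict: a write based on a stale copy of the object is rejected. The toy model below shows that failure and a read-modify-write retry loop that avoids dropping the update. The `Store` class, `ConflictError`, and `append_with_retry` are hypothetical stand-ins written for this illustration; this is not Katib controller code and not necessarily how the issue should be fixed there.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale version of the object."""


class Store:
    """Toy versioned object store (hypothetical stand-in for the API server)."""

    def __init__(self):
        self._obj = {"suggestions": [], "version": 1}

    def get(self):
        # Return a copy so callers cannot mutate the stored object directly.
        return {
            "suggestions": list(self._obj["suggestions"]),
            "version": self._obj["version"],
        }

    def update(self, new_obj):
        # Reject writes that were computed from an older version.
        if new_obj["version"] != self._obj["version"]:
            raise ConflictError("the object has been modified")
        self._obj = {
            "suggestions": list(new_obj["suggestions"]),
            "version": new_obj["version"] + 1,
        }


def append_with_retry(store, trial_name, max_attempts=3):
    # Re-read the latest object on every attempt, so a conflict never
    # silently drops the Trial we are trying to record.
    for _ in range(max_attempts):
        current = store.get()
        current["suggestions"].append(trial_name)
        try:
            store.update(current)
            return True
        except ConflictError:
            continue
    return False
```

If the first suggested Trial were written through a loop like `append_with_retry`, a concurrent update would cause a re-read and a second attempt instead of the Trial being lost.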

@andreyvelich
Member Author

/retest

@andreyvelich
Copy link
Member Author

CI passed. @gaocegege Can you take a look again, please?

Member

@gaocegege gaocegege left a comment

/lgtm
/approve

Thanks for your contribution! 🎉 👍

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 241b9a3 into kubeflow:master Apr 7, 2020
sperlingxx pushed a commit to sperlingxx/katib that referenced this pull request Jul 9, 2020
@andreyvelich andreyvelich deleted the search-space-chocolate-suggestion branch October 6, 2021 00:23