Population based training #1833
Hi @a9p. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Comments and open questions:
/cc @andreyvelich
/cc @richardsliu
/ok-to-test
/retest
Please comment when the PR is complete and ready to be reviewed.
/retest (@johnugeorge fyi)
/test all
Thank you for this great contribution @a9p, sorry for the late reply.
I left my first comments.
You can also create a PR similar to this one: kubeflow/testing#970.
We should add PBT Suggestion and Trial images to our CI.
/assign @gaocegege @johnugeorge @tenzen-y @anencore94
@kubeflow/wg-automl-leads Can you start GH Actions?
It looks like no trials kicked off again, but this time due to a pull failure for the
Thanks for checking CI. @a9p
katib/test/e2e/v1beta1/scripts/gh-actions/build-load.sh, lines 160 to 167 in 3e0cfd1
@johnugeorge I think the last approved run failed due to a pull policy that I had in simple-pbt. I checked against the other HP jobs and they should be at parity now (no additional fields)!
@a9p The tests haven't completed yet. Did it succeed when you tried it locally?
…mages.sh update yaml function (causing failing gha).
@johnugeorge it did work for me, but it seems I had images locally that satisfied the Docker image pull. I set up a new test VM. The main issue causing the CI failure was a missing entry for simple-pbt in update-images.sh, which changes the image tag from latest to e2e-test for the tests.
Thanks @a9p for your hard work and patience. We are ready to merge. One minor addition request: can you update https://github.com/kubeflow/katib#search-algorithms to add PBT to the README table as well? In a separate PR, can you also add a new section to https://www.kubeflow.org/docs/components/katib/experiment/#search-algorithms-in-detail in the https://github.com/kubeflow/website repository? This will help drive users to try PBT.
Thank you very much for implementing this excellent feature! @a9p
/lgtm
Thank you for this amazing contribution @a9p!
I am excited to see the outcome from our users.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: a9p, andreyvelich, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thank you all for your guidance and patience with getting this merged in!
What this PR does / why we need it:
Adds support for discovering hyperparameters that are adjusted during training, rather than searching for a single fixed set for the entire training process. The paper has more details about the technique.
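For readers unfamiliar with the technique, the exploit/explore round at the core of population based training can be sketched roughly as follows. This is only an illustration under stated assumptions, not the suggestion service added in this PR: the function name pbt_step, the member dictionary layout, the truncation fraction, and the perturbation factors are all hypothetical, and a higher score is assumed to be better.

```python
import copy
import random


def pbt_step(population, truncation=0.2, perturb_factors=(0.8, 1.2), rng=random):
    """Run one exploit/explore round over a population (hypothetical sketch).

    Each member is a dict: {"hparams": {...}, "score": float, "checkpoint": any}.
    Higher scores are assumed to be better. Members are updated in place.
    """
    if len(population) < 2:
        return population

    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    # Keep the top and bottom groups disjoint, with at least one member each.
    cutoff = min(max(1, int(len(ranked) * truncation)), len(ranked) // 2)
    top, bottom = ranked[:cutoff], ranked[-cutoff:]

    for member in bottom:
        parent = rng.choice(top)
        # Exploit: restart the weak member from a strong member's checkpoint.
        member["checkpoint"] = copy.deepcopy(parent["checkpoint"])
        # Explore: perturb the parent's hyperparameters so the schedule can drift.
        member["hparams"] = {
            name: value * rng.choice(perturb_factors)
            for name, value in parent["hparams"].items()
        }
    return population
```

Calling pbt_step repeatedly between training intervals is what lets hyperparameters change over the course of training: weak members inherit a strong member's state and then continue with slightly different settings. Real implementations differ in how they rank members, resample values, and handle non-numeric hyperparameters.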
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
This PR provides some initial support for PBT within Katib (#1382).
Checklist: