Upgrade run_trial.py and architect.py #2248

Closed
wants to merge 33 commits into from

Conversation

sifa1024
Contributor

What this PR does / why we need it:
Running darts-gpu with the original program fails with this error:

<architect.Architect object at 0x7fe597aad780>
Traceback (most recent call last):
  File "/home/sifa/docker/katib/examples/v1beta1/trial-images/darts-cnn-cifar10/run_trial.py", line 259, in <module>
    main()
  File "/home/sifa/docker/katib/examples/v1beta1/trial-images/darts-cnn-cifar10/run_trial.py", line 155, in main
    train(train_loader, valid_loader, model, architect, w_optim, alpha_optim,
  File "/home/sifa/docker/katib/examples/v1beta1/trial-images/darts-cnn-cifar10/run_trial.py", line 194, in train
    architect.unrolled_backward(train_x, train_y, valid_x, valid_y, lr, w_optim)
  File "/home/sifa/docker/katib/examples/v1beta1/trial-images/darts-cnn-cifar10/architect.py", line 69, in unrolled_backward
    self.virtual_step(train_x, train_y, xi, w_optim)
  File "/home/sifa/docker/katib/examples/v1beta1/trial-images/darts-cnn-cifar10/architect.py", line 56, in virtual_step
    vw.copy_(w - torch.FloatTensor(xi) * (m + g + self.w_weight_decay * w))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
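
For illustration: `torch.FloatTensor(xi)` creates a CPU tensor, while `w`, `m`, and `g` are on `cuda:0` in the GPU run, which is what the RuntimeError reports. A minimal sketch of one way to avoid the mismatch is to build the step-size tensor on the same device as the weights. The helper name `virtual_weights` and its flat argument list are hypothetical and chosen only for this example; this is not the code in architect.py.

```python
import torch


def virtual_weights(w, m, g, xi, weight_decay):
    """Return w - xi * (m + g + weight_decay * w), computed on w's device.

    Hypothetical helper for illustration; not part of architect.py.
    """
    # Create the step size on the same device and dtype as the weights,
    # instead of torch.FloatTensor(xi), which yields a CPU tensor.
    xi_t = torch.as_tensor(xi, dtype=w.dtype, device=w.device)
    return w - xi_t * (m + g + weight_decay * w)


# Small CPU example; on a GPU run these tensors would live on cuda:0.
w = torch.ones(3)    # weights
m = torch.zeros(3)   # momentum term
g = torch.ones(3)    # gradient
vw = virtual_weights(w, m, g, xi=[0.1], weight_decay=0.0)
```

Because the step size follows `w.device`, the same call should work unchanged whether the model is on CPU or GPU.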

Which issue(s) this PR fixes

None. I've created the pull request directly.

Checklist:

  • Docs included if any changes are user facing

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sifa1024
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot requested a review from anencore94 December 15, 2023 08:40
@sifa1024 sifa1024 marked this pull request as draft December 21, 2023 19:06
@sifa1024 sifa1024 marked this pull request as ready for review January 4, 2024 22:28
@sifa1024 sifa1024 changed the title from "Update run_trial.py and architect.py" to "Upgrade run_trial.py and architect.py" Jan 17, 2024
Member

@tenzen-y tenzen-y left a comment

@sifa1024 I apologize for the delayed response.

@kubeflow/wg-automl-leads Could you approve CI?

@@ -26,7 +26,7 @@ def __init__(self, model, w_momentum, w_weight_decay):
         self.w_momentum = w_momentum
         self.w_weight_decay = w_weight_decay

-    def virtual_step(self, train_x, train_y, xi, w_optim):
+    def virtual_step(self, train_x, train_y, xi, w_optim, device):
Member

Instead of passing the device name, should we detect the device like this?

device = torch.device("cuda" if use_cuda else "cpu")

Contributor Author

Yes, we should do it like this.
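
The suggestion in this thread could be sketched as follows. `use_cuda` is not defined in the reviewer's snippet, so this sketch assumes it is derived from `torch.cuda.is_available()`; the function name `detect_device` and the `prefer_cuda` flag are hypothetical.

```python
import torch


def detect_device(prefer_cuda=True):
    """Pick CUDA when requested and available, otherwise fall back to CPU."""
    # Assumption: use_cuda comes from torch.cuda.is_available().
    use_cuda = prefer_cuda and torch.cuda.is_available()
    return torch.device("cuda" if use_cuda else "cpu")


device = detect_device()
# Example step size created directly on the detected device.
xi = torch.tensor([0.025], device=device)
```

Detecting the device in one place means `virtual_step` would not need an extra `device` argument, which appears to be the point of the review comment.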

@sifa1024 sifa1024 requested a review from tenzen-y March 4, 2024 09:41
@tenzen-y tenzen-y mentioned this pull request Mar 4, 2024
1 task
Member

tenzen-y commented Mar 4, 2024

@sifa1024 Could you rebase this PR? We have fixed the CI errors in #2267.

Contributor Author

sifa1024 commented Mar 4, 2024

@tenzen-y I modified the code and commits, and I rebased this PR.

Member

tenzen-y commented Mar 5, 2024

@sifa1024 Could you sign the DCO?

Member

tenzen-y commented Mar 5, 2024

@sifa1024 I guess that you need to sign off all commits with git commit -s.

Contributor Author

sifa1024 commented Mar 5, 2024

Sorry, this is my first time using this.

Member

tenzen-y commented Mar 5, 2024

Sorry, this is my first time using this.

@sifa1024 Uhm, could you create a separate PR? It seems that unintended commits are included in this PR.

andreyvelich and others added 11 commits March 5, 2024 17:19
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
* Update Ubuntu to 22.04 for E2E Tests

* Update Ubuntu for all Tests

Signed-off-by: sifa1024 <stu95440@gmail.com>
* Add Katib ROADMAP 2022/2023

* Add multi-objective optimization

* Add Scalability Improvements

* Remove Katib CRD naming

Signed-off-by: sifa1024 <stu95440@gmail.com>
Co-authored-by: andreafehrman <andrea.k.fehrman@vanderbilt.edu>
Co-authored-by: harrisonfritz <harrisonmichaelfritz@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
dependabot bot and others added 21 commits March 5, 2024 17:19
#2234)

Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.15.4 to 7.23.2.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse)

---
updated-dependencies:
- dependency-name: "@babel/traverse"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Bumps [debug](https://github.com/debug-js/debug) from 4.2.0 to 4.3.4.
- [Release notes](https://github.com/debug-js/debug/releases)
- [Commits](debug-js/debug@4.2.0...4.3.4)

---
updated-dependencies:
- dependency-name: debug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
* Fix Optuna Validation for CMA-ES

* Fix Optuna test

Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
* add env & env_from spec

* unify env and env_from specs

Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
* DB: Add env to skip DB creation

* DB: Rename var name & Remove new function

* Migration -> Initialization
* Remove GetBoolEnvOrDefault

* DB: Rearrange dependencies

Signed-off-by: sifa1024 <stu95440@gmail.com>
…nd (#2253)

Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.14.8 to 1.15.4.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](follow-redirects/follow-redirects@v1.14.8...v1.15.4)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Bumps [axios](https://github.com/axios/axios) to 1.6.5 and updates ancestor dependency [wait-on](https://github.com/jeffbski/wait-on). These dependencies need to be updated together.

Updates `axios` from 0.27.2 to 1.6.5
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v1.x/CHANGELOG.md)
- [Commits](axios/axios@v0.27.2...v1.6.5)

Updates `wait-on` from 7.0.1 to 7.2.0
- [Release notes](https://github.com/jeffbski/wait-on/releases)
- [Commits](jeffbski/wait-on@v7.0.1...v7.2.0)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: indirect
- dependency-name: wait-on
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Message `close-pr-message` was likely a wrong copy-paste from stale.

This aligns `close-` messages.

Signed-off-by: sifa1024 <stu95440@gmail.com>
* UT: Replace MXNet example with PyTorch example

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

* CI: Replace MXNet examples with PyTorch examples

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

---------

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <stu95440@gmail.com>
Signed-off-by: sifa1024 <m11263004@gemail.yuntech.edu.tw>
Signed-off-by: sifa1024 <stu95440@gmail.com>

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@google-oss-prow google-oss-prow bot added size/XXL and removed size/S labels Mar 5, 2024
Contributor Author

sifa1024 commented Mar 5, 2024

I'm going to close this PR. I feel like I messed up.

@sifa1024 sifa1024 marked this pull request as draft March 5, 2024 09:23
@sifa1024 sifa1024 closed this Mar 5, 2024
Contributor Author

sifa1024 commented Mar 5, 2024

@tenzen-y Could I create a new PR?

Member

tenzen-y commented Mar 5, 2024

@tenzen-y Could I create a new PR?

Sure.

7 participants