Adjust the device used in synthetic data generation #486

Merged
merged 3 commits into from
Sep 9, 2022

karlhigley
Contributor

It's important to use data on the same device that the model will be on during serving when tracing the model, so this makes it possible to generate synthetic data on GPU.
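A minimal sketch of the idea, not the code from this PR: `random_inputs` and `ToyModel` are hypothetical stand-ins, and the only point is that the example inputs are created on the same device the model lives on before `torch.jit.trace` is called.

```python
import torch
from torch import nn


def random_inputs(feature_dims, batch_size=4, device="cpu"):
    """Hypothetical generator: one random float tensor per feature, created on `device`."""
    return {
        name: torch.rand(batch_size, dim, device=device)
        for name, dim in feature_dims.items()
    }


class ToyModel(nn.Module):
    """Stand-in for a real model: concatenates two features and projects them."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(6, 1)

    def forward(self, item_emb, user_emb):
        return self.proj(torch.cat([item_emb, user_emb], dim=-1))


# Use the GPU when one is present, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyModel().to(device).eval()

# Synthetic data is generated directly on the model's device.
batch = random_inputs({"item_emb": 4, "user_emb": 2}, device=device)

traced = torch.jit.trace(model, (batch["item_emb"], batch["user_emb"]))
output = traced(batch["item_emb"], batch["user_emb"])
```

With the inputs and the model on the same device, the traced module returns its output on that device too.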

@karlhigley karlhigley self-assigned this Sep 8, 2022
@karlhigley karlhigley added area/pytorch chore Maintenance for the repository labels Sep 8, 2022
@karlhigley karlhigley added this to the Merlin 22.09 milestone Sep 8, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #486 of commit d5fe2826657035297a86a48d3e63cb835c967f7d, no merge conflicts.
Running as SYSTEM
Setting status of d5fe2826657035297a86a48d3e63cb835c967f7d to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/195/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/486/*:refs/remotes/origin/pr/486/* # timeout=10
 > git rev-parse d5fe2826657035297a86a48d3e63cb835c967f7d^{commit} # timeout=10
Checking out Revision d5fe2826657035297a86a48d3e63cb835c967f7d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d5fe2826657035297a86a48d3e63cb835c967f7d # timeout=10
Commit message: "Adjust device for synthetic data generation"
 > git rev-list --no-walk e5d579aa58d5b906996d3ff66c4a1587cebc4352 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins9973369823515916597.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 36.37s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins8376419147968644139.sh

@github-actions

github-actions bot commented Sep 8, 2022

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #486 of commit f2a1cd5770f0d65274792b7142d4d8fd1b756761, no merge conflicts.
Running as SYSTEM
Setting status of f2a1cd5770f0d65274792b7142d4d8fd1b756761 to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/196/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/486/*:refs/remotes/origin/pr/486/* # timeout=10
 > git rev-parse f2a1cd5770f0d65274792b7142d4d8fd1b756761^{commit} # timeout=10
Checking out Revision f2a1cd5770f0d65274792b7142d4d8fd1b756761 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f2a1cd5770f0d65274792b7142d4d8fd1b756761 # timeout=10
Commit message: "Merge branch 'main' into fix/synthetic-data-device"
 > git rev-list --no-walk d5fe2826657035297a86a48d3e63cb835c967f7d # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins3247530333803956841.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 38.25s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins15209332355384846126.sh

@karlhigley
Contributor Author

rerun tests

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #486 of commit f2a1cd5770f0d65274792b7142d4d8fd1b756761, no merge conflicts.
Running as SYSTEM
Setting status of f2a1cd5770f0d65274792b7142d4d8fd1b756761 to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/197/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/486/*:refs/remotes/origin/pr/486/* # timeout=10
 > git rev-parse f2a1cd5770f0d65274792b7142d4d8fd1b756761^{commit} # timeout=10
Checking out Revision f2a1cd5770f0d65274792b7142d4d8fd1b756761 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f2a1cd5770f0d65274792b7142d4d8fd1b756761 # timeout=10
Commit message: "Merge branch 'main' into fix/synthetic-data-device"
 > git rev-list --no-walk f2a1cd5770f0d65274792b7142d4d8fd1b756761 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins14302399178139978326.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 37.02s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins15396537597445292023.sh

Member

@oliverholworthy oliverholworthy left a comment

Looks like a useful feature to me. Is it possible to set the default device of a torch.rand tensor based on a global device setting?

@oliverholworthy
Member

"It's important to use data on the same device that the model will be on during serving when tracing the model"

Important or necessary? Do you know why this has to be the case for the tracing to work correctly?

I can see something in the TorchScript FAQ:

"recommended because the tracer may witness tensor creation on a specific device, so casting an already-loaded model may have unexpected effects. Casting the model before saving it ensures that the tracer has the correct device information."

Does that mean it has to be exactly the same device, not only GPU vs. CPU (the same GPU architecture and device count, for example)?
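One way to see what the FAQ passage describes is a small experiment. The sketch below assumes a toy module, `AddsOffset`, that is not part of this PR; it creates a tensor inside `forward()` using the input's device. Tracing typically records the device seen at trace time as a constant in the generated code, which is why example inputs on a different device than the serving device can lead to surprises. This does not by itself answer whether GPU architecture or device count also matter.

```python
import torch
from torch import nn


class AddsOffset(nn.Module):
    """Toy module (hypothetical) that creates a new tensor inside forward()."""

    def forward(self, x):
        # `x.device` is a plain Python object, so the tracer cannot follow it
        # symbolically; it sees only the concrete device of the example input.
        offset = torch.ones(3, device=x.device)
        return x + offset


example = torch.rand(2, 3)  # example input created on CPU
traced = torch.jit.trace(AddsOffset(), example)

# Inspect the generated code to see how the device argument was recorded.
print(traced.code)

result = traced(example)
```

Printing `traced.code` after tracing with a CPU input and again after tracing with a CUDA input is a quick way to compare what the tracer captured in each case.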

@karlhigley
Contributor Author

I suppose you could do torch.set_default_tensor_type(torch.cuda.FloatTensor), but I think passing around a device argument is the recommended way to do it.
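A short sketch of the two options, assuming a hypothetical helper named `make_features` (not a function from this repository). The explicit `device` argument keeps the choice local to each call, while the global setting changes every later tensor creation that does not name a device. Newer PyTorch releases also provide `torch.set_default_device` for the global approach.

```python
import torch


def make_features(num_rows, dim, device=None):
    """Hypothetical helper: create random features on an explicitly chosen device.

    With device=None, PyTorch falls back to its current default device.
    """
    return torch.rand(num_rows, dim, device=device)


# Explicit argument: the caller decides per call where the data lives.
cpu_batch = make_features(4, 8, device="cpu")

if torch.cuda.is_available():
    gpu_batch = make_features(4, 8, device="cuda")

    # Global alternative from the comment above. It would affect every
    # subsequent tensor created without an explicit device, so it is left
    # commented out here:
    # torch.set_default_tensor_type(torch.cuda.FloatTensor)
```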

@karlhigley karlhigley merged commit bcc9392 into NVIDIA-Merlin:main Sep 9, 2022
Labels
area/pytorch chore Maintenance for the repository

3 participants