quick fix: apply masking when training next item prediction #514

Merged 4 commits on Nov 7, 2022

@nzarif nzarif commented Nov 4, 2022

Goals ⚽

  • Apply masking when training the NextItemPredictionTask through the HF transformers Trainer.

Implementation Details 🚧

  • While training the NextItemPredictionTask, forward() did not receive the correct value for the ignore_masking flag, so the masking flow was not followed.
  • Added a training flag to NextItemPredictionTask.forward() and passed the correct value to it from Head.forward(), so that NextItemPredictionTask.forward() follows the correct flow during training.
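
The flag propagation described above can be illustrated with a minimal, library-free sketch. `ToyHead` and `ToyNextItemTask` are hypothetical stand-ins, not the Transformers4Rec classes, and the rule `ignore_masking = not training` is an assumption made for illustration only; the actual logic in the PR may differ.

```python
from typing import List, Sequence


class ToyNextItemTask:
    """Hypothetical stand-in for a next-item prediction task (not the library class)."""

    def forward(
        self,
        hidden: Sequence[float],
        mask: Sequence[bool],
        training: bool = False,
    ) -> List[float]:
        # Assumption: masking is skipped unless the caller says we are training.
        ignore_masking = not training
        if ignore_masking:
            return list(hidden)
        # Training flow: keep only the positions selected by the mask.
        return [h for h, keep in zip(hidden, mask) if keep]


class ToyHead:
    """Hypothetical head that owns one task and forwards the training flag."""

    def __init__(self, task: ToyNextItemTask) -> None:
        self.task = task

    def forward(self, hidden, mask, training: bool = False):
        # Pass the flag explicitly so the task does not fall back to its default.
        return self.task.forward(hidden, mask, training=training)


head = ToyHead(ToyNextItemTask())
hidden = [0.1, 0.2, 0.3, 0.4]
mask = [False, False, False, True]

train_out = head.forward(hidden, mask, training=True)
eval_out = head.forward(hidden, mask, training=False)
print(train_out)
print(eval_out)
```

If the head dropped the `training` argument, the toy task would always take its default branch and the mask would never be applied, which mirrors the symptom reported in this PR.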

Testing Details 🔍

  • You can test with any of the CI tests that use the NextItemPredictionTask, using a debugger to follow the control flow inside NextItemPredictionTask.forward().
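
As a possible complement to stepping through with a debugger, a spy object can record which flag value the task receives. The sketch below uses only `unittest.mock` from the standard library; `ToyHead` is a hypothetical wrapper written for this example and is not the library's `Head`.

```python
from unittest.mock import MagicMock


class ToyHead:
    """Hypothetical head that forwards the training flag to its task."""

    def __init__(self, task):
        self.task = task

    def forward(self, x, training=False):
        return self.task(x, training=training)


# The MagicMock plays the role of the task and records how it was called.
task_spy = MagicMock(return_value="ok")
head = ToyHead(task_spy)

result = head.forward([1, 2, 3], training=True)

# Raises AssertionError if the flag was dropped or changed on the way.
task_spy.assert_called_once_with([1, 2, 3], training=True)
```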

@nzarif nzarif added bug Something isn't working status/needs-review labels Nov 4, 2022
@nzarif nzarif requested a review from sararb November 4, 2022 22:31
@nzarif nzarif self-assigned this Nov 4, 2022
@nvidia-merlin-bot
Click to view CI Results
GitHub pull request #514 of commit 399e5a10825eaca5412e2bc34870d4e0e0ebe87b, no merge conflicts.
Running as SYSTEM
Setting status of 399e5a10825eaca5412e2bc34870d4e0e0ebe87b to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/236/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/514/*:refs/remotes/origin/pr/514/* # timeout=10
 > git rev-parse 399e5a10825eaca5412e2bc34870d4e0e0ebe87b^{commit} # timeout=10
Checking out Revision 399e5a10825eaca5412e2bc34870d4e0e0ebe87b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 399e5a10825eaca5412e2bc34870d4e0e0ebe87b # timeout=10
Commit message: "quick fix: apply masking when training next item prediction"
 > git rev-list --no-walk c76b416a920916779dfcba953e80a3a02c5c3538 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins12025857592881763472.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 36.35s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins10518345588757153841.sh

@nvidia-merlin-bot
Click to view CI Results
GitHub pull request #514 of commit 4935a97e572f1aed20b0a6c0d4b9b70df0f80b91, no merge conflicts.
Running as SYSTEM
Setting status of 4935a97e572f1aed20b0a6c0d4b9b70df0f80b91 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/237/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/514/*:refs/remotes/origin/pr/514/* # timeout=10
 > git rev-parse 4935a97e572f1aed20b0a6c0d4b9b70df0f80b91^{commit} # timeout=10
Checking out Revision 4935a97e572f1aed20b0a6c0d4b9b70df0f80b91 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4935a97e572f1aed20b0a6c0d4b9b70df0f80b91 # timeout=10
Commit message: "fixing flake8 error"
 > git rev-list --no-walk 399e5a10825eaca5412e2bc34870d4e0e0ebe87b # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins2833483065735617484.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 36.21s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins4588604924807922346.sh

@nvidia-merlin-bot
Click to view CI Results
GitHub pull request #514 of commit 83a6200f983ddad907158114c40be34d9b36ddc0, no merge conflicts.
Running as SYSTEM
Setting status of 83a6200f983ddad907158114c40be34d9b36ddc0 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/238/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/514/*:refs/remotes/origin/pr/514/* # timeout=10
 > git rev-parse 83a6200f983ddad907158114c40be34d9b36ddc0^{commit} # timeout=10
Checking out Revision 83a6200f983ddad907158114c40be34d9b36ddc0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 83a6200f983ddad907158114c40be34d9b36ddc0 # timeout=10
Commit message: "checked and fixed with flake8 and black"
 > git rev-list --no-walk 4935a97e572f1aed20b0a6c0d4b9b70df0f80b91 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins17365786219193940595.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 36.65s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins5683765906642754855.sh

@nvidia-merlin-bot
Click to view CI Results
GitHub pull request #514 of commit 95c477db1da8278b09521884c30bb504292dac6e, no merge conflicts.
Running as SYSTEM
Setting status of 95c477db1da8278b09521884c30bb504292dac6e to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/242/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/514/*:refs/remotes/origin/pr/514/* # timeout=10
 > git rev-parse 95c477db1da8278b09521884c30bb504292dac6e^{commit} # timeout=10
Checking out Revision 95c477db1da8278b09521884c30bb504292dac6e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 95c477db1da8278b09521884c30bb504292dac6e # timeout=10
Commit message: "Merge branch 'main' into masking_quick_fix"
 > git rev-list --no-walk e943edd2ff80c338fe63a2e17c3539c31d186289 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins8901256520924112050.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 36.59s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins206540360361826361.sh

github-actions bot commented Nov 7, 2022

@sararb sararb merged commit afaff11 into main Nov 7, 2022