Add save/load & input/output schema methods to T4Rec Model class #507

Merged
merged 6 commits into from
Nov 7, 2022

Conversation

sararb
Contributor

@sararb sararb commented Oct 21, 2022

Fixes #499

Goals ⚽

  • Add save/load methods to T4Rec models using cloudpickle, following the protocol defined in Merlin Models.

  • Add an input_schema property to the T4Rec base class Model that builds the model schema from the input modules of the heads and returns a Merlin schema object.

  • Add an output_schema property to the T4Rec base class Model that builds the model schema from the prediction tasks specified in the heads and returns a Merlin schema object with one ColumnSchema per prediction task.
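
The save/load goal can be illustrated with a minimal round-trip sketch. This is not the code added in this PR: `TinyModel`, the `model_name` argument, and the `.pkl` file name are hypothetical stand-ins, and the fallback to the standard `pickle` module exists only so the sketch runs where cloudpickle is not installed.

```python
import os
import pickle
import tempfile

try:
    import cloudpickle as pickler  # the serializer named in the PR description
except ImportError:
    pickler = pickle  # assumption: fallback so the sketch runs without cloudpickle


class TinyModel:
    """Hypothetical stand-in for a T4Rec Model; not the real class."""

    def __init__(self, weights):
        self.weights = weights

    def save(self, path, model_name="model"):
        # Create the target directory, then pickle the whole object into it.
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, f"{model_name}.pkl"), "wb") as out:
            pickler.dump(self, out)

    @classmethod
    def load(cls, path, model_name="model"):
        # Read the pickled object back from the same directory.
        with open(os.path.join(path, f"{model_name}.pkl"), "rb") as pickled:
            return pickler.load(pickled)


def round_trip(weights):
    """Save a model to a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        TinyModel(weights).save(tmp)
        return TinyModel.load(tmp).weights
```

Pickling the whole object keeps the caller-facing API to a directory path, at the cost of tying the saved file to the library version that produced it.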

Implementation Details 🚧

  • Add save/load methods to the T4Rec base class Model using cloudpickle, following the same protocol proposed in Merlin Models (here)

  • Add a shape property to input/output schema to provide the length/shape information of list features

  • I used the code of this unit test in Merlin Systems as a starting point for converting the input T4Rec schema to a Merlin schema object.

  • The output schema is built based on the prediction tasks provided to the model. The stored information is:
    name, int_domain, value_counts, is_list, shape, and is_ragged.
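
As a rough sketch of the stored information listed above, the snippet below builds one column description per prediction task. `ColumnSpec` is a hypothetical stand-in for a Merlin ColumnSchema, the task dictionaries are made up for illustration, and the shape convention (a leading `None` for the batch dimension) is an assumption, not necessarily the convention adopted in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical stand-in for a Merlin ColumnSchema."""

    name: str
    int_domain: Optional[Dict[str, int]]
    value_counts: Optional[Dict[str, int]]
    is_list: bool
    is_ragged: bool
    shape: Tuple[Optional[int], ...]


def build_output_schema(tasks: List[dict]) -> List[ColumnSpec]:
    """Return one ColumnSpec per prediction task."""
    columns = []
    for task in tasks:
        length = task.get("length")  # None means one scalar prediction per row
        is_list = length is not None
        columns.append(
            ColumnSpec(
                name=task["name"],
                int_domain=task.get("int_domain"),
                # Fixed-length lists have identical min and max counts.
                value_counts={"min": length, "max": length} if is_list else None,
                is_list=is_list,
                is_ragged=False,
                # Assumed convention: None stands for the batch dimension.
                shape=(None, length) if is_list else (None,),
            )
        )
    return columns


# Illustrative tasks only; names and sizes are invented for this sketch.
schema = build_output_schema(
    [
        {"name": "next_item", "length": 100},
        {"name": "click"},
    ]
)
```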

Constraints

  • The format of the T4Rec model outputs is not standardized and varies a lot based on the PredictionTask and some specific boolean flags such as hf_format. Work is under way to simplify the output API format ([Task] Standardize the model output format.  #505), so that at inference the model returns either a single prediction tensor for single-task learning or a dictionary of tensors whose keys are the task names and whose values are the prediction tensors.

  • The output dictionary needs to be converted to a NamedTuple for PyTorch serving.
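
The dictionary-to-NamedTuple conversion mentioned above could look roughly like this. It is a sketch only: `dict_to_namedtuple` and the type name `ModelOutput` are hypothetical, and plain lists stand in for prediction tensors.

```python
from collections import namedtuple


def dict_to_namedtuple(outputs, type_name="ModelOutput"):
    """Convert a {task_name: predictions} dict into a namedtuple.

    Field order follows the dict's insertion order. Task names must be
    valid Python identifiers, otherwise namedtuple raises ValueError.
    """
    output_type = namedtuple(type_name, list(outputs.keys()))
    return output_type(**outputs)


# Plain lists stand in for prediction tensors in this sketch.
predictions = {"next_item": [0.1, 0.7, 0.2], "click": [0.9]}
result = dict_to_namedtuple(predictions)
```

Fields can then be read by name (`result.next_item`) or by position (`result[0]`).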

Testing Details 🔍

  • Update the model tests to cover the input/output schemas.
  • Add a unit test, test_save_next_item_prediction_model, that saves and loads the model trained with the next item prediction task (the model used in the inference example).

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit ad37cb1742c48226ee48b10b556e9d3af7ab4448, no merge conflicts.
Running as SYSTEM
Setting status of ad37cb1742c48226ee48b10b556e9d3af7ab4448 to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/230/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse ad37cb1742c48226ee48b10b556e9d3af7ab4448^{commit} # timeout=10
Checking out Revision ad37cb1742c48226ee48b10b556e9d3af7ab4448 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f ad37cb1742c48226ee48b10b556e9d3af7ab4448 # timeout=10
Commit message: "add suport of list outputs"
 > git rev-list --no-walk d532234b241f46d77366b98d3450b08f83133c20 # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins16443494207033087218.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 34.29s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins4990450691877199361.sh

@github-actions

@sararb sararb added the enhancement label Oct 21, 2022
@sararb sararb added this to the Merlin 22.11 milestone Oct 21, 2022
@sararb sararb changed the title from "[WIP] Add save/load method to T4Rec model class & input/output schema methods" to "[WIP] Add save/load & input/output schema methods to T4Rec Model class" Oct 24, 2022
@rnyak rnyak requested review from nzarif, rnyak and gabrielspmoreira and removed request for nzarif October 24, 2022 16:25
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit 6dea3fe7ad046fa643b27e77439037ead84b51d3, no merge conflicts.
Running as SYSTEM
Setting status of 6dea3fe7ad046fa643b27e77439037ead84b51d3 to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/231/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse 6dea3fe7ad046fa643b27e77439037ead84b51d3^{commit} # timeout=10
Checking out Revision 6dea3fe7ad046fa643b27e77439037ead84b51d3 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6dea3fe7ad046fa643b27e77439037ead84b51d3 # timeout=10
Commit message: "add shape property and fix pr comment"
 > git rev-list --no-walk ad37cb1742c48226ee48b10b556e9d3af7ab4448 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins12034054313781789087.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 34.83s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins15061753303332900603.sh

@sararb sararb changed the title from "[WIP] Add save/load & input/output schema methods to T4Rec Model class" to "Add save/load & input/output schema methods to T4Rec Model class" Oct 26, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit e6f6e58ba93a1f67557c686ce6aafd5ed9891c7c, no merge conflicts.
Running as SYSTEM
Setting status of e6f6e58ba93a1f67557c686ce6aafd5ed9891c7c to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/233/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse e6f6e58ba93a1f67557c686ce6aafd5ed9891c7c^{commit} # timeout=10
Checking out Revision e6f6e58ba93a1f67557c686ce6aafd5ed9891c7c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e6f6e58ba93a1f67557c686ce6aafd5ed9891c7c # timeout=10
Commit message: "update shape property with the convention used in systems"
 > git rev-list --no-walk 0732292df37d1cf427785608858f5590e0bcf6ab # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins15048533315320923114.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 34.96s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins4633483700100336007.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit 3fb91d2b5b4d584a37f574e7c679896e1885b12a, no merge conflicts.
Running as SYSTEM
Setting status of 3fb91d2b5b4d584a37f574e7c679896e1885b12a to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/234/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse 3fb91d2b5b4d584a37f574e7c679896e1885b12a^{commit} # timeout=10
Checking out Revision 3fb91d2b5b4d584a37f574e7c679896e1885b12a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3fb91d2b5b4d584a37f574e7c679896e1885b12a # timeout=10
Commit message: "remove max_sequence_length from in/out schema methods"
 > git rev-list --no-walk e6f6e58ba93a1f67557c686ce6aafd5ed9891c7c # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins1055468907347027097.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 34.80s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins13660976709665326890.sh

# At inference, we just need the predictions tensors.
# TODO: We are simplifying the logic around `hf_format` in the multi-gpu
# support work.
if not training and not self.hf_format:
Member

Can we remove "not training" from here so that hf_format alone controls the output? That would make it work in Systems without a model adapter wrapper class.
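
A minimal sketch of what this suggestion amounts to, assuming the output is selected by a single flag. `format_outputs` and the dictionary keys are hypothetical and are not taken from the T4Rec code.

```python
def format_outputs(predictions, loss=None, labels=None, hf_format=False):
    """Hypothetical helper: hf_format alone decides the output format."""
    if not hf_format:
        # Return only the predictions, whether or not the model is training.
        return predictions
    # Assumed dictionary layout for the hf_format case.
    return {"loss": loss, "labels": labels, "predictions": predictions}
```

With this shape, a serving system can construct the model with hf_format disabled and receive bare predictions without wrapping the model in an adapter.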

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit c76b416a920916779dfcba953e80a3a02c5c3538, no merge conflicts.
Running as SYSTEM
Setting status of c76b416a920916779dfcba953e80a3a02c5c3538 to PENDING with url http://10.20.17.181:8080/job/transformers4rec_tests/235/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse c76b416a920916779dfcba953e80a3a02c5c3538^{commit} # timeout=10
Checking out Revision c76b416a920916779dfcba953e80a3a02c5c3538 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c76b416a920916779dfcba953e80a3a02c5c3538 # timeout=10
Commit message: "fix PR comments"
 > git rev-list --no-walk 3fb91d2b5b4d584a37f574e7c679896e1885b12a # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins14522582708295971654.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 34.73s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins2293366441820960841.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit a399724271a5c77c0fc25f9873afc1456e003f6e, no merge conflicts.
Running as SYSTEM
Setting status of a399724271a5c77c0fc25f9873afc1456e003f6e to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/240/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse a399724271a5c77c0fc25f9873afc1456e003f6e^{commit} # timeout=10
Checking out Revision a399724271a5c77c0fc25f9873afc1456e003f6e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a399724271a5c77c0fc25f9873afc1456e003f6e # timeout=10
Commit message: "Merge branch 'main' into save-schema-for-t4rec-model"
 > git rev-list --no-walk ecae4337558075f1282ad3a5e40bbf6346b57243 # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins17120315256631438005.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 34.74s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins9450939901728364725.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit 9c513119b2f522c662a288dd6dade872b906af14, no merge conflicts.
Running as SYSTEM
Setting status of 9c513119b2f522c662a288dd6dade872b906af14 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/244/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse 9c513119b2f522c662a288dd6dade872b906af14^{commit} # timeout=10
Checking out Revision 9c513119b2f522c662a288dd6dade872b906af14 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9c513119b2f522c662a288dd6dade872b906af14 # timeout=10
Commit message: "Merge branch 'main' into save-schema-for-t4rec-model"
 > git rev-list --no-walk 9e8632f3e5567381999a8da5a3edcfbe98529a9a # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins1417152996509697193.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py F [100%]

=================================== FAILURES ===================================
_________________________________ test_session _________________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-38/test_session0')

@pytest.mark.skipif(importlib.util.find_spec("cudf") is None, reason="needs cudf")
def test_session(tmpdir):
    BASE_PATH = os.path.join(dirname(TEST_PATH), SESSION_PATH)
    os.environ["INPUT_DATA_DIR"] = "/tmp/data/"
    # Run ETL
    nb_path = os.path.join(BASE_PATH, "01-ETL-with-NVTabular.ipynb")
    _run_notebook(tmpdir, nb_path)

    # Run session based
    torch = importlib.util.find_spec("torch")
    if torch is not None:
        os.environ["INPUT_SCHEMA_PATH"] = BASE_PATH + "schema.pb"
        nb_path = os.path.join(BASE_PATH, "02-session-based-XLNet-with-PyT.ipynb")
        _run_notebook(tmpdir, nb_path)

tests/unit/test_notebooks.py:44:


tests/unit/test_notebooks.py:66: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-38/test_session0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f07cb6151f0>
stdout = b"['/tmp/data//sessions_by_day/1/train.parquet']\n********************\nLaunch training for day 1 are:\n********************\n\n"
stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
            raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-38/test_session0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/usr/local/lib/python3.8/dist-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(
/usr/local/lib/python3.8/dist-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(
/usr/local/lib/python3.8/dist-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

Creating time-based splits: 0%| | 0/9 [00:00<?, ?it/s]
Creating time-based splits: 11%|█ | 1/9 [00:00<00:01, 5.40it/s]
Creating time-based splits: 22%|██▏ | 2/9 [00:00<00:01, 6.80it/s]
Creating time-based splits: 33%|███▎ | 3/9 [00:00<00:00, 7.18it/s]
Creating time-based splits: 44%|████▍ | 4/9 [00:00<00:00, 7.40it/s]
Creating time-based splits: 56%|█████▌ | 5/9 [00:00<00:00, 7.52it/s]
Creating time-based splits: 67%|██████▋ | 6/9 [00:00<00:00, 7.70it/s]
Creating time-based splits: 78%|███████▊ | 7/9 [00:00<00:00, 7.74it/s]
Creating time-based splits: 89%|████████▉ | 8/9 [00:01<00:00, 7.61it/s]
Creating time-based splits: 100%|██████████| 9/9 [00:01<00:00, 7.67it/s]
Creating time-based splits: 100%|██████████| 9/9 [00:01<00:00, 7.44it/s]
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
***** Running training *****
Num examples = 1792
Num Epochs = 5
Instantaneous batch size per device = 128
Total train batch size (w. parallel, distributed & accumulation) = 128
Gradient Accumulation steps = 1
Total optimization steps = 70

0%| | 0/70 [00:00<?, ?it/s]Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-38/test_session0/notebook.py", line 202, in <module>
trainer.train()
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1316, in train
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1849, in training_step
loss = self.compute_loss(model, inputs)
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1881, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/transformers4rec/torch/trainer.py", line 830, in forward
return self.wrapper_module(inputs, *args)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/transformers4rec/torch/model/base.py", line 553, in forward
head(inputs, call_body=True, training=training, always_output_dict=True, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/transformers4rec/torch/model/base.py", line 401, in forward
outputs[name] = task(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/transformers4rec/torch/model/prediction_task.py", line 230, in forward
if training or not ignore_masking:
NameError: name 'training' is not defined

0%| | 0/70 [00:01<?, ?it/s]
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_session - subprocess.CalledProcessE...
============================== 1 failed in 22.89s ==============================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins18192067604805347325.sh

@sararb sararb force-pushed the save-schema-for-t4rec-model branch from 9c51311 to 0a2f6bd Compare November 7, 2022 22:37
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #507 of commit 0a2f6bd2c47536c52ed305f116eed260d5ce2d9b, no merge conflicts.
Running as SYSTEM
Setting status of 0a2f6bd2c47536c52ed305f116eed260d5ce2d9b to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/245/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/507/*:refs/remotes/origin/pr/507/* # timeout=10
 > git rev-parse 0a2f6bd2c47536c52ed305f116eed260d5ce2d9b^{commit} # timeout=10
Checking out Revision 0a2f6bd2c47536c52ed305f116eed260d5ce2d9b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0a2f6bd2c47536c52ed305f116eed260d5ce2d9b # timeout=10
Commit message: "fix PR comments"
 > git rev-list --no-walk 9c513119b2f522c662a288dd6dade872b906af14 # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins14261443200611807130.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 36.55s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins3973122126145648360.sh

@sararb sararb merged commit 0adb10c into main Nov 7, 2022
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[Task] Transformers4Rec Serving using Merlin Schema
3 participants