Fix CI test based on the requirements of the new merlin loader #536

Merged: 3 commits merged on Nov 16, 2022

@sararb (Contributor) commented Nov 16, 2022

  • This PR fixes the failing CI test. The test started failing when NVTabular was updated to use the new Merlin data loader, because T4Rec was built on top of the previous NVTabular loader. The new loader changes how input feature dtypes are set and how batch features are selected, which breaks some of the logic in T4Rec.

Note: This PR includes quick fixes to get the CI test passing, but we still need to discuss and implement a long-term solution that ensures a stable integration of the new Merlin loader in T4Rec:

  • In the previous loader, all features were cast to int32 or float32 regardless of their original types in the Parquet file. In the new version, the dtype of each returned tensor matches the type specified in the source Parquet files. The ecomrees46 dataset used in CI included a hexadecimal string feature (user_session), which raises an error in the new data loader because it expects only numerical data. For this PR, a new dataset with user_session converted to numerical values was uploaded to drive.

  • The previous loader used the feature names given in the conts, cats, and labels parameters to filter the input tensors it returned, even when the dataset schema contained additional columns. The new loader appears to always return the features specified in the dataset schema, adding any features from conts, cats, and labels that the schema lacks. This breaks some logic in T4Rec blocks, such as Stochastic-Swap noise, which iterates over all inputs from the data loader without taking the model's schema into account.

  • Cast targets to int64 when passing them as the index to torch's scatter_ method, which requires the index tensor to have dtype torch.int64.
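The PR does not say how user_session was converted to a numerical feature. As a minimal sketch, assuming a plain dense integer encoding is acceptable, the hypothetical helper encode_sessions below maps each distinct hexadecimal session string to an integer ID in first-seen order, so the resulting column holds only numerical values. It is an illustration, not the conversion actually used to build the updated dataset, and the session strings are made up.

```python
from typing import Dict, List, Tuple


def encode_sessions(sessions: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct session string to a dense integer ID.

    Hypothetical helper: IDs are assigned in first-seen order, starting at 0.
    """
    mapping: Dict[str, int] = {}
    codes: List[int] = []
    for session in sessions:
        # setdefault stores len(mapping) as the new ID only the first time
        # a string is seen; repeated strings get their existing ID back.
        codes.append(mapping.setdefault(session, len(mapping)))
    return codes, mapping


# Illustrative hexadecimal session strings (made up, not from the dataset).
codes, mapping = encode_sessions(["a3f1", "07bc", "a3f1"])
print(codes)  # [0, 1, 0]
```

Dense IDs stay small regardless of how long the original hexadecimal strings are, which keeps the column comfortably within an integer dtype.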
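To illustrate the feature-selection change described in the second bullet, the following stdlib-only sketch emulates the previous loader's behaviour: it keeps only the columns named in conts, cats, and labels and drops any extra columns coming from the dataset schema. The helper filter_batch and the column names other than user_session are hypothetical. Per its commit message, this PR takes a different route and updates the dataset schema to match the one defined in the model.

```python
from typing import Dict, Iterable


def filter_batch(
    batch: Dict[str, list],
    conts: Iterable[str],
    cats: Iterable[str],
    labels: Iterable[str],
) -> Dict[str, list]:
    """Keep only the columns named in conts, cats, and labels (hypothetical helper)."""
    wanted = set(conts) | set(cats) | set(labels)
    # Columns present only in the dataset schema (e.g. user_session) are dropped,
    # so downstream blocks that iterate over every input see just the model's features.
    return {name: values for name, values in batch.items() if name in wanted}


batch = {
    "item_id": [1, 2],
    "category": [3, 4],
    "price": [0.5, 1.0],
    "user_session": [0, 1],  # extra column coming from the dataset schema
}
filtered = filter_batch(batch, conts=["price"], cats=["item_id", "category"], labels=[])
print(sorted(filtered))  # ['category', 'item_id', 'price']
```

A block such as Stochastic-Swap noise that loops over every key in the batch would touch user_session without this kind of filtering, which is the breakage the bullet describes.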
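A minimal sketch of the int64 cast from the last bullet, assuming the targets are used to build a one-hot matrix with scatter_ (the helper one_hot_targets is hypothetical, not T4Rec code). Because the new loader can return targets in the Parquet column's dtype, for example int32, the index is cast explicitly before the call.

```python
import torch


def one_hot_targets(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Build a one-hot matrix from class targets (hypothetical helper)."""
    # scatter_ requires an int64 index, but the new loader may hand back
    # targets in the Parquet column's dtype (e.g. int32), so cast explicitly.
    index = targets.to(torch.int64).unsqueeze(-1)
    one_hot = torch.zeros(targets.shape[0], num_classes, dtype=torch.float32)
    # Along dim 1, write 1.0 at each row's target column.
    one_hot.scatter_(1, index, 1.0)
    return one_hot


targets = torch.tensor([2, 0, 1], dtype=torch.int32)
print(one_hot_targets(targets, num_classes=3))
```

Casting at the call site keeps the fix local to the code that needs int64, rather than changing the dtype of every target tensor the loader returns.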

@sararb added the bug (Something isn't working) and ci labels Nov 16, 2022
@sararb added this to the Merlin 22.11 milestone Nov 16, 2022
@sararb self-assigned this Nov 16, 2022
@nvidia-merlin-bot commented with the CI results:
GitHub pull request #536 of commit ef9614269dc0b0318ff9ac9e11c320142ad1df61, no merge conflicts.
Running as SYSTEM
Setting status of ef9614269dc0b0318ff9ac9e11c320142ad1df61 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/293/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/536/*:refs/remotes/origin/pr/536/* # timeout=10
 > git rev-parse ef9614269dc0b0318ff9ac9e11c320142ad1df61^{commit} # timeout=10
Checking out Revision ef9614269dc0b0318ff9ac9e11c320142ad1df61 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f ef9614269dc0b0318ff9ac9e11c320142ad1df61 # timeout=10
Commit message: "update exomrees46 data used in ci with hex feature converted to numerical"
 > git rev-list --no-walk 3ccd2aeab866b703cf141af09395dceabd28addb # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins17988351237400384565.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/NVTabular.git
  Cloning https://github.com/NVIDIA-Merlin/NVTabular.git to /tmp/pip-req-build-cxzj3ocu
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/NVTabular.git /tmp/pip-req-build-cxzj3ocu
  Resolved https://github.com/NVIDIA-Merlin/NVTabular.git to commit ba4c14159a8e858c8998d4158a4376e65a8fa266
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: merlin-dataloader>=0.0.2 in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+4.gba4c1415) (0.0.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+4.gba4c1415) (1.8.1)
Requirement already satisfied: merlin-core>=0.2.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+4.gba4c1415) (0.6.0+1.g5926fcf)
Requirement already satisfied: dask>=2022.3.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.5.1)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.2.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.64.1)
Requirement already satisfied: numba>=0.54 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.56.2)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.19.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.10.0)
Requirement already satisfied: fsspec==2022.5.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.5.0)
Requirement already satisfied: distributed>=2022.3.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.5.1)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (7.0.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (21.3)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.3.5)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /usr/local/lib/python3.8/dist-packages (from scipy->nvtabular==1.6.0+4.gba4c1415) (1.22.4)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.4.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.2.0)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.3.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.12.0)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (5.4.1)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.2.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.7.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.1.2)
Requirement already satisfied: locket>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.0.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.4.0)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.26.12)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (8.1.3)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (6.2)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (5.9.2)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.0.4)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.12.0)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.39.1)
Requirement already satisfied: setuptools<60 in /usr/lib/python3/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (45.2.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.2.1)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.52.0)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (6.0.2)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata->numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.8.1)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.1.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (6.0.1)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 37.03s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins8357412832706979962.sh

@sararb merged commit 5060726 into main Nov 16, 2022
@sararb deleted the fix-ci branch November 16, 2022 19:11
sararb added a commit that referenced this pull request Nov 16, 2022
* cast targets to int64 dtype required by torch.scatter_

* update the dataset schema to use the same as the one defined in the model

* update exomrees46 data used in ci with hex feature converted to numerical
Labels: bug (Something isn't working), ci, P0
5 participants