Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dtype mismatch in CLM masking class due to new data loader changes #539

Merged
merged 1 commit into from
Nov 18, 2022

Conversation

sararb
Copy link
Contributor

@sararb sararb commented Nov 18, 2022

This is a quick fix of the error raised by the ci tests related to ### GPT-2 (CLM) - Item Id feature and ### Transformer-XL (CLM) - Item Id feature. The cause of the error is the dtype mismatch between the sequence of item ids returned by the data loader (the new loader keeps the dtype specified in the original parquet file) and the tensor label_seq_trg_eval created in the CLM masking class to store the masked labels (was always set to torch.long)

@sararb sararb added the bug Something isn't working label Nov 18, 2022
@sararb sararb added this to the Merlin 22.11 milestone Nov 18, 2022
@sararb sararb self-assigned this Nov 18, 2022
@nvidia-merlin-bot
Copy link

Click to view CI Results
GitHub pull request #539 of commit 5c863539b0a2528337db56544ac513238ca82b7d, no merge conflicts.
Running as SYSTEM
Setting status of 5c863539b0a2528337db56544ac513238ca82b7d to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/297/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/539/*:refs/remotes/origin/pr/539/* # timeout=10
 > git rev-parse 5c863539b0a2528337db56544ac513238ca82b7d^{commit} # timeout=10
Checking out Revision 5c863539b0a2528337db56544ac513238ca82b7d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5c863539b0a2528337db56544ac513238ca82b7d # timeout=10
Commit message: "fix dtype mismatch in CLM masking class"
 > git rev-list --no-walk aa1bfaf9568bbe62c7a9c04554360cd011c5d262 # timeout=10
First time build. Skipping changelog.
[transformers4rec_tests] $ /bin/bash /tmp/jenkins2245853591361396115.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/NVTabular.git
  Cloning https://github.com/NVIDIA-Merlin/NVTabular.git to /tmp/pip-req-build-0ygw9xt7
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/NVTabular.git /tmp/pip-req-build-0ygw9xt7
  Resolved https://github.com/NVIDIA-Merlin/NVTabular.git to commit ba4c14159a8e858c8998d4158a4376e65a8fa266
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: merlin-core>=0.2.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+4.gba4c1415) (0.6.0+1.g5926fcf)
Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+4.gba4c1415) (1.8.1)
Collecting merlin-dataloader>=0.0.2
  Downloading merlin-dataloader-0.0.2.tar.gz (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.1/44.1 kB 1.6 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.64.1)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (7.0.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.19.5)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.3.5)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (21.3)
Requirement already satisfied: fsspec==2022.5.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.5.0)
Collecting dask>=2022.3.0
  Downloading dask-2022.11.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 11.6 MB/s eta 0:00:00
Requirement already satisfied: numba>=0.54 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.56.2)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.2.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.10.0)
Requirement already satisfied: distributed>=2022.3.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.5.1)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /usr/local/lib/python3.8/dist-packages (from scipy->nvtabular==1.6.0+4.gba4c1415) (1.22.4)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.4.3)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.3.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (5.4.1)
Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (8.1.3)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.12.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (6.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.4.0)
  Downloading dask-2022.5.1-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 78.0 MB/s eta 0:00:00
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (5.9.2)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.26.12)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.2.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.1.2)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.7.0)
Requirement already satisfied: locket>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.0.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.0.4)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (0.39.1)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.12.0)
Requirement already satisfied: setuptools<60 in /usr/lib/python3/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (45.2.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2022.2.1)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.52.0)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (6.0.2)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata->numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (3.8.1)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (2.1.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+4.gba4c1415) (6.0.1)
Building wheels for collected packages: nvtabular, merlin-dataloader
  Building wheel for nvtabular (pyproject.toml): started
  Building wheel for nvtabular (pyproject.toml): finished with status 'done'
  Created wheel for nvtabular: filename=nvtabular-1.6.0+4.gba4c1415-cp38-cp38-linux_x86_64.whl size=257596 sha256=b7ead6c0b67992173a85b12391177c95517354ac8b48db660a495a9d53d09f29
  Stored in directory: /tmp/pip-ephem-wheel-cache-so5w___a/wheels/c2/16/76/39994bff39d812513de5b5572bff0903b9eb8f6c645b44cedc
  Building wheel for merlin-dataloader (pyproject.toml): started
  Building wheel for merlin-dataloader (pyproject.toml): finished with status 'done'
  Created wheel for merlin-dataloader: filename=merlin_dataloader-0.0.2-py3-none-any.whl size=29203 sha256=ee69b2817eede6db7982df03bbfab555b74a14f0336025326d963e8725450ed8
  Stored in directory: /tmp/pip-ephem-wheel-cache-so5w___a/wheels/76/ef/ed/cb880e3ef5192ec5940e26fd9442247b569fb0cf8602f97137
Successfully built nvtabular merlin-dataloader
Installing collected packages: dask, merlin-dataloader, nvtabular
  Attempting uninstall: dask
    Found existing installation: dask 2022.1.1
    Uninstalling dask-2022.1.1:
      Successfully uninstalled dask-2022.1.1
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 1.4.0+8.g95e12d347
    Uninstalling nvtabular-1.4.0+8.g95e12d347:
      Successfully uninstalled nvtabular-1.4.0+8.g95e12d347
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
feast 0.19.4 requires dask<2022.02.0,>=2021.*, but you have dask 2022.5.1 which is incompatible.
Successfully installed dask-2022.5.1 merlin-dataloader-0.0.2 nvtabular-1.6.0+4.gba4c1415
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item

tests/unit/test_notebooks.py . [100%]

============================== 1 passed in 37.21s ==============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[transformers4rec_tests] $ /bin/bash /tmp/jenkins16476570616447363070.sh

@github-actions
Copy link

@sararb sararb merged commit fbf1ccc into main Nov 18, 2022
sararb added a commit that referenced this pull request Nov 18, 2022
@rnyak rnyak deleted the fix-ci branch November 18, 2022 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants