Error: "RuntimeError: shape '[1, 0, 2]' is invalid for input of size 15319040" #53

Open
npovey opened this issue Aug 31, 2023 · 4 comments

@npovey

npovey commented Aug 31, 2023

Error: "RuntimeError: shape '[1, 0, 2]' is invalid for input of size 15319040"

I had an environment that works for libriheavy, which I described here.

But I needed to update torch and torchaudio to newer versions using this command

pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2 --extra-index-url https://download.pytorch.org/whl/cu117

because the lhotse scripts were not recognizing the duration of m4a audio files in stage 1 when creating the manifest. After upgrading those libraries I started getting the error "RuntimeError: shape '[1, 0, 2]' is invalid for input of size 15319040". The following part of the script throws the error:

if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
  # This script loads torchscript models, exported by `torch.jit.script()`,
  # and uses it to decode waves.
  # You can download the jit model from
  # https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15

  # We will get librilight_asr_cuts_{subset}.jsonl.gz
  # saved in $output_dir/manifests
  log "Stage 3: Perform speech recognition on splitted chunks"
  for subset in small; do
  #for subset in small medium large; do
    ./tools/recognize.py \
      --world-size $world_size \
      --num-workers 8 \
      --manifest-in $output_dir/manifests/librilight_chunk_cuts_${subset}.jsonl.gz \
      --manifest-out $output_dir/manifests/librilight_asr_cuts_${subset}.jsonl.gz \
      --nn-model-filename exp/exp/jit_script.pt \
      --tokens exp/data/lang_bpe_500/tokens.txt \
      --max-duration 1200 \
      --decoding-method greedy_search \
      --master 12346
  done
fi

Output of the error:

(np_env) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ ./run_small.sh
2023-08-30 21:21:35 (run_small.sh:59:main) Stage 1: Prepare LibriLight manifest
2023-08-30 21:21:36,682 INFO [prepare_manifest.py:230] {'corpus_dir': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/download/libri-light'), 'books_dir': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/download/librilight_text'), 'output_dir': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests'), 'num_jobs': 10}
2023-08-30 21:21:36,682 INFO [prepare_manifest.py:196] Preparing LibriLight...
Dataset parts:   0%|                                                                                                                                              | 0/3 [00:00<?, ?it/s]2023-08-30 21:21:36,684 INFO [prepare_manifest.py:205] Processing LibriLight subset: small
Distributing tasks: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2588/2588 [00:00<00:00, 97380.94it/s]
Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2588/2588 [00:15<00:00, 171.84it/s]
Dataset parts:  33%|████████████████████████████████████████████▋                                                                                         
Dataset parts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.28s/it]
2023-08-30 21:21:52,512 INFO [prepare_manifest.py:239] Done.
2023-08-30 21:21:52 (run_small.sh:70:main) Stage 2: Split long audio into chunks
2023-08-30 21:21:53,409 INFO [split_into_chunks.py:68] {'manifest_in': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_raw_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'chunk': 30.0, 'extra': 2.0}
2023-08-30 21:21:53,409 INFO [split_into_chunks.py:79] Processing /mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_raw_cuts_small.jsonl.gz.
2023-08-30 21:21:58,185 INFO [split_into_chunks.py:93] Cuts saved to /mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_chunk_cuts_small.jsonl.gz
2023-08-30 21:21:58 (run_small.sh:89:main) Stage 3: Perform speech recognition on splitted chunks
2023-08-30 21:22:04,385 INFO [recognize.py:314] (0/2) Decoding started
2023-08-30 21:22:04,399 INFO [recognize.py:314] (1/2) Decoding started
2023-08-30 21:22:04,405 INFO [recognize.py:324] (1/2) {'subsampling_factor': 4, 'frame_shift_ms': 10, 'world_size': 2, 'master_port': 12346, 'manifest_in': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_asr_cuts_small.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 1200, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_asr_cuts_small', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-30 21:22:04,405 INFO [recognize.py:324] (0/2) {'subsampling_factor': 4, 'frame_shift_ms': 10, 'world_size': 2, 'master_port': 12346, 'manifest_in': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests/librilight_asr_cuts_small.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 1200, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('/mnt/speech1/anna/text_search/examples/libriheavy/data/manifests'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_asr_cuts_small', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-30 21:22:05,773 INFO [recognize.py:329] (0/2) device: cuda:0
2023-08-30 21:22:05,773 INFO [recognize.py:331] (0/2) Loading jit model
2023-08-30 21:22:05,773 INFO [recognize.py:329] (1/2) device: cuda:1
2023-08-30 21:22:05,773 INFO [recognize.py:331] (1/2) Loading jit model
2023-08-30 21:22:18,183 INFO [recognize.py:290] (0/2) cuts processed until now is 40
2023-08-30 21:22:23,089 INFO [recognize.py:290] (1/2) cuts processed until now is 40
/home/np/anna/np_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
/home/np/anna/np_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
To report the issue, try enable logging via setting the envvariable ` export PYTORCH_JIT_LOG_LEVEL=manager.cpp`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:243.)
  return forward_call(*args, **kwargs)
/home/np/anna/np_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
  return forward_call(*args, **kwargs)
Traceback (most recent call last):
  File "./tools/recognize.py", line 426, in <module>
    main()
  File "./tools/recognize.py", line 402, in main
    mp.spawn(
  File "/home/np/anna/np_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/np/anna/np_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/home/np/anna/np_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/np/anna/np_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/np/anna/np_env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/speech1/anna/text_search/examples/libriheavy/tools/recognize.py", line 353, in run
    decode_dataset(
  File "/mnt/speech1/anna/text_search/examples/libriheavy/tools/recognize.py", line 278, in decode_dataset
    hyps, timestamps, scores = decode_one_batch(
  File "/mnt/speech1/anna/text_search/examples/libriheavy/tools/recognize.py", line 182, in decode_one_batch
    encoder_out, encoder_out_lens = model.encoder(
  File "/home/np/anna/np_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: shape '[1, 0, 2]' is invalid for input of size 15319040
(np_env) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$

My environment that is not working (for the Facebook dataset or my own dataset; it always fails in stage 3):


(np_env) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ pip list
Package                  Version                               
------------------------ --------------------------------------
absl-py                  1.4.0                                 
audioread                3.0.0                                 
cachetools               5.3.1                                 
certifi                  2023.7.22                             
cffi                     1.15.1                                
charset-normalizer       3.2.0                                 
click                    8.1.7                                 
cmake                    3.27.2                                
cytoolz                  0.12.2                                
dataclasses              0.6                                   
dill                     0.3.7                                 
fasttextsearch           0.6                                   
filelock                 3.12.2                                
google-auth              2.22.0                                
google-auth-oauthlib     1.0.0                                 
graphviz                 0.20.1                                
grpcio                   1.57.0                                
idna                     3.4                                   
importlib-metadata       6.8.0                                 
intervaltree             3.1.0                                 
Jinja2                   3.1.2                                 
k2                       1.24.3.dev20230718+cuda11.7.torch2.0.1
kaldialign               0.7.1                                 
kaldifeat                1.25.0.dev20230726+cuda11.7.torch2.0.1
kaldifst                 1.6                                   
kaldilm                  1.15                                  
lhotse                   1.16.0.dev0+git.2046b55d.clean        
lilcom                   1.7                                   
lit                      16.0.6                                
Markdown                 3.4.4                                 
MarkupSafe               2.1.3                                 
mpmath                   1.3.0                                 
networkx                 3.1                                   
numpy                    1.24.4                                
nvidia-cublas-cu11       11.10.3.66                            
nvidia-cuda-cupti-cu11   11.7.101                              
nvidia-cuda-nvrtc-cu11   11.7.99                               
nvidia-cuda-runtime-cu11 11.7.99                               
nvidia-cudnn-cu11        8.5.0.96                              
nvidia-cufft-cu11        10.9.0.58                             
nvidia-curand-cu11       10.2.10.91                            
nvidia-cusolver-cu11     11.4.0.1                              
nvidia-cusparse-cu11     11.7.4.91                             
nvidia-nccl-cu11         2.14.3                                
nvidia-nvtx-cu11         11.7.91                               
oauthlib                 3.2.2                                 
packaging                23.1                                  
Pillow                   10.0.0                                
pip                      20.0.2                                
pkg-resources            0.0.0                                 
protobuf                 4.24.1                                
pyasn1                   0.5.0                                 
pyasn1-modules           0.3.0                                 
pycipher                 0.5.2                                 
pycparser                2.21                                  
PyYAML                   6.0.1                                 
regex                    2023.8.8                              
requests                 2.31.0                                
requests-oauthlib        1.3.1                                 
rsa                      4.9                                   
sentencepiece            0.1.99                                
setuptools               44.0.0                                
six                      1.16.0                                
sortedcontainers         2.4.0                                 
soundfile                0.12.1                                
sympy                    1.12                                  
tabulate                 0.9.0                                 
tensorboard              2.14.0                                
tensorboard-data-server  0.7.1                                 
termcolor                2.3.0                                 
toolz                    0.12.0                                
torch                    2.0.1+cu117                           
torchaudio               2.0.2+cu117                           
torchvision              0.15.2+cu117                          
tqdm                     4.66.1                                
triton                   2.0.0                                 
typeguard                4.1.2                                 
typing-extensions        4.7.1                                 
urllib3                  2.0.4                                 
werkzeug                 2.3.7                                 
wheel                    0.41.1                                
zipp                     3.16.2                                
(np_env) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$

The environment that works (though the duration of m4a audio files was wrong in the lhotse scripts for my dataset; it works fine with the Facebook dataset):

np@np-INTEL:/mnt/speech3$ source venv_ts/bin/activate
(test-textsearch) np@np-INTEL:/mnt/speech3$ pip list
Package                 Version
----------------------- ---------------------------------------
absl-py                 1.4.0
appdirs                 1.4.3
apturl                  0.5.2
attrs                   19.3.0
audioread               3.0.0
Automat                 0.8.0
backcall                0.1.0
bcrypt                  3.1.7
beautifulsoup4          4.8.2
bleach                  6.0.0
blinker                 1.4
boto                    2.49.0
Brlapi                  0.7.0
cachetools              5.3.1
certifi                 2019.11.28
cffi                    1.15.1
chardet                 3.0.4
click                   8.1.6
colorama                0.4.3
command-not-found       0.3
constantly              15.1.0
cryptography            2.8
cssselect               1.1.0
cupshelpers             1.0
cytoolz                 0.12.1
dataclasses             0.6
dbus-python             1.2.16
decorator               4.4.2
defer                   1.0.6
dill                    0.3.7
distlib                 0.3.0
distro                  1.4.0
distro-info             0.23ubuntu1
duplicity               0.8.12.0
entrypoints             0.3
fasteners               0.14.1
fasttextsearch          0.6
filelock                3.0.12
future                  0.18.2
google-auth             2.22.0
google-auth-oauthlib    1.0.0
graphviz                0.20.1
grpcio                  1.56.2
html5lib                1.0.1
httplib2                0.14.0
hyperlink               19.0.0
idna                    2.8
importlib-metadata      6.8.0
incremental             16.10.1
intervaltree            3.1.0
ipython                 7.13.0
ipython_genutils        0.2.0
jedi                    0.15.2
k2                      1.24.3.dev20230725+cuda11.3.torch1.12.1
kaggle                  1.5.16
kaldialign              0.7.1
kaldifeat               1.25.0.dev20230726+cuda11.3.torch1.12.1
kaldifst                1.6
kaldilm                 1.15
keyring                 18.0.1
language-selector       0.1
launchpadlib            1.10.13
lazr.restfulclient      0.14.2
lazr.uri                1.0.3
lhotse                  1.16.0
lilcom                  1.7
lockfile                0.12.2
louis                   3.12.0
lxml                    4.5.0
macaroonbakery          1.3.1
Mako                    1.1.0
Markdown                3.4.3
MarkupSafe              2.1.3
monotonic               1.5
more-itertools          4.2.0
mysqlclient             1.4.4
netifaces               0.10.4
numpy                   1.24.4
oauthlib                3.1.0
olefile                 0.46
packaging               23.1
paramiko                2.6.0
parsel                  1.5.2
parso                   0.5.2
pexpect                 4.6.0
pickleshare             0.7.5
Pillow                  7.0.0
pip                     23.2
pipenv                  11.9.0
prompt-toolkit          2.0.10
protobuf                4.23.4
pyasn1                  0.4.2
pyasn1-modules          0.2.1
pycairo                 1.16.2
pycparser               2.21
pycups                  1.9.73
PyDispatcher            2.0.5
Pygments                2.3.1
PyGObject               3.36.0
PyHamcrest              1.9.0
PyJWT                   1.7.1
pymacaroons             0.13.0
PyNaCl                  1.3.0
pyOpenSSL               19.0.0
pyRFC3339               1.1
python-apt              2.0.0+ubuntu0.20.4.8
python-dateutil         2.7.3
python-debian           0.1.36ubuntu1
python-slugify          8.0.1
pytz                    2019.3
pyxdg                   0.26
PyYAML                  5.3.1
queuelib                1.5.0
regex                   2023.6.3
reportlab               3.5.34
requests                2.22.0
requests-oauthlib       1.3.1
requests-unixsocket     0.2.0
rsa                     4.9
Scrapy                  1.7.3
SecretStorage           2.3.1
sentencepiece           0.1.99
service-identity        18.1.0
setuptools              45.2.0
simplejson              3.16.0
six                     1.14.0
sortedcontainers        2.4.0
soundfile               0.12.1
soupsieve               1.9.5
ssh-import-id           5.10
systemd-python          234
tabulate                0.9.0
tensorboard             2.13.0
tensorboard-data-server 0.7.1
text-unidecode          1.3
toolz                   0.12.0
torch                   1.12.1+cu113
torchaudio              0.12.1+cu113
torchvision             0.13.1+cu113
tqdm                    4.65.0
traitlets               4.3.3
Twisted                 18.9.0
typeguard               4.0.0
typing_extensions       4.7.1
ubuntu-advantage-tools  27.9
ubuntu-drivers-common   0.0.0
ufw                     0.36
unattended-upgrades     0.1
urllib3                 1.25.8
usb-creator             0.3.7
virtualenv              20.0.17
virtualenv-clone        0.3.0
w3lib                   1.21.0
wadllib                 1.3.3
wcwidth                 0.1.8
webencodings            0.5.1
Werkzeug                2.3.6
wheel                   0.34.2
xkit                    0.0.0
youtube-dl              2021.12.17
zipp                    1.0.0
zope.interface          4.7.1

[notice] A new release of pip is available: 23.2 -> 23.2.1
[notice] To update, run: python3 -m pip install --upgrade pip
(test-textsearch) np@np-INTEL:/mnt/speech3$
@pkufool
Collaborator

pkufool commented Aug 31, 2023

I think I can reproduce this issue with your envs.

Here are my logs:

python tools/recognize.py --world-size 1 --manifest-in data/manifests/librilight_chunk_cuts_small.jsonl.gz --manifest-out librilight_cuts_test2.jsonl.gz --nn-model-filename exp/exp/jit_script.pt --tokens exp/data/lang_bpe_500/tokens.txt
2023-08-31 15:27:31,194 INFO [recognize.py:323] Decoding started                                                                                               
2023-08-31 15:27:31,197 INFO [recognize.py:336] {'subsampling_factor': 4, 'frame_shift_ms': 10, 'beam_size': 4, 'world_size': 1, 'master_port': 12354, 'manifest_in': PosixPath('data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('librilight_cuts_test2.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 600.0, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('.'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_cuts_test2', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-31 15:27:31,268 INFO [recognize.py:341] device: cuda:0                                                                                                 
2023-08-31 15:27:31,269 INFO [recognize.py:343] Loading jit model                                                                                              
2023-08-31 15:27:44,860 INFO [recognize.py:299] cuts processed until now is 20                                                                                 
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
To report the issue, try enable logging via setting the envvariable ` export PYTORCH_JIT_LOG_LEVEL=manager.cpp`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:243.)
  return forward_call(*args, **kwargs)
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
  return forward_call(*args, **kwargs)
Traceback (most recent call last):
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 433, in <module>
    main()
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 426, in main
    run(rank=0, world_size=world_size, args=args, in_cuts=in_cuts)
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)                                                
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 365, in run
    decode_dataset(                     
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 287, in decode_dataset
    hyps, timestamps, scores = decode_one_batch(
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 184, in decode_one_batch
    encoder_out, encoder_out_lens = model.encoder(                              
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):                               
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):                               
RuntimeError: shape '[1, 0, 2]' is invalid for input of size 7659520

When I set export PYTORCH_NVFUSER_DISABLE=fallback, the logs became:

python tools/recognize.py --world-size 1 --manifest-in data/manifests/librilight_chunk_cuts_small.jsonl.gz --manifest-out librilight_cuts_test2.jsonl.gz --nn-model-filename exp/exp/jit_script.pt --tokens exp/data/lang_bpe_500/tokens.txt 
2023-08-31 15:33:15,659 INFO [recognize.py:323] Decoding started
2023-08-31 15:33:15,663 INFO [recognize.py:336] {'subsampling_factor': 4, 'frame_shift_ms': 10, 'beam_size': 4, 'world_size': 1, 'master_port': 12354, 'manifest_in': PosixPath('data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('librilight_cuts_test2.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 600.0, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('.'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_cuts_test2', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-31 15:33:15,737 INFO [recognize.py:341] device: cuda:0
2023-08-31 15:33:15,737 INFO [recognize.py:343] Loading jit model
2023-08-31 15:33:29,322 INFO [recognize.py:299] cuts processed until now is 20
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
Traceback (most recent call last):
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 433, in <module>
    main()
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 426, in main
    run(rank=0, world_size=world_size, args=args, in_cuts=in_cuts)
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 365, in run
    decode_dataset(
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 287, in decode_dataset
    hyps, timestamps, scores = decode_one_batch(
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 184, in decode_one_batch
    encoder_out, encoder_out_lens = model.encoder(
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: dims.value().size() == self->getMaybeRFactorDomain().size() INTERNAL ASSERT FAILED at "../third_party/nvfuser/csrc/parser.cpp":3399, please report a bug to PyTorch. 

I think it is a bug in the new version of pytorch.
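
The FALLBACK warnings point at compileCudaFusionGroup/runCudaFusionGroup, so the bad reshape seems to come from the nvfuser pass rather than from the model itself. As a quick check (and a possible workaround for decoding), something like the sketch below could force TorchScript onto the NNC fuser instead. This is only a sketch, not code from recognize.py: the feature shapes are invented, and I am assuming the exported encoder takes (features, feature_lens) as in the traceback.

import torch

# Sketch: run the scripted encoder with the NNC fuser ("fuser1") selected
# instead of nvfuser ("fuser2"). An alternative is the private call
# torch._C._jit_set_nvfuser_enabled(False), which should exist in torch 2.0.x.
model = torch.jit.load("exp/exp/jit_script.pt", map_location="cuda:0")
model.eval()

# Invented shapes: (batch, num_frames, num_mel_bins=80) fbank features.
features = torch.randn(2, 3000, 80, device="cuda:0")
feature_lens = torch.tensor([3000, 2500], device="cuda:0")

with torch.no_grad(), torch.jit.fuser("fuser1"):
    # Run a few times: the profiling executor only switches to the optimized
    # (fused) graph after the first couple of calls, which matches the error
    # appearing only after some batches were already processed.
    for _ in range(3):
        encoder_out, encoder_out_lens = model.encoder(features, feature_lens)

print(encoder_out.shape, encoder_out_lens)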

@danpovey
Copy link
Collaborator

danpovey commented Sep 1, 2023

Maybe we can somehow figure out what op it was doing, to work around it? It's a shame if we can't run inference with our models in PyTorch 2.0.1.
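
One way to at least see which ops end up in the failing fusion group (just a sketch, not something tested in this thread): print the scripted encoder's graph and look for the reshape/permute that produces the '[1, 0, 2]' shape. The warnings above also suggest export PYTORCH_JIT_LOG_LEVEL=manager.cpp for more fuser logging.

import torch

# Sketch: inspect the scripted encoder offline (no GPU needed) to see which
# reshape/permute ops it contains; on CUDA the fuser groups some of these
# nodes into prim::CudaFusionGroup, and one of them is the failing op.
model = torch.jit.load("exp/exp/jit_script.pt", map_location="cpu")
print(model.encoder.graph)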

@npovey
Author

npovey commented Sep 1, 2023

Not sure if it is relevant, but I use the same env to train icefall models. I was able to use an icefall recipe and train a model for 150 epochs with the PyTorch 2.0.1 env given above. I only get an error when using the k2-fsa/text_search
https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/run.sh script, stage 3, as described above.

@pkufool
Collaborator

pkufool commented Sep 1, 2023

Maybe we can somehow figure out what op it was doing, to work around it? It's a shame if we can't run inference with our models in PyTorch 2.0.1.

I can't see any useful stack trace, but I think we can first try exporting the model with PyTorch 2.0.1.
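
For reference, a rough sketch of what re-exporting under torch 2.0.1 would look like (get_params/get_model and the checkpoint name are placeholders for the corresponding pieces of the icefall recipe that produced jit_script.pt; the recipe's own export script is the real way to do this):

import torch

# Placeholder helpers: build the same zipformer architecture that was trained.
params = get_params()
model = get_model(params)

ckpt = torch.load("exp/epoch-xx.pt", map_location="cpu")  # placeholder checkpoint name
model.load_state_dict(ckpt["model"])
model.eval()

# Script and save with the same torch version (2.0.1) used by tools/recognize.py.
scripted = torch.jit.script(model)
scripted.save("exp/exp/jit_script.pt")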
