CUDA out of memory #216
You can mess with the minibatch size, which might help. But finding the
source of the problem is a good idea too.
Are you using an alignment model? (If not, the posteriors at the start can
be very flat, which can cause too many states to stay within the pruning
beam).
What is the size of the phone set?
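(For reference, one quick way to check that, assuming your prepare stage produced a Kaldi-style phones.txt under the lang directory; the path below is only a guess for the usual simple_v1 layout:)
# Hypothetical path; adjust to wherever your recipe actually put phones.txt.
wc -l data/lang_nosp/phones.txt   # line count = number of phone symbols (including eps/disambiguation symbols)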
…On Mon, Jun 21, 2021 at 11:39 AM shanguanma ***@***.***> wrote:
I tried to use the new snowfall and k2-fsa (0.3.5) to reproduce your
recipe (Librispeech) results. I used the script below:
$cuda_cmd log/stage6_train.log\
CUDA_VISIBLE_DEVICES="4" python3 ./mmi_att_transformer_train.py \
--world-size 1\
--full-libri false\
--use-ali-model false \
--num-workers-train 1\
--num-workers-valid 1
$decode_cmd log/stage7_decode.log\
CUDA_VISIBLE_DEVICES="4" python3 ./mmi_att_transformer_decode.py
The results are as follows:
2021-06-15 11:40:13,293 INFO [common.py:398] [test-clean] %WER 5.78% [3037 / 52576, 571 ins, 181 del, 2285 sub ]
2021-06-15 11:49:09,503 INFO [common.py:398] [test-other] %WER 15.14% [7925 / 52343, 1258 ins, 542 del, 6125 sub ]
The environment summary is as follows:
***@***.*** simple_v1]$ python3 -m k2.version
Collecting environment information...
k2 version: 0.3.5
Build type: Release
Git SHA1: 81ad3a580361e20b828d5eb1120999ecd0d7c675
Git date: Sat Jun 5 11:36:50 2021
Cuda used to build k2: 10.2
cuDNN used to build k2: 8.0.2
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 16.04.7 LTS
CMake version: 3.18.4
GCC version: 5.5.0
CMAKE_CUDA_FLAGS: --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 10.2
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Now I am using another corpus (e.g. SEAME), and when training the acoustic model the program
keeps reporting CUDA out of memory.
Note: the GPUs are RTX 8000 (48 GB per GPU). The command I run is as follows:
$cuda_cmd log/stage5_train.log\
CUDA_VISIBLE_DEVICES="2,3,4" python3 ./mmi_att_transformer_train_seame.py \
--world-size 3\
--use-ali-model false \
--num-workers-train 1\
--num-workers-valid 1
The error log is as follows:
# CUDA_VISIBLE_DEVICES=2,3,4 python3 ./mmi_att_transformer_train_seame.py --world-size 3 --use-ali-model false --num-workers-train 1 --num-workers-valid 1
# Invoked at Mon Jun 21 11:13:10 SGT 2021 from node03
#
# Started at Mon Jun 21 11:14:08 +08 2021 on node02
Traceback (most recent call last):
File "./mmi_att_transformer_train_seame.py", line 724, in <module>
main()
File "./mmi_att_transformer_train_seame.py", line 717, in main
mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 630, in run
objf, valid_objf, global_batch_idx_train = train_one_epoch(
File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 257, in train_one_epoch
curr_batch_objf, curr_batch_frames, curr_batch_all_frames = get_objf(
File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 113, in get_objf
mmi_loss, tot_frames, all_frames = loss_fn(nnet_output, texts, supervision_segments)
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home3/md510/w2020/k2_fsa_2021/snowfall/snowfall/objectives/mmi.py", line 222, in forward
return func(nnet_output=nnet_output,
File "/home3/md510/w2020/k2_fsa_2021/snowfall/snowfall/objectives/mmi.py", line 97, in _compute_mmi_loss_exact_optimized
num_den_tot_scores = num_den_lats.get_tot_scores(log_semiring=True,
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 644, in get_tot_scores
tot_scores = k2.autograd._GetTotScoresFunction.apply(
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/autograd.py", line 49, in forward
tot_scores = fsas._get_tot_scores(use_double_scores=use_double_scores,
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 623, in _get_tot_scores
forward_scores = self._get_forward_scores(use_double_scores,
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 573, in _get_forward_scores
entering_arc_batches=self._get_entering_arc_batches(),
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 513, in _get_entering_arc_batches
incoming_arcs=self._get_incoming_arcs(),
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 499, in _get_incoming_arcs
cache[name] = _k2.get_incoming_arcs(self.arcs,
RuntimeError: CUDA out of memory. Tried to allocate 17179869182.18 GiB (GPU 0; 44.49 GiB total capacity; 31.00 GiB already allocated; 7.62 GiB free; 35.77 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1616554788289/work/c10/cuda/CUDACachingAllocator.cpp:288 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2aab147e12f2 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1bc21 (0x2aab1457dc21 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1c944 (0x2aab1457e944 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1cf63 (0x2aab1457ef63 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #4: k2::PytorchCudaContext::Allocate(unsigned long, void**) + 0x5e (0x2aab2fe7aade in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #5: k2::NewRegion(std::shared_ptr<k2::Context>, unsigned long) + 0x11e (0x2aab2fbd876e in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #6: <unknown function> + 0x23a61d (0x2aab2fd4661d in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #7: k2::GetTransposeReordering(k2::Ragged<int>&, int) + 0x2ff (0x2aab2fd641ff in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #8: k2::GetIncomingArcs(k2::Ragged<k2::Arc>&, k2::Array1<int> const&) + 0x11a (0x2aab2fc4407a in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #9: <unknown function> + 0x444ed (0x2aab2eb634ed in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x1bd5f (0x2aab2eb3ad5f in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #11: PyCFunction_Call + 0x54 (0x55555567fdf4 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #12: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #13: _PyEval_EvalFrameDefault + 0x534b (0x555555728f6b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #14: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #15: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #16: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #17: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #18: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #19: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #20: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #21: <unknown function> + 0x1b2007 (0x555555706007 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #22: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #23: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #24: PyObject_CallObject + 0x53 (0x55555570dd93 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #25: THPFunction_apply(_object*, _object*) + 0x8fd (0x2aaac76a83fd in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #26: PyCFunction_Call + 0xf9 (0x55555567fe99 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #27: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #28: _PyEval_EvalFrameDefault + 0x534b (0x555555728f6b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #29: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #30: <unknown function> + 0x1b2007 (0x555555706007 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #31: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #32: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #33: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #34: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #35: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #36: <unknown function> + 0x1b1f91 (0x555555705f91 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #37: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #38: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #39: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #40: _PyObject_FastCallDict + 0x2c1 (0x555555673df1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #41: _PyObject_Call_Prepend + 0x63 (0x55555567e983 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #42: <unknown function> + 0x181b99 (0x5555556d5b99 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #43: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #44: _PyEval_EvalFrameDefault + 0x4f2e (0x555555728b4e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #45: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #46: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #47: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #48: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #49: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #50: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #51: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #52: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #53: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #54: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #55: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #56: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #57: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #58: _PyEval_EvalFrameDefault + 0xa4b (0x55555572466b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #59: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #60: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #61: _PyEval_EvalFrameDefault + 0xa4b (0x55555572466b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #62: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #63: _PyEval_EvalFrameDefault + 0x92f (0x55555572454f in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
# Ended (code 256) at Mon Jun 21 11:19:00 SGT 2021, elapsed time 350 seconds
I don't know where it went wrong. Thanks a lot.
@shanguanma
Can you run
I haven't used the alignment model.
I will try to reduce the minibatch size.
Yes, without segments the whole utterance would be very long, but I have a segments file, and I used wav.scp plus the segments file to produce segmented utterances (about 2s~10s each). I can't figure out what is wrong.
Now I am doing that. The shapes are as follows. In the seame data:
In the librispeech data:
I did not find a stark difference between the two. I have now reduced the minibatch size (--max-duration from 500 -> 100), and training runs without CUDA out of memory.
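For reference, the adjusted command looks roughly like the SEAME training command above with --max-duration added; only the value 100 is confirmed here, and the remaining flags are assumed unchanged:
# Same invocation as before; only --max-duration is lowered (100 is just the value that happened to work on this setup).
$cuda_cmd log/stage5_train.log \
  CUDA_VISIBLE_DEVICES="2,3,4" python3 ./mmi_att_transformer_train_seame.py \
    --world-size 3 \
    --use-ali-model false \
    --max-duration 100 \
    --num-workers-train 1 \
    --num-workers-valid 1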