CUDA out of memory #216
You can mess with the minibatch size, which might help. But finding the
source of the problem is a good idea too.
Are you using an alignment model? (If not, the posteriors at the start can
be very flat, which can cause too many states to stay within the pruning
beam).
What is the size of the phone set?
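(For reference, one quick way to check that, assuming your prepare stage produced a Kaldi-style phones.txt under the lang directory; the path below is only a guess for the usual simple_v1 layout:)
# Hypothetical path; adjust to wherever your recipe actually put phones.txt.
wc -l data/lang_nosp/phones.txt   # line count = number of phone symbols (including eps/disambiguation symbols)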
…On Mon, Jun 21, 2021 at 11:39 AM shanguanma ***@***.***> wrote:
I tried to use the new snowfall and k2-fsa (0.3.5) to reproduce your
recipe (Librispeech) results. I used the script below:
$cuda_cmd log/stage6_train.log\
CUDA_VISIBLE_DEVICES="4" python3 ./mmi_att_transformer_train.py \
--world-size 1\
--full-libri false\
--use-ali-model false \
--num-workers-train 1\
--num-workers-valid 1
$decode_cmd log/stage7_decode.log\
CUDA_VISIBLE_DEVICES="4" python3 ./mmi_att_transformer_decode.py
The results are as follows:
2021-06-15 11:40:13,293 INFO [common.py:398] [test-clean] %WER 5.78% [3037 / 52576, 571 ins, 181 del, 2285 sub ]
2021-06-15 11:49:09,503 INFO [common.py:398] [test-other] %WER 15.14% [7925 / 52343, 1258 ins, 542 del, 6125 sub ]
The environment summary is as follows:
***@***.*** simple_v1]$ python3 -m k2.version
Collecting environment information...
k2 version: 0.3.5
Build type: Release
Git SHA1: 81ad3a580361e20b828d5eb1120999ecd0d7c675
Git date: Sat Jun 5 11:36:50 2021
Cuda used to build k2: 10.2
cuDNN used to build k2: 8.0.2
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 16.04.7 LTS
CMake version: 3.18.4
GCC version: 5.5.0
CMAKE_CUDA_FLAGS: --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 10.2
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Now I am using another corpus (e.g. SEAME), and when training the acoustic model the program
keeps reporting CUDA out of memory.
Note: the GPUs are RTX 8000 (48 GB per GPU). The command I run is as follows:
$cuda_cmd log/stage5_train.log\
CUDA_VISIBLE_DEVICES="2,3,4" python3 ./mmi_att_transformer_train_seame.py \
--world-size 3\
--use-ali-model false \
--num-workers-train 1\
--num-workers-valid 1
The error log is as follows:
# CUDA_VISIBLE_DEVICES=2,3,4 python3 ./mmi_att_transformer_train_seame.py --world-size 3 --use-ali-model false --num-workers-train 1 --num-workers-valid 1
# Invoked at Mon Jun 21 11:13:10 SGT 2021 from node03
#
# Started at Mon Jun 21 11:14:08 +08 2021 on node02
Traceback (most recent call last):
File "./mmi_att_transformer_train_seame.py", line 724, in <module>
main()
File "./mmi_att_transformer_train_seame.py", line 717, in main
mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 630, in run
objf, valid_objf, global_batch_idx_train = train_one_epoch(
File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 257, in train_one_epoch
curr_batch_objf, curr_batch_frames, curr_batch_all_frames = get_objf(
File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 113, in get_objf
mmi_loss, tot_frames, all_frames = loss_fn(nnet_output, texts, supervision_segments)
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home3/md510/w2020/k2_fsa_2021/snowfall/snowfall/objectives/mmi.py", line 222, in forward
return func(nnet_output=nnet_output,
File "/home3/md510/w2020/k2_fsa_2021/snowfall/snowfall/objectives/mmi.py", line 97, in _compute_mmi_loss_exact_optimized
num_den_tot_scores = num_den_lats.get_tot_scores(log_semiring=True,
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 644, in get_tot_scores
tot_scores = k2.autograd._GetTotScoresFunction.apply(
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/autograd.py", line 49, in forward
tot_scores = fsas._get_tot_scores(use_double_scores=use_double_scores,
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 623, in _get_tot_scores
forward_scores = self._get_forward_scores(use_double_scores,
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 573, in _get_forward_scores
entering_arc_batches=self._get_entering_arc_batches(),
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 513, in _get_entering_arc_batches
incoming_arcs=self._get_incoming_arcs(),
File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 499, in _get_incoming_arcs
cache[name] = _k2.get_incoming_arcs(self.arcs,
RuntimeError: CUDA out of memory. Tried to allocate 17179869182.18 GiB (GPU 0; 44.49 GiB total capacity; 31.00 GiB already allocated; 7.62 GiB free; 35.77 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1616554788289/work/c10/cuda/CUDACachingAllocator.cpp:288 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2aab147e12f2 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1bc21 (0x2aab1457dc21 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1c944 (0x2aab1457e944 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1cf63 (0x2aab1457ef63 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #4: k2::PytorchCudaContext::Allocate(unsigned long, void**) + 0x5e (0x2aab2fe7aade in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #5: k2::NewRegion(std::shared_ptr<k2::Context>, unsigned long) + 0x11e (0x2aab2fbd876e in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #6: <unknown function> + 0x23a61d (0x2aab2fd4661d in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #7: k2::GetTransposeReordering(k2::Ragged<int>&, int) + 0x2ff (0x2aab2fd641ff in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #8: k2::GetIncomingArcs(k2::Ragged<k2::Arc>&, k2::Array1<int> const&) + 0x11a (0x2aab2fc4407a in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #9: <unknown function> + 0x444ed (0x2aab2eb634ed in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x1bd5f (0x2aab2eb3ad5f in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #11: PyCFunction_Call + 0x54 (0x55555567fdf4 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #12: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #13: _PyEval_EvalFrameDefault + 0x534b (0x555555728f6b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #14: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #15: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #16: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #17: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #18: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #19: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #20: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #21: <unknown function> + 0x1b2007 (0x555555706007 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #22: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #23: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #24: PyObject_CallObject + 0x53 (0x55555570dd93 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #25: THPFunction_apply(_object*, _object*) + 0x8fd (0x2aaac76a83fd in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #26: PyCFunction_Call + 0xf9 (0x55555567fe99 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #27: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #28: _PyEval_EvalFrameDefault + 0x534b (0x555555728f6b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #29: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #30: <unknown function> + 0x1b2007 (0x555555706007 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #31: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #32: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #33: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #34: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #35: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #36: <unknown function> + 0x1b1f91 (0x555555705f91 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #37: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #38: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #39: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #40: _PyObject_FastCallDict + 0x2c1 (0x555555673df1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #41: _PyObject_Call_Prepend + 0x63 (0x55555567e983 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #42: <unknown function> + 0x181b99 (0x5555556d5b99 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #43: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #44: _PyEval_EvalFrameDefault + 0x4f2e (0x555555728b4e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #45: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #46: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #47: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #48: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #49: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #50: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #51: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #52: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #53: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #54: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #55: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #56: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #57: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #58: _PyEval_EvalFrameDefault + 0xa4b (0x55555572466b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #59: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #60: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #61: _PyEval_EvalFrameDefault + 0xa4b (0x55555572466b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #62: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #63: _PyEval_EvalFrameDefault + 0x92f (0x55555572454f in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
# Ended (code 256) at Mon Jun 21 11:19:00 SGT 2021, elapsed time 350 seconds
I don't know where it went wrong. Thanks a lot.
@shanguanma
Can you run
I haven't used the alignment model.
I will try to reduce the minibatch size.
Yes, without segments the whole utterance would be very long, but I have a segments file, and I used wav.scp plus the segments file to produce segmented utterances (about 2s~10s each). I can't figure out what is wrong.
Now I am doing that. The shapes are as follows. In the seame data:
In the librispeech data:
I did not find a stark difference between the two. I have now reduced the minibatch size (--max-duration from 500 -> 100), and training runs without CUDA out of memory.
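For reference, the adjusted command looks roughly like the SEAME training command above with --max-duration added; only the value 100 is confirmed here, and the remaining flags are assumed unchanged:
# Same invocation as before; only --max-duration is lowered (100 is just the value that happened to work on this setup).
$cuda_cmd log/stage5_train.log \
  CUDA_VISIBLE_DEVICES="2,3,4" python3 ./mmi_att_transformer_train_seame.py \
    --world-size 3 \
    --use-ali-model false \
    --max-duration 100 \
    --num-workers-train 1 \
    --num-workers-valid 1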