Convergence & decoding problem #170
Comments
@csukuangfj will make a PR to-morrow or the next day that should help the convergence, by using a teacher model. |
... actually there is a flag to turn off the LM rescoring, I believe. That is a new feature, it may be buggy. |
You are right. I ran the tests and some of them failed:
|
Did you run make -j before running ctest? The log is from the configure phase, not from the build phase. |
Could you try #174? |
@csukuangfj |
@csukuangfj The following are the results of w/o ali_model: During decoding, I turned off the LM rescoring. |
Thanks!
I suppose it's unclear from this whether the alignment is helpful in terms of WER. We can turn it off when we don't have problems with convergence, but for now let's keep it in the code because it makes it easier to play with new models and not worry so much about whether they will converge.
…On Sun, Apr 25, 2021 at 2:06 PM ZhichaoWang ***@***.***> wrote:
@csukuangfj <https://github.com/csukuangfj>
I re-installed lhotse, k2 and snowfall, and the convergence problem was solved.
The following are the results of w/o ali_model:
[image: image]
<https://user-images.githubusercontent.com/8521283/115982617-33c08380-a5cf-11eb-8982-8c1626f5877b.png>
During decoding, I turned off the LM rescoring.
|
And please try the LM rescoring code.
|
@danpovey The LM rescoring related configuration: I didn't use the whole lattice for LM rescoring because of the OOM problem, even when I reduced max-duration to 10 and output-beam-size to 3. The GPU used for decoding is Tesla P40 with 22GB memory. |
Could you try the latest snowfall? I just fixed the bug about |
Rescoring with the whole lattice should give you a WER less than 6% on test-clean. |
I tried the latest snowfall, and the output-beam-size option worked. |
Perhaps your GPU has a very limited RAM. How about decreasing the output_beam_size further? It can be |
I found that the "--output-beam-size" option has nothing to do with the OOM problem. The OOM occurred before the search process started; the log follows. The GPU capacity is 22GB. CUDA_VISIBLE_DEVICES=7 ./mmi_att_transformer_decode.py --epoch=10 --avg=5 --use-lm-rescoring=True --num-path=-1 --max-duration=10 --output-beam-size=0.01 |
One workaround is to do the arc sort on the CPU: G = k2.arc_sort(G.to('cpu')).to(device). But I am not sure whether you will encounter CUDA OOM in the later stages. |
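For reference, the CPU workaround can be wrapped in a small helper. This is only a sketch: `arc_sort_on_cpu` is a hypothetical name that does not exist in k2 or snowfall, and it assumes `G` is a `k2.Fsa` (or anything with a `.to(device)` method) and that `device` is the CUDA device already used by the decoding script. It only moves the sort itself to host memory, so it may not avoid OOM in later stages.

```python
def arc_sort_on_cpu(fsa, device, arc_sort):
    """Arc-sort `fsa` on the CPU, then move the sorted result to `device`.

    `arc_sort` is passed in (for example `k2.arc_sort`) so this helper has no
    hard dependency of its own. `fsa` only needs a `.to(device)` method that
    returns a copy on the requested device, as `k2.Fsa.to` does.
    """
    # Sorting on the CPU uses host memory instead of GPU memory.
    sorted_on_cpu = arc_sort(fsa.to("cpu"))
    # Move the sorted FSA back to the device used for decoding.
    return sorted_on_cpu.to(device)


# Intended use in the decoding script (assumes k2 is installed and G is a k2.Fsa):
#   import k2
#   G = arc_sort_on_cpu(G, device, k2.arc_sort)
```

Passing the sort function as an argument keeps the helper easy to try out without a GPU or a k2 build.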
Hi guys,
I have installed the latest lhotse, k2 and snowfall and run the 100h librispeech example using mmi_att_transformer_train.py. The model does not seem to converge well: the valid objf no longer drops after the second epoch. There is also an error during decoding. The training log and decoding error follow: