Below are pre-trained acoustic and language models from *Who Needs Words? Lexicon-free Speech Recognition* (Likhomanenko et al., 2019).
File | Dataset | Dev Set | Architecture | Lexicon | Tokens |
---|---|---|---|---|---|
baseline_dev-clean+other | LibriSpeech | dev-clean+dev-other | Archfile | Lexicon | Tokens |
baseline_nov93dev | WSJ | nov93dev | Archfile | Lexicon | Tokens |
Convolutional language models (ConvLM) are trained with the fairseq toolkit, and n-gram language models are trained with the KenLM toolkit. The language models below have been converted into a binary format compatible with the wav2letter++ decoder.
Name | Dataset | Type | Vocab |
---|---|---|---|
lm_librispeech_convlm_char_20B | LibriSpeech | ConvLM 20B | LM Vocab |
lm_librispeech_convlm_word_14B | LibriSpeech | ConvLM 14B | LM Vocab |
lm_librispeech_kenlm_char_15g_pruned | LibriSpeech | 15-gram | - |
lm_librispeech_kenlm_char_20g_pruned | LibriSpeech | 20-gram | - |
lm_librispeech_kenlm_word_4g_200kvocab | LibriSpeech | 4-gram | - |
lm_wsj_convlm_char_20B | WSJ | ConvLM 20B | LM Vocab |
lm_wsj_convlm_word_14B | WSJ | ConvLM 14B | LM Vocab |
lm_wsj_kenlm_char_15g_pruned | WSJ | 15-gram | - |
lm_wsj_kenlm_char_20g_pruned | WSJ | 20-gram | - |
lm_wsj_kenlm_word_4g | WSJ | 4-gram | - |
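The character-level KenLM models above score transcripts as character sequences rather than word sequences, which is what makes lexicon-free decoding possible. The sketch below illustrates that idea in pure Python with a toy add-one-smoothed character n-gram model; the `|` word-boundary token, the helper names, and the smoothing scheme are assumptions for illustration only (KenLM itself estimates modified Kneser-Ney models and stores them in a binary trie).

```python
import math
from collections import Counter

# "|" as the word-boundary token is an assumption, mirroring the
# convention commonly used in wav2letter character token sets.
BOUNDARY = "|"

def tokenize(text):
    """Map a transcript to character tokens; spaces become boundary tokens."""
    return [BOUNDARY if c == " " else c for c in text.lower()]

class CharNgramLM:
    """Toy count-based character n-gram LM with add-one smoothing.

    This only illustrates scoring character sequences; it is not
    KenLM's estimation or storage scheme.
    """

    def __init__(self, order, corpus):
        self.order = order
        self.ngrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for line in corpus:
            toks = ["<s>"] * (order - 1) + tokenize(line) + ["</s>"]
            self.vocab.update(toks)
            for i in range(order - 1, len(toks)):
                ctx = tuple(toks[i - order + 1:i])
                self.ngrams[ctx + (toks[i],)] += 1
                self.contexts[ctx] += 1

    def log_prob(self, text):
        """Base-10 log probability of a transcript, character by character."""
        toks = ["<s>"] * (self.order - 1) + tokenize(text) + ["</s>"]
        v = len(self.vocab)
        lp = 0.0
        for i in range(self.order - 1, len(toks)):
            ctx = tuple(toks[i - self.order + 1:i])
            lp += math.log10((self.ngrams[ctx + (toks[i],)] + 1)
                             / (self.contexts[ctx] + v))
        return lp

# Tiny corpus: in-domain text scores higher than gibberish of equal length.
lm = CharNgramLM(3, ["the cat sat", "the cat ran", "a cat sat"])
in_domain = lm.log_prob("the cat")
gibberish = lm.log_prob("xqz zqx")
```

In the actual recipes, the decoder consumes the pre-built KenLM binaries listed above directly; this sketch is only meant to show why a 15- or 20-gram character model can stand in for a word lexicon.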
```bibtex
@article{likhomanenko2019needs,
  title={Who needs words? lexicon-free speech recognition},
  author={Likhomanenko, Tatiana and Synnaeve, Gabriel and Collobert, Ronan},
  journal={arXiv preprint arXiv:1904.04479},
  year={2019}
}
```