This is the project for the paper "LTL-UDE at Low-Resource Speech-to-Text Shared Task: Investigating Mozilla DeepSpeech in a low-resource setting", published at the 5th SwissText and KONVENS 2020.
This project aims to develop a working speech-to-text module using Mozilla DeepSpeech that can be used in any audio processing pipeline. Mozilla DeepSpeech is a state-of-the-art open-source automatic speech recognition (ASR) toolkit. It uses a model trained with deep learning techniques based on Baidu's Deep Speech research paper. The DeepSpeech project uses Google's TensorFlow to make the implementation easier.
DeepSpeech-API: https://github.com/AASHISHAG/DeepSpeech-API
This README is written for DeepSpeech v0.6.0. Refer to Mozilla DeepSpeech for the latest updates.
$ virtualenv -p python3 deepspeech-swiss-german
$ source deepspeech-swiss-german/bin/activate
$ pip3 install -r python_requirements.txt
$ git clone https://github.com/mozilla/DeepSpeech.git
$ cd DeepSpeech
$ git checkout v0.6.0
$ docker build -t deepspeech_v0.6.0 .
$ docker run -d -it --name deepspeech_v0.6.0 --mount type=bind,source="$(pwd)",target=/root deepspeech_v0.6.0
$ docker exec -it deepspeech_v0.6.0 /bin/bash
Note: Set the locale to en_US.UTF-8 if required:
$ dpkg-reconfigure locales
Reference: https://perlgeek.de/en/article/set-up-a-clean-utf8-environment
1. English
- Mozilla Common Voice ~1488h
- LibriSpeech ~1000h
2. German
- Mozilla Common Voice ~454h
- Mailabs ~233h
- German Distant Speech Corpus (TUDA-De) ~184h
- Voxforge ~57h
3. Swiss-German
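As a quick check of how much training audio each language contributes, the approximate hours listed above can be summed per language. This is only a sketch of the arithmetic; the dictionary layout and the function name `total_hours` are illustrative choices, and the figures are the approximate values from the list.

```python
# Approximate corpus sizes in hours, copied from the list above.
CORPUS_HOURS = {
    "English": {
        "Mozilla Common Voice": 1488,
        "LibriSpeech": 1000,
    },
    "German": {
        "Mozilla Common Voice": 454,
        "Mailabs": 233,
        "German Distant Speech Corpus (TUDA-De)": 184,
        "Voxforge": 57,
    },
}


def total_hours(corpora):
    """Sum the approximate hours of all corpora for one language."""
    return sum(corpora.values())


for language, corpora in CORPUS_HOURS.items():
    print(f"{language}: ~{total_hours(corpora)}h")
```

With these figures, the English corpora add up to roughly 2488 hours and the German corpora to roughly 928 hours.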
1. Mozilla_EN
$ mkdir mozilla_en
$ cd mozilla_en
$ wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-4-2019-12-10/en.tar.gz
$ tar -xzvf en.tar.gz
$ python3 DeepSpeech/bin/import_cv2.py --audio_dir path --filter_alphabet deepspeech-swiss-german/data/en_alphabet.txt export_path <change the path accordingly>
2. LibriSpeech_EN
$ mkdir librispeech
$ cd librispeech
$ python3 DeepSpeech/bin/import_librivox.py export_path <change the path accordingly>
3. Mozilla_DE
$ mkdir mozilla_de
$ cd mozilla_de
$ wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-4-2019-12-10/de.tar.gz
$ tar -xzvf de.tar.gz
$ python3 DeepSpeech/bin/import_cv2.py --audio_dir path --filter_alphabet deepspeech-swiss-german/data/alphabet.txt export_path <change the path accordingly>
4. Mailabs_DE
$ mkdir mailabs
$ cd mailabs
$ python3 DeepSpeech/bin/import_m-ailabs.py --language de_DE --filter_alphabet deepspeech-swiss-german/data/alphabet.txt export_path <change the path accordingly>
5. Tuda_DE
$ mkdir tuda
$ cd tuda
$ wget http://www.repository.voxforge1.org/downloads/de/german-speechdata-package-v2.tar.gz
$ tar -xzvf german-speechdata-package-v2.tar.gz
$ deepspeech-swiss-german/pre-processing/prepare_data.py --tuda corpus_path export_path
6. Voxforge_DE
$ mkdir voxforge
$ cd voxforge
$ python3
>>> from audiomate.corpus import io
>>> dl = io.VoxforgeDownloader(lang='de')
>>> dl.download(voxforge_corpus_path)  # replace voxforge_corpus_path with the target directory (a string)
>>> exit()
$ deepspeech-swiss-german/pre-processing/run_to_utf_8.sh
$ python3 deepspeech-swiss-german/prepare_data.py --voxforge corpus_path export_path <change the path accordingly>
NOTE: Change the path accordingly in run_to_utf_8.sh
7. SwissText_DE
$ mkdir swisstext
$ cd swisstext
Download the data from https://drive.switch.ch/index.php/s/PpUArRmN5Ba5C8J (download link), then run:
$ unzip train.zip
$ python3 deepspeech-swiss-german/prepare_data_swiss_german.py
$ python3 deepspeech-swiss-german/shuffle_and_split.py
8. ArchiMob_DE
Follow the steps at https://github.com/AASHISHAG/archimob-swissgerman-deepspeech-importer
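Several of the import commands above pass `--filter_alphabet`, which restricts the exported transcripts to the characters in an alphabet file. The sketch below illustrates that idea on an in-memory CSV; it is not the importers' actual logic. It assumes the usual DeepSpeech conventions (one character per line in the alphabet file with `#` lines as comments, and CSV columns `wav_filename`, `wav_filesize`, `transcript`). The sample alphabet, the sample rows, and the helper names `load_alphabet` and `filter_rows` are hypothetical.

```python
import csv
import io


def load_alphabet(lines):
    """Collect allowed characters: one per line, lines starting with '#' are comments."""
    allowed = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("\\#"):
            allowed.add("#")  # an escaped '#' stands for the literal character
        elif line.startswith("#") or line == "":
            continue  # skip comments and empty lines
        else:
            allowed.add(line)
    return allowed


def filter_rows(rows, allowed):
    """Keep only rows whose transcript is non-empty and uses allowed characters only."""
    return [
        row for row in rows
        if row["transcript"] and set(row["transcript"]) <= allowed
    ]


# Hypothetical sample data, for illustration only.
SAMPLE_ALPHABET = ["# hypothetical alphabet\n", " \n", "a\n", "b\n", "c\n", "ä\n"]
SAMPLE_CSV = (
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,ab ca\n"
    "b.wav,2000,abc!\n"
    "c.wav,3000,bäc\n"
)

allowed = load_alphabet(SAMPLE_ALPHABET)
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_rows(rows, allowed)
print([row["wav_filename"] for row in kept])
```

Here `b.wav` is dropped because its transcript contains `!`, which is not in the sample alphabet. A check like this can also be run on the exported CSV files before training to catch characters the acoustic model cannot emit.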
We used the KenLM toolkit, language model inference code by Kenneth Heafield, to train a 3-gram language model.
- Installation
$ git clone https://github.com/kpu/kenlm.git
$ cd kenlm
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j `nproc`
- Corpus
We used an open-source German speech corpus released by the University of Hamburg and the European Parliament Proceedings Parallel Corpus 1996-2011.
- Download the data (EN, DE)
## EN
Use the default Mozilla LM and trie.
## DE
$ wget http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/German_sentences_8mil_filtered_maryfied.txt.gz
$ gzip -d German_sentences_8mil_filtered_maryfied.txt.gz
$ wget https://www.statmt.org/europarl/v7/de-en.tgz
$ tar -xzvf de-en.tgz
$ cat German_sentences_8mil_filtered_maryfied.txt >> europarl-v7.de-en.de
- Pre-process the data (DE)
$ deepspeech-swiss-german/pre-processing/prepare_vocab.py europarl-v7.de-en.de exp_path/clean_vocab.txt
- Build the Language Model (DE)
$ kenlm/build/bin/lmplz --text exp_path/clean_vocab.txt --arpa exp_path/words.arpa --o 3
$ kenlm/build/bin/build_binary -T -s exp_path/words.arpa exp_path/de_lm.binary
NOTE: If a malloc exception occurs, limit memory use with -S <percentage of memory>.
Example:
$ kenlm/build/bin/lmplz --text exp_path/clean_vocab.txt --arpa exp_path/words.arpa --o 3 -S 50%
- Build Trie (DE)
$ DeepSpeech/native_client/generate_trie deepspeech-swiss-german/data/alphabet.txt path/de_lm.binary export_path/de_trie
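To make concrete what a 3-gram language model is built from, the sketch below counts word trigrams in a few sentences and derives a plain maximum-likelihood estimate of P(w3 | w1, w2). This is a simplified illustration only: KenLM's `lmplz` uses modified Kneser-Ney smoothing and its own sentence-boundary handling, neither of which is reproduced here. The sample sentences and function names are hypothetical.

```python
from collections import Counter


def count_trigrams(sentences):
    """Count trigrams and their two-word contexts, with one <s> and one </s> per sentence."""
    trigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(len(tokens) - 2):
            trigram = tuple(tokens[i:i + 3])
            trigram_counts[trigram] += 1
            context_counts[trigram[:2]] += 1
    return trigram_counts, context_counts


def mle_probability(trigram, trigram_counts, context_counts):
    """Unsmoothed estimate: count(w1, w2, w3) / count(w1, w2 as context)."""
    context = trigram[:2]
    if context_counts[context] == 0:
        return 0.0
    return trigram_counts[trigram] / context_counts[context]


# Hypothetical sample corpus, for illustration only.
SENTENCES = ["das ist gut", "das ist schlecht", "das ist gut so"]

trigram_counts, context_counts = count_trigrams(SENTENCES)
p = mle_probability(("das", "ist", "gut"), trigram_counts, context_counts)
print(f"P(gut | das ist) = {p:.3f}")
```

In this toy corpus the context "das ist" occurs three times and is followed by "gut" twice, so the unsmoothed estimate is 2/3. Smoothing matters in practice because most trigrams in real speech never appear in the training text.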
Change the path accordingly.
$ ./DeepSpeech.py --train_files train.csv --dev_files dev.csv --test_files test.csv --alphabet_config_path alphabet.txt --lm_trie_path trie --lm_binary_path lm.binary --test_batch_size 36 --train_batch_size 24 --dev_batch_size 36 --epochs 75 --learning_rate 0.0001 --dropout_rate 0.25 --export_dir ../models
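When running several experiments, it can help to keep the training flags in one place and generate the command line from them. The sketch below does this for the flags shown above; the dictionary and the helper `build_command` are illustrative and not part of DeepSpeech, and the file paths are the same placeholders as in the command.

```python
import shlex

# Flags copied from the training command above; paths are placeholders.
TRAINING_FLAGS = {
    "train_files": "train.csv",
    "dev_files": "dev.csv",
    "test_files": "test.csv",
    "alphabet_config_path": "alphabet.txt",
    "lm_trie_path": "trie",
    "lm_binary_path": "lm.binary",
    "test_batch_size": 36,
    "train_batch_size": 24,
    "dev_batch_size": 36,
    "epochs": 75,
    "learning_rate": 0.0001,
    "dropout_rate": 0.25,
    "export_dir": "../models",
}


def build_command(script, flags):
    """Turn a flag dictionary into an argument list: [script, --name, value, ...]."""
    args = [script]
    for name, value in flags.items():
        args += [f"--{name}", str(value)]
    return args


command = build_command("./DeepSpeech.py", TRAINING_FLAGS)
# shlex.join (Python 3.8+) quotes arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can also be passed directly to `subprocess.run(command, check=True)` from inside the DeepSpeech checkout.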
To train with data augmentation, define the augmentation flags and append them to the training command. Change the paths accordingly.
$ AUG_AUDIO="--data_aug_features_additive 0.2 --data_aug_features_multiplicative 0.2 --augmentation_speed_up_std 0.2"
$ AUG_FREQ_TIME="--augmentation_freq_and_time_masking --augmentation_freq_and_time_masking_freq_mask_range 5 --augmentation_freq_and_time_masking_number_freq_masks 3 --augmentation_freq_and_time_masking_time_mask_range 2 --augmentation_freq_and_time_masking_number_time_masks 3"
$ AUG_PITCH_TEMPO="--augmentation_pitch_and_tempo_scaling --augmentation_pitch_and_tempo_scaling_min_pitch 0.95 --augmentation_pitch_and_tempo_scaling_max_pitch 1.2 --augmentation_pitch_and_tempo_scaling_max_tempo 1.2"
$ AUG_SPEC_DROP="--augmentation_spec_dropout_keeprate 0.2"
$ ./DeepSpeech.py --train_files train.csv --dev_files dev.csv --test_files test.csv --alphabet_config_path alphabet.txt --lm_trie_path trie --lm_binary_path lm.binary --test_batch_size 36 --train_batch_size 24 --dev_batch_size 36 --epochs 75 --learning_rate 0.0001 --dropout_rate 0.25 --export_dir ../models $AUG_AUDIO $AUG_FREQ_TIME $AUG_PITCH_TEMPO $AUG_SPEC_DROP
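The frequency and time masking flags above zero out random bands of the input features during training. The sketch below shows the general idea in pure Python on a small spectrogram-like grid. It is not DeepSpeech's implementation, and treating the two `*_mask_range` values as the maximum width of a mask is an assumption; the function name and the 50 x 26 example shape are hypothetical.

```python
import random


def mask_spectrogram(spec, n_freq_masks=3, freq_mask_range=5,
                     n_time_masks=3, time_mask_range=2, rng=None):
    """Return a copy of spec (list of time frames, each a list of frequency bins)
    with random frequency bands and time spans set to zero."""
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0]) if spec else 0
    masked = [list(frame) for frame in spec]  # copy so the input stays untouched

    # Frequency masks: zero the same band of bins in every frame.
    for _ in range(n_freq_masks):
        width = rng.randint(0, freq_mask_range)  # assumed: range = maximum width
        start = rng.randint(0, max(0, n_bins - width))
        for frame in masked:
            for f in range(start, min(start + width, n_bins)):
                frame[f] = 0.0

    # Time masks: zero whole frames in a short span.
    for _ in range(n_time_masks):
        width = rng.randint(0, time_mask_range)  # assumed: range = maximum width
        start = rng.randint(0, max(0, n_frames - width))
        for t in range(start, min(start + width, n_frames)):
            masked[t] = [0.0] * n_bins

    return masked


# Hypothetical input: 50 frames with 26 features each, all set to 1.0.
spec = [[1.0] * 26 for _ in range(50)]
augmented = mask_spectrogram(spec, rng=random.Random(0))
zeroed = sum(value == 0.0 for frame in augmented for value in frame)
print(f"{zeroed} of {50 * 26} values masked")
```

Masking forces the model to rely on the surrounding context rather than on any single frequency band or frame, which is one reason it is often tried in low-resource settings.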
Some results from our findings.
- English -> German -> Swiss : 56.6
NOTE: Refer to our paper for more information.
- Prof. Dr.-Ing. Torsten Zesch - Co-Author
If you use our findings/scripts in your academic work, please cite: