
Efficient Minimum Word Error Rate Training for Attention-Based Models

This repository contains an implementation of Minimum Word Error Rate (MWER) training based on the monotonic RNN-T loss for audio transduction. The solution builds on the TensorFlowASR library (https://github.com/TensorSpeech/TensorFlowASR).
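MWER training minimizes the expected number of word errors over an N-best list of hypotheses, with the model's probabilities renormalized over that list. The sketch below illustrates this general formulation in TensorFlow; it is a minimal illustration with a common mean-error baseline for variance reduction, not the repository's actual implementation, and the word-error counts would come from a decoder in practice.

# Minimal sketch of the MWER objective; NOT the repository's actual code.
import tensorflow as tf

def mwer_loss(nbest_logprobs: tf.Tensor, word_errors: tf.Tensor) -> tf.Tensor:
    # nbest_logprobs: [batch, N] log P(y_i | x) for each hypothesis.
    # word_errors:    [batch, N] word-error counts of each hypothesis
    #                 against the reference transcript.
    probs = tf.nn.softmax(nbest_logprobs, axis=-1)  # renormalize over N-best
    # Subtract the mean error as a baseline to reduce gradient variance.
    mean_err = tf.reduce_mean(word_errors, axis=-1, keepdims=True)
    relative_err = word_errors - mean_err
    # Expected relative word error over the N-best hypotheses.
    return tf.reduce_mean(tf.reduce_sum(probs * relative_err, axis=-1))

# Toy usage: 2 utterances, 3 hypotheses each.
logp = tf.constant([[-1.0, -2.0, -3.0], [-0.5, -1.5, -2.5]])
errs = tf.constant([[0.0, 2.0, 3.0], [1.0, 1.0, 4.0]])
print(mwer_loss(logp, errs))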

Installation:

NOTE: We assume the user has installed and configured Python (and Docker, if accessing the code via Docker) before installing the solution.

Manual installation:

  1. Install the requirements:
pip install -r requirements-pre.txt && pip install -r requirements.txt
  2. Download and extract the LibriSpeech test dataset:
wget https://www.openslr.org/resources/12/test-clean.tar.gz

tar -xzvf test-clean.tar.gz
  3. Add the working directory to PYTHONPATH:
export PYTHONPATH="${PYTHONPATH}:${PWD}"
  4. Create transcriptions (.tsv files) for the downloaded dataset:
python ./scripts/create_librispeech_trans.py -d <extracted_dataset_dir> <target_file>
python ./scripts/create_librispeech_trans.py -d ./LibriSpeech/test-clean ./LibriSpeech/test_transcriptions/test.tsv
  5. Provide paths to the generated transcriptions in the config file ('data_path' in the train, eval and test subconfigs); a sanity-check sketch is shown after this list.
./examples/conformer/config.yml
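Optionally, you can sanity-check steps 4 and 5 with a few lines of Python. This is a sketch under assumptions: the exact config keys and the .tsv column layout (TensorFlowASR conventionally uses PATH, DURATION, TRANSCRIPT) should be verified against the shipped config.yml.

# Hypothetical sanity check; key names and TSV columns follow TensorFlowASR
# conventions and may differ slightly in this repository.
import csv
import yaml

with open("examples/conformer/config.yml") as f:
    cfg = yaml.safe_load(f)
print(list(cfg.keys()))  # locate the train/eval/test subconfigs by eye

with open("LibriSpeech/test_transcriptions/test.tsv") as f:
    rows = list(csv.reader(f, delimiter="\t"))
print(rows[0])  # header, e.g. PATH, DURATION, TRANSCRIPT
print(rows[1])  # first utterance entry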

Docker Installation:

For convenience, we provide a way to access the code via Docker. This guide assumes the user has installed docker, docker-compose and nvidia-docker (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide).

NOTE: By default, the Docker installation downloads only the test dataset.

  1. Build and run the Docker container:
docker compose run tensorflow_asr

Training:

  1. Download the LibriSpeech train and dev datasets. NOTE: The datasets are quite large (around 24 GB), so the download might take a while. Also, training will fail on most consumer-tier GPUs due to insufficient memory.
wget https://www.openslr.org/resources/12/dev-clean.tar.gz
tar -xzvf dev-clean.tar.gz 

wget https://www.openslr.org/resources/12/train-clean-360.tar.gz
tar -xzvf train-clean-360.tar.gz 
  2. Create transcriptions (.tsv files) for the downloaded datasets:
python ./scripts/create_librispeech_trans.py -d ./LibriSpeech/train-clean-360 /data/LibriSpeech/test_transcriptions/train.tsv
python ./scripts/create_librispeech_trans.py -d ./LibriSpeech/dev-clean /data/LibriSpeech/test_transcriptions/dev.tsv
  3. Start training with the script at examples/conformer/train.py. To use the MWER training procedure, set the boolean mwer_training (under model_config in config.yml) to True; otherwise the standard training procedure with the regular RNN-T loss is used. train.py accepts several arguments; the most important are:
  • --config: path to the model config.yml file.
  • --sentence_piece: flag indicating whether to use SentencePiece as the text tokenizer.
  • --bs: batch size.
  • --devices: which GPU devices to use.

The remaining arguments are described in the train.py file. An example command (which works under the default setup) is:

python examples/conformer/train.py --config examples/conformer/config.yml --sentence_piece --devices 0
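Conceptually, the mwer_training flag only switches the training objective. The sketch below is a hypothetical illustration of that dispatch; the function name, config keys and return values are assumptions for illustration, not the repository's actual API.

# Hypothetical illustration of the mwer_training switch; names and keys are
# assumptions, not the repository's actual API.
import yaml

def select_objective(config_path: str) -> str:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    if cfg.get("model_config", {}).get("mwer_training", False):
        return "mwer"  # expected word-error objective over N-best hypotheses
    return "rnnt"      # standard monotonic RNN-T loss

print(select_objective("examples/conformer/config.yml"))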

Test:

To start testing, run the script at examples/conformer/test.py, which starts the inference process. test.py accepts several arguments; the most important are:

  • --saved: path to the saved model.
  • --config: path to the model config.yml file.
  • --sentence_piece: flag indicating whether to use SentencePiece as the text tokenizer.
  • --bs: batch size.
  • --output: path to the output transcriptions.

An example command (which works under the default setup) is:

python ./examples/conformer/test.py --config ./examples/conformer/config.yml \
                                    --saved predefined_checkpoints/weights.hdf5 \
                                    --sentence_piece \
                                    --output test_result.tsv \
                                    --bs 1
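A common next step is scoring the output file with word error rate. Below is a minimal WER sketch; the tab-separated column names (GROUNDTRUTH, GREEDY) are assumptions about the header of test_result.tsv, so adjust them to whatever test.py actually writes.

# Minimal WER scorer; the GROUNDTRUTH/GREEDY column names are assumptions.
import csv

def word_errors(ref: str, hyp: str) -> int:
    # Levenshtein distance between word sequences (single-row DP).
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (r[i - 1] != h[j - 1]))
            prev, d[j] = d[j], cur
    return d[len(h)]

errors = words = 0
with open("test_result.tsv") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        ref, hyp = row["GROUNDTRUTH"], row["GREEDY"]  # assumed column names
        errors += word_errors(ref, hyp)
        words += len(ref.split())
print(f"WER: {100.0 * errors / max(words, 1):.2f}%")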

Inference demonstration:

To run a demonstration on an actual .flac file, run the following command from the root directory:

python examples/demonstration/conformer.py --config ./examples/conformer/config.yml \
                                           --saved predefined_checkpoints/weights.hdf5 \
                                           --sentence_piece \
                                           --subwords ./vocabularies/librispeech/spm_512 \
                                           --beam_width 1 \
                                           examples/demonstration/wavs/1089-134691-0000.flac                                            

This script demonstrates the usage of the model on real-world data.
