Speech recognition models in PyTorch, built on top of the Kaldi data pre-processing toolchain. Developed at the University of Wrocław, Poland.
The repository holds the code to replicate the experiments from:
- Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees
- Lattice Generation in Attention-based Speech Recognition Models
Installation:

- Download Miniconda 3 for your platform, then install it.
- If you didn't add conda to your `.bashrc` during installation, run the command provided by the installer, e.g. `eval "$(/pio/scratch/1/alan/miniconda2/bin/conda shell.bash hook)"`, to populate your current shell with the conda programs.
- Install the conda environment with `conda env create -f environment.yml`. To update the environment later, use `conda env update --file environment.yml`.
- If you haven't done so already, run `eval "$(/pio/scratch/1/alan/miniconda2/bin/conda shell.bash hook)"`.
- Activate the appropriate environment and set up the paths: `conda activate pytorch_asr`, followed by `source set-env.sh`.
- Run the `build_deps.sh` script: `cd pytorch-asr && ./build_deps.sh`. (A condensed end-to-end version of these steps is sketched below.)
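For reference, here are the steps above condensed into a single shell session. This is only a sketch: it assumes a Linux x86_64 machine, a Miniconda install into the default `~/miniconda3` prefix, and a repository checkout named `pytorch-asr`.

```bash
# Download and install Miniconda 3 (batch mode, default ~/miniconda3 prefix).
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"

# Make conda available in the current shell (the installer prints the exact command).
eval "$("$HOME/miniconda3/bin/conda" shell.bash hook)"

# Create and activate the project environment, then set up paths.
cd pytorch-asr
conda env create -f environment.yml
conda activate pytorch_asr
source set-env.sh

# Build the remaining dependencies.
./build_deps.sh
```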
Model training:

    python train.py egs/wsj/deep_speech2.yaml /tmp/experiment
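The second argument to `train.py` is the experiment directory. Judging by the paths used in the decoding commands of this README, it will contain at least the resolved training config and a `checkpoints/` subdirectory:

```bash
# Contents of the experiment directory (names as referenced elsewhere in this README).
ls /tmp/experiment/                # e.g. config_train1.yaml, checkpoints/
ls /tmp/experiment/checkpoints/    # e.g. best.pkl plus intermediate checkpoints
```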
Greedy model decoding:

    python decode.py /tmp/experiment/config_train1.yaml --model /tmp/experiment/checkpoints/best.pkl
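`decode.py` also accepts a `--csv` flag (it is used in the lattice decoding example near the end of this README), so the greedy hypotheses can be written to a file. A small sketch, assuming the flag behaves the same in the greedy setting; the output path is arbitrary:

```bash
# Greedy decoding with the hypotheses written to a CSV file.
python decode.py /tmp/experiment/config_train1.yaml \
    --model /tmp/experiment/checkpoints/best.pkl \
    --csv /tmp/experiment/greedy_decoded.csv
```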
To decode using an external LM, first build the language models:

    bash egs/wsj/build_decoding_fst.sh

Then decode:

    bash egs/wsj/ctc_kaldi_decode.sh --min_checkpoint 35000 --pkl ALL --subset test lm/lm_ees_tg_larger/biphone runs/ctc_bi

Decoding results are written to the experiment directory. Consult both scripts for more decoding options.
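The FSTs only need to be built once; after that, the same run can be decoded on several subsets by changing `--subset` (both the `test` and `dev` values appear in the commands of this README). A sketch:

```bash
# Build the decoding FSTs once ...
bash egs/wsj/build_decoding_fst.sh

# ... then reuse them to decode several subsets of the same run.
for subset in dev test; do
    bash egs/wsj/ctc_kaldi_decode.sh --min_checkpoint 35000 --pkl ALL \
        --subset "$subset" lm/lm_ees_tg_larger/biphone runs/ctc_bi
done
```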
The model configurations are located in `egs/wsj/yamls/*.yaml`. To train a particular model:

    python train.py egs/wsj/yamls/ctcg_bi_cde.yaml runs/ctcg_bi_cde

Intermediate checkpoints and the best model will be stored in `runs/ctcg_bi_cde/checkpoints`.
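The same pattern applies to the other configurations in that directory. A small convenience sketch; the `name` variable is purely illustrative and should be set to one of the shipped YAML files:

```bash
# List the available model configurations ...
ls egs/wsj/yamls/*.yaml

# ... and train one of them, naming the run after the config so that
# checkpoints end up under runs/<name>/checkpoints.
name=ctcg_bi_cde    # illustrative; substitute any config from the listing above
python train.py "egs/wsj/yamls/${name}.yaml" "runs/${name}"
```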
To decode, select the appropriate language model:

Model | $lm_path |
---|---|
mono-char CTC | exp/wsjs5/pydata/lm/lm_ees_tg_larger/monophone |
bi-char CTC, CTC-G, CTC-G (+ CDE) | exp/wsjs5/pydata/lm/lm_ees_tg_larger/biphone |
bi-char CTC-GB (+ CDE) | exp/wsjs5/pydata/lm/lm_ees_tg_larger/biphone_contextblank |

Then decode, substituting the chosen path for `$lm_path`:

    bash egs/wsj/ctc_kaldi_decode.sh --min-acwt 0.3 --subset dev $lm_path runs/ctcg_bi_cde

Consult the decoding script for more options.
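For example, to decode the `runs/ctcg_bi_cde` run from above (presumably a bi-char CTC-G (+ CDE) model, so the biphone row of the table applies) on the dev set:

```bash
# lm_path taken from the "biphone" row of the table above.
lm_path=exp/wsjs5/pydata/lm/lm_ees_tg_larger/biphone
bash egs/wsj/ctc_kaldi_decode.sh --min-acwt 0.3 --subset dev "$lm_path" runs/ctcg_bi_cde
```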
For the lattice generation experiments, first train the initial model with CTC:

    python train.py egs/wsj/yamls/lattice_decoding/ctc.yaml runs/lattice_base

Then run the second stage with TCN, initializing it from the best checkpoint of the first stage:

    python train.py egs/wsj/yamls/lattice_decoding/ctc.yaml runs/lattice_stage2 --initialize-from runs/lattice_base/checkpoints/best.pkl
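Stage 2 warm-starts from the best checkpoint of stage 1, so it can be worth checking that the checkpoint exists before launching it; a trivial guard using only the paths from the commands above:

```bash
# Only start stage 2 if stage 1 actually produced a best checkpoint.
test -f runs/lattice_base/checkpoints/best.pkl && \
    python train.py egs/wsj/yamls/lattice_decoding/ctc.yaml runs/lattice_stage2 \
        --initialize-from runs/lattice_base/checkpoints/best.pkl
```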
In order to decode, pick a checkpoint:

    python decode.py ~/group/mza/recreate/tcn.yaml \
        --model runs/lattice_stage2/checkpoints/best_51853_CER_0.0808282271662.pkl \
        --csv decoded.csv \
        -m Model.decoder.use_graph_search True \
           Model.decoder.length_normalization 0 \
           Model.decoder.coverage_weight 0.8 \
           Model.decoder.min_attention_pos 0 \
           Model.decoder.coverage_tau 0.25 \
           Model.decoder.keep_eos_score False \
           Model.decoder.lm_weight 0.75 \
           Model.decoder.att_force_forward "[-10, 50]" \
           Model.decoder.beam_size 10 \
           Model.decoder.lm_file /net/archive/groups/plggneurony/mza/lm_ees_tg_larger/LG_syms.fst \
           Datasets.test.batch_size 1
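Since most of these overrides usually stay fixed while only the checkpoint changes, it can be convenient to keep them in a small wrapper script. This is just a convenience sketch reusing the exact paths and values from the command above; adjust them for your setup:

```bash
#!/usr/bin/env bash
# Wrapper around the lattice decoding command above.
# Usage: ./decode_lattice.sh <checkpoint.pkl> [output.csv]
set -euo pipefail

ckpt="$1"
out_csv="${2:-decoded.csv}"

# Paths from the example command; adjust for your setup.
config=~/group/mza/recreate/tcn.yaml
lm_fst=/net/archive/groups/plggneurony/mza/lm_ees_tg_larger/LG_syms.fst

python decode.py "$config" --model "$ckpt" --csv "$out_csv" \
    -m Model.decoder.use_graph_search True \
       Model.decoder.length_normalization 0 \
       Model.decoder.coverage_weight 0.8 \
       Model.decoder.min_attention_pos 0 \
       Model.decoder.coverage_tau 0.25 \
       Model.decoder.keep_eos_score False \
       Model.decoder.lm_weight 0.75 \
       Model.decoder.att_force_forward "[-10, 50]" \
       Model.decoder.beam_size 10 \
       Model.decoder.lm_file "$lm_fst" \
       Datasets.test.batch_size 1
```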