Implementation of the paper:

*Autoregressive Co-Training for Learning Discrete Speech Representations*
Sung-Lin Yeh, Hao Tang
Install the dependencies with:

```
pip install -r requirements.txt
```
The co-training model described in the paper is defined in `cotraining.py`. Different components of the model are modular and can be easily modified.
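One component of the model named in the results table below is marginalization over the discrete codes. As a rough illustration of that idea (a minimal NumPy sketch, not code from `cotraining.py` — the function and variable names here are hypothetical), instead of committing to the single most likely code per frame, one can take the posterior-weighted average of the code embeddings:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the code dimension.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def marginalized_embedding(logits, codebook):
    """Marginalize over discrete codes: rather than picking the argmax code
    for each frame, return the expectation of the code embeddings under the
    posterior over codes. (Illustrative sketch, not the repo's implementation.)"""
    probs = softmax(logits)   # (T, K): per-frame distribution over K codes
    return probs @ codebook   # (T, D): expected embedding per frame

# Toy example: 4 frames, 256 codes, 16-dimensional code embeddings.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 256))
codebook = rng.normal(size=(256, 16))
z = marginalized_embedding(logits, codebook)  # shape (4, 16)
```

Because the expectation is differentiable in the logits, gradients flow through the code distribution without a straight-through estimator; this is one common motivation for marginalization over discrete latents.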
Data are processed into Kaldi I/O format, which uses scp files to map utterance ids to positions in ark files. Functions used to process .scp and .ark files can be found under `dataflow/`. We provide a data sample in `sample/` for users to run the pipeline. Users can simply plug in their own custom dataloader here.
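For users unfamiliar with the scp convention, each line pairs an utterance id with an ark path and a byte offset (`utt_id path.ark:offset`). A minimal sketch of parsing such a file (illustrative only — the actual readers live under `dataflow/`, and the names here are hypothetical):

```python
def parse_scp(text):
    """Parse Kaldi-style scp lines of the form 'utt_id path.ark:offset'
    into a dict mapping each utterance id to (ark_path, byte_offset)."""
    table = {}
    for line in text.strip().splitlines():
        utt_id, rxspec = line.split(None, 1)
        # rsplit on the last ':' so Windows-style paths are not broken.
        path, offset = rxspec.rsplit(":", 1)
        table[utt_id] = (path, int(offset))
    return table

sample = """utt001 data/feats.ark:13
utt002 data/feats.ark:4120"""
table = parse_scp(sample)
# table["utt001"] -> ("data/feats.ark", 13)
```

A custom dataloader only needs to yield the same (utterance id, feature matrix) pairs; the byte offsets let the loader seek directly into the ark file without scanning it.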
To train the model, run:

```
python3 train.py --config config/cotraining.yaml
```
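The config file collects the training hyperparameters in one place. A sketch of what such a YAML file might contain (the keys below are illustrative guesses, not the actual contents of `config/cotraining.yaml` — consult the file in the repo for the real options):

```yaml
# Hypothetical structure; see config/cotraining.yaml for the actual keys.
model:
  num_codes: 256        # size of the discrete codebook
  num_layers: 3         # LSTM layers, matching the results table
training:
  learning_rate: 1.0e-3
  batch_size: 16
data:
  scp: sample/feats.scp # scp file mapping utterance ids into ark files
```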
| Hours | Num codes | Model | dev93 (PER) | eval92 (PER) | Link |
|---|---|---|---|---|---|
| 360 | 256 | 3-layer LSTM with marginalization | 19.5 | 19.0 | link |
| 960 | 256 | 3-layer LSTM with marginalization | 18.2 | 17.8 | link |