Scripts for training colloquial Finnish wav2vec 2.0 models
| Model | Labeled Data, h | DEV WER, % | TEST WER, % |
|---|---|---|---|
| Wav2Vec 2.0 Base VP-Finnish | N/A | N/A | N/A |
| Wav2Vec 2.0 Base VP-Finnish | 100 | 29.35 | 31.90 |
| Wav2Vec 2.0 Base VP-Finnish | 1500 | 22.18 | 24.43 |
| Wav2Vec 2.0 Base LP (PT from scratch) | N/A | N/A | N/A |
| Wav2Vec 2.0 Base LP (PT from scratch) | 100 | 26.40 | 28.92 |
| Wav2Vec 2.0 Base LP (PT from scratch) | 1500 | 21.61 | 24.35 |
| Wav2Vec 2.0 Base LP (continued PT) | N/A | N/A | N/A |
| Wav2Vec 2.0 Base LP (continued PT) | 100 | 22.49 | 24.95 |
| Wav2Vec 2.0 Base LP (continued PT) | 1500 | 17.38 | 19.65 |
| Wav2Vec 2.0 Large VP-Uralic | N/A | N/A | N/A |
| Wav2Vec 2.0 Large VP-Uralic | 100 | 21.02 | 22.98 |
| Wav2Vec 2.0 Large VP-Uralic | 1500 | 19.14 | 20.49 |
| Wav2Vec 2.0 Large LP (PT from scratch) | N/A | N/A | N/A |
| Wav2Vec 2.0 Large LP (PT from scratch) | 100 | 21.66 | 23.85 |
| Wav2Vec 2.0 Large LP (PT from scratch) | 1500 | 17.54 | 19.26 |
| Wav2Vec 2.0 Large LP (continued PT) | N/A | N/A | N/A |
| Wav2Vec 2.0 Large LP (continued PT) | 100 | 22.49 | 24.95 |
| Wav2Vec 2.0 Large LP (continued PT) | 1500 | 16.24 | 18.04 |
More details on the models are available in the paper. The models are also available on the Hugging Face Hub.
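To fetch a released checkpoint locally, the `huggingface-cli download` command from the `huggingface_hub` package can be used. The sketch below uses a placeholder repository ID, since the exact model names on the Hub are not listed here:

```bash
# Sketch only: download a model from the Hugging Face Hub.
# ORG/MODEL_NAME is a placeholder for the actual repository ID on the Hub.
pip install -U huggingface_hub
huggingface-cli download ORG/MODEL_NAME --local-dir ./wav2vec2-finnish
```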
The scripts shared in this repository are adapted to the AMD hardware of the LUMI supercomputer. To pre-train a wav2vec 2.0 Base model, run

```bash
sbatch scripts/pretraining/fairseq_train_multinode_w2v2_B_512gpus.sh
```
Note: you can simulate 512 GPUs by using k GPUs and adding the following command-line parameters (before `--config-dir`):

```
distributed_training.distributed_world_size=k
+optimization.update_freq='[x]'
```

where x = 512/k.
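The note above trades GPUs for gradient accumulation steps. As a minimal sketch (not the exact invocation inside the sbatch script), simulating 512 GPUs on 64 GPUs means setting the world size to 64 and accumulating over 512/64 = 8 updates; the data path, config directory, and config name below are placeholders:

```bash
# Sketch only: simulate 512 GPUs with k = 64 GPUs (update_freq = 512 / 64 = 8).
# Paths and config names are placeholders; take the real ones from
# scripts/pretraining/fairseq_train_multinode_w2v2_B_512gpus.sh.
fairseq-hydra-train \
    task.data=/path/to/manifest/dir \
    distributed_training.distributed_world_size=64 \
    +optimization.update_freq='[8]' \
    --config-dir /path/to/config/pretraining \
    --config-name wav2vec2_base_pretraining
```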
To fine-tune a wav2vec 2.0 Base model using Fairseq, run

```bash
sbatch scripts/finetuning/fairseq_finetune_multinode_w2v2_B_128gpus_full.sh
```
Note: you can simulate 128 GPUs by using k GPUs and adding the following command-line parameters (before `--config-dir`):

```
distributed_training.distributed_world_size=k
+optimization.update_freq='[x]'
```

where x = 128/k.
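For the fine-tuning case, a similar sketch applies: simulating 128 GPUs on 16 GPUs means a world size of 16 with update_freq = 128/16 = 8. Fairseq fine-tuning configs also expect the pre-trained checkpoint via `model.w2v_path`; all paths and config names here are placeholders, not the values used in the sbatch script:

```bash
# Sketch only: simulate 128 GPUs with k = 16 GPUs (update_freq = 128 / 16 = 8).
# Paths and config names are placeholders; take the real ones from
# scripts/finetuning/fairseq_finetune_multinode_w2v2_B_128gpus_full.sh.
fairseq-hydra-train \
    task.data=/path/to/manifest/dir \
    model.w2v_path=/path/to/pretrained_checkpoint.pt \
    distributed_training.distributed_world_size=16 \
    +optimization.update_freq='[8]' \
    --config-dir /path/to/config/finetuning \
    --config-name base_finetuning
```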
To fine-tune a wav2vec 2.0 Base model using Hugging Face Transformers, run

```bash
sbatch scripts/finetuning/huggingface_finetune_multinode_w2v2_B_8gpus_full.sh
```
If you use our models or scripts, please cite our article as:

```bibtex
@inproceedings{getman24_interspeech,
  title     = {What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR},
  author    = {Yaroslav Getman and Tamas Grosz and Mikko Kurimo},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {5043--5047},
  doi       = {10.21437/Interspeech.2024-476},
}
```