Colloquial Finnish wav2vec2

Scripts for training colloquial Finnish wav2vec 2.0 models

Pre-trained and fine-tuned models

| Model | Labeled Data (h) | Dev WER (%) | Test WER (%) |
|-------|------------------|-------------|--------------|
| Wav2Vec 2.0 Base VP-Finnish | N/A | N/A | N/A |
| Wav2Vec 2.0 Base VP-Finnish | 100 | 29.35 | 31.90 |
| Wav2Vec 2.0 Base VP-Finnish | 1500 | 22.18 | 24.43 |
| Wav2Vec 2.0 Base LP (PT from scratch) | N/A | N/A | N/A |
| Wav2Vec 2.0 Base LP (PT from scratch) | 100 | 26.40 | 28.92 |
| Wav2Vec 2.0 Base LP (PT from scratch) | 1500 | 21.61 | 24.35 |
| Wav2Vec 2.0 Base LP (continued PT) | N/A | N/A | N/A |
| Wav2Vec 2.0 Base LP (continued PT) | 100 | 22.49 | 24.95 |
| Wav2Vec 2.0 Base LP (continued PT) | 1500 | 17.38 | 19.65 |
| Wav2Vec 2.0 Large VP-Uralic | N/A | N/A | N/A |
| Wav2Vec 2.0 Large VP-Uralic | 100 | 21.02 | 22.98 |
| Wav2Vec 2.0 Large VP-Uralic | 1500 | 19.14 | 20.49 |
| Wav2Vec 2.0 Large LP (PT from scratch) | N/A | N/A | N/A |
| Wav2Vec 2.0 Large LP (PT from scratch) | 100 | 21.66 | 23.85 |
| Wav2Vec 2.0 Large LP (PT from scratch) | 1500 | 17.54 | 19.26 |
| Wav2Vec 2.0 Large LP (continued PT) | N/A | N/A | N/A |
| Wav2Vec 2.0 Large LP (continued PT) | 100 | 22.49 | 24.95 |
| Wav2Vec 2.0 Large LP (continued PT) | 1500 | 16.24 | 18.04 |

More details on the models are available in the paper. The models are also available on the Hugging Face Hub.
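
For a quick check of a fine-tuned checkpoint, the sketch below loads one of the CTC models with 🤗 Transformers and transcribes a single audio file. This is a minimal illustration, not part of the repository's scripts; the model identifier is a placeholder, so substitute the actual repository name from the Hugging Face Hub.

```python
# Minimal inference sketch (not part of this repository's scripts).
# "<org>/<finnish-wav2vec2-ctc-model>" is a placeholder model ID.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "<org>/<finnish-wav2vec2-ctc-model>"  # replace with the real Hub ID
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load a mono waveform and resample to the 16 kHz rate expected by wav2vec 2.0.
waveform, sample_rate = torchaudio.load("sample.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame, then collapse repeats.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```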

Pre-training the models

The scripts shared in this repository are adapted to the AMD hardware of the LUMI supercomputer. To pre-train a wav2vec 2.0 Base model, run

sbatch scripts/pretraining/fairseq_train_multinode_w2v2_B_512gpus.sh

Note: you can simulate 512 GPUs with k GPUs by adding the command-line parameters distributed_training.distributed_world_size=k +optimization.update_freq='[x]' (before --config-dir), where x = 512/k.
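
As a worked example of the arithmetic above, the small helper below (an illustrative sketch, not part of the repository) prints the two override parameters to append for a given number of available GPUs.

```python
# Illustrative helper (not part of the repository): build the overrides described
# in the note above for simulating a larger GPU count via gradient accumulation.

def gpu_simulation_overrides(available_gpus: int, simulated_gpus: int = 512) -> str:
    """Return the extra parameters to add before --config-dir."""
    if simulated_gpus % available_gpus != 0:
        raise ValueError("simulated_gpus must be divisible by available_gpus")
    update_freq = simulated_gpus // available_gpus  # x = 512 / k for pre-training
    return (
        f"distributed_training.distributed_world_size={available_gpus} "
        f"+optimization.update_freq='[{update_freq}]'"
    )

# Example: 64 real GPUs simulate 512 GPUs with update_freq = 8.
print(gpu_simulation_overrides(64))
```

The same pattern applies to the Fairseq fine-tuning script below, with 128 in place of 512.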

Fine-tuning the models with CTC

To fine-tune a wav2vec 2.0 Base model using Fairseq, run

sbatch scripts/finetuning/fairseq_finetune_multinode_w2v2_B_128gpus_full.sh

Note: you can simulate 128 GPUs with k GPUs by adding the command-line parameters distributed_training.distributed_world_size=k +optimization.update_freq='[x]' (before --config-dir), where x = 128/k.

Fine-tuning the models with CTC using 🤗Transformers

To fine-tune a wav2vec 2.0 Base model using Hugging Face Transformers, run

sbatch scripts/finetuning/huggingface_finetune_multinode_w2v2_B_8gpus_full.sh
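
The sbatch script wraps a standard Transformers CTC fine-tuning setup. The sketch below outlines the main pieces of such a setup; the checkpoint path, dataset, and hyperparameters are illustrative placeholders, not the repository's actual configuration.

```python
# Illustrative CTC fine-tuning sketch with Hugging Face Transformers.
# Checkpoint IDs, dataset, and hyperparameters below are placeholders.
from dataclasses import dataclass

from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

checkpoint = "<path-or-hub-id-of-pretrained-model>"  # placeholder
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(
    checkpoint,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)
model.freeze_feature_encoder()  # keep the convolutional feature extractor fixed


@dataclass
class CTCCollator:
    """Pads audio inputs and label sequences independently for CTC training."""
    processor: Wav2Vec2Processor

    def __call__(self, features):
        input_features = [{"input_values": f["input_values"]} for f in features]
        label_features = [{"input_ids": f["labels"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        # Replace padding with -100 so it is ignored by the CTC loss.
        batch["labels"] = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )
        return batch


train_dataset = ...  # a datasets.Dataset with "input_values" and "labels" columns

args = TrainingArguments(
    output_dir="w2v2-base-ctc",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    warmup_steps=500,
    num_train_epochs=10,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=CTCCollator(processor),
    train_dataset=train_dataset,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```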

Citation

If you use our models or scripts, please cite our article as:

@inproceedings{getman24_interspeech,
  title     = {What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR},
  author    = {Yaroslav Getman and Tamas Grosz and Mikko Kurimo},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {5043--5047},
  doi       = {10.21437/Interspeech.2024-476},
}
