This repository is an attempt to examine all aspects of the wav2vec2 model, from preprocessing and fine-tuning to language-model construction, testing, and serving.
-
Datasets:
- Sharif-Wav2vec2-v1 was fine-tuned on Mozilla Common Voice.
- Sharif-Wav2vec2-v2 was fine-tuned on BigFarsdat, DeepMine, FarsSpon, and Mozilla Common Voice (AGP Dataset).
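For context, the Common Voice portion could be loaded with the 🤗 `datasets` library roughly as sketched below; the dataset id and version (`mozilla-foundation/common_voice_11_0`) are an assumption, not necessarily the exact release used here.

```python
# Minimal sketch (assumed dataset id/version): load the Farsi split of Mozilla Common Voice.
from datasets import load_dataset, Audio

common_voice = load_dataset(
    "mozilla-foundation/common_voice_11_0",  # assumption; pick the release you actually use
    "fa",                                    # Farsi configuration
    split="train",
)

# wav2vec2 expects 16 kHz audio, so resample on the fly.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))
print(common_voice[0]["sentence"])
```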
-
Corpus: Most of our textual data was taken from the naab corpus, a huge collection of Farsi text.
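As an illustration of how this corpus could feed language-model training later (the MakingLM step), the sketch below streams naab and dumps raw text to a file; the Hub id `SLPL/naab` and the `text` column name are assumptions.

```python
# Sketch (assumed Hub id and column name): stream naab and write plain text
# that an n-gram toolkit such as KenLM can consume.
from datasets import load_dataset

naab = load_dataset("SLPL/naab", split="train", streaming=True)

with open("lm_corpus.txt", "w", encoding="utf-8") as f:
    for i, example in enumerate(naab):
        f.write(example["text"].strip() + "\n")
        if i >= 1_000_000:  # the corpus is huge; cap the dump for this example
            break
```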
-
System Config: The model was fine-tuned on an NVIDIA GeForce RTX 3060 (12 GB).
Order of use (a minimal fine-tuning sketch follows this list):
- Preprocessing
- Fine-tuning
- MakingLM
- Test Model
- client
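As a rough guide, the fine-tuning step could be wired up with 🤗 `transformers` along these lines; the vocabulary file, special tokens, and hyperparameters are placeholders, not the exact settings used in this repo.

```python
# Sketch of the fine-tuning setup: a CTC head on top of the XLSR-53 checkpoint.
# vocab.json and all hyperparameters are placeholders.
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",                 # character vocabulary built during preprocessing
    unk_token="[UNK]",
    pad_token="[PAD]",
    word_delimiter_token="|",
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()    # keep the convolutional feature encoder frozen

# A transformers.Trainer with a CTC padding collator then runs the actual training.
```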
-
🤗 Several models were fine-tuned during this work, which is why results differ between the code examples; insert the path to your own model where needed (see the inference sketch after the links below). To make a fair comparison between existing wav2vec2 models, we prepared a standard test set of varied, representative data, which will soon be released alongside our paper.
You can find the fine-tuned models and related resources at these addresses:
- Base Model: https://huggingface.co/facebook/wav2vec2-large-xlsr-53
- Base Paper: https://arxiv.org/abs/2006.13979
- Language Model: KenLM (https://github.com/kpu/kenlm, https://kheafield.com/code/kenlm/)
- Other wav2vec2 models info: https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec#wav2vec-20
- Our Standard Farsi Test Set: coming soon :hourglass_flowing_sand:
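To try a fine-tuned checkpoint, inference can be as simple as the sketch below; the model id and audio file name are placeholders for your own paths.

```python
# Sketch: transcribe a Farsi audio file with a fine-tuned checkpoint via the ASR pipeline.
# The model id and "sample.wav" are placeholders.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="path/or/hub-id-of-your-fine-tuned-model",
)
print(asr("sample.wav")["text"])
```

If a checkpoint ships with a KenLM decoder (built in the MakingLM step), `transformers.Wav2Vec2ProcessorWithLM` can apply LM-boosted beam-search decoding instead of plain greedy CTC.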
Thanks to Sadra Sabouri for his collaboration :handshake::handshake:
Also, I would like to thank Mehrdad Farahani for his normalizer and dictionary 🤝
⭐ Give us a star if you found this repo useful.
🙋‍♀️ Open an issue if you have any comments.
🥰 Feel free to open a pull request adding your feature. We'll be more than happy to accept it.