In this repository, the WavLM model is fine-tuned for speaker verification on both high-quality and poor-quality data, and the PyCM library is used for evaluation.
-
Datasets: In this work, 30 speakers have been selected from the Farsdat dataset. 10 speakers are held out for testing as unknown speakers, and the remaining 20 are used as known (target/non-target) speakers. Each speaker has 10 audio files, and the first audio file is used as the enrollment file. Audio files should be 6 seconds long (here we use ffmpeg to trim them).
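The data preparation described above can be sketched as follows. This is a minimal illustration, not the repository's own script: it assumes the audio has already been exported as WAV files in one directory, that "first audio file" means first in sorted filename order, and that the 10 unknown speakers are picked with a seeded shuffle (the README does not say how they were chosen). The helper names `split_speakers`, `enrollment_split`, `trim_command`, and `trim_all` are hypothetical.

```python
import random
import subprocess
from pathlib import Path

CLIP_SECONDS = 6  # clip length stated in this README


def split_speakers(speakers, n_unknown=10, seed=0):
    """Split speakers into (known, unknown).

    Assumption: the unknown speakers are chosen by a seeded shuffle,
    since the README does not specify the selection rule.
    """
    shuffled = sorted(speakers)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_unknown:], shuffled[:n_unknown]


def enrollment_split(files):
    """Return (enrollment_file, test_files) using sorted filename order."""
    ordered = sorted(files)
    return ordered[0], ordered[1:]


def trim_command(src, dst, seconds=CLIP_SECONDS):
    """Build an ffmpeg command that keeps only the first `seconds` of `src`.

    `-t` placed before the output path limits the output duration;
    `-y` overwrites an existing output file.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-t", str(seconds), str(dst)]


def trim_all(src_dir, dst_dir, pattern="*.wav"):
    """Trim every file matching `pattern` in `src_dir` into `dst_dir` (hypothetical layout)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(src_dir).glob(pattern)):
        subprocess.run(trim_command(wav, dst_dir / wav.name), check=True)
```

With 30 speaker IDs, `split_speakers` yields 20 known and 10 unknown speakers; with 10 files per speaker, `enrollment_split` yields one enrollment file and 9 test files.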
-
Evaluation: For evaluation, we use the PyCM library, a reliable and comprehensive library that supports many metrics. PyCM is a multi-class confusion matrix library written in Python that accepts either input data vectors or a direct matrix, and it is a suitable tool for post-classification model evaluation, supporting most class-level and overall statistics. PyCM is the Swiss Army knife of confusion matrices, aimed mainly at data scientists who need a broad array of metrics for predictive models and accurate evaluation of a wide variety of classifiers.
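As a minimal sketch of how open-set predictions could be fed to PyCM: the README does not describe how predicted labels are formed, so this example assumes that each test utterance has a similarity score against every enrolled (known) speaker, that the best-scoring speaker is predicted if its score clears a threshold, and that the utterance is labeled `"unknown"` otherwise. The helper names and the threshold are hypothetical; only `pycm.ConfusionMatrix(actual_vector=..., predict_vector=...)` is PyCM's real API.

```python
UNKNOWN = "unknown"  # hypothetical label for rejected / unknown speakers


def open_set_label(scores, threshold):
    """Map one utterance's scores {enrolled_speaker: score} to a predicted label.

    Returns the best-scoring speaker if its score is at least `threshold`,
    otherwise UNKNOWN.
    """
    best_speaker = max(scores, key=scores.get)
    return best_speaker if scores[best_speaker] >= threshold else UNKNOWN


def predict_labels(all_scores, threshold):
    """Apply open_set_label to every test utterance."""
    return [open_set_label(scores, threshold) for scores in all_scores]


def evaluate(actual, predicted):
    """Build a PyCM confusion matrix and print the matrix with its statistics."""
    # Imported lazily so the labeling helpers above work without PyCM installed.
    from pycm import ConfusionMatrix

    cm = ConfusionMatrix(actual_vector=actual, predict_vector=predicted)
    print(cm)  # confusion matrix plus overall and per-class statistics
    return cm
```

For example, `evaluate(["spk01", "unknown"], predict_labels([{"spk01": 0.82, "spk02": 0.31}, {"spk01": 0.20, "spk02": 0.25}], threshold=0.5))` builds a matrix over string labels; individual statistics such as `cm.Overall_ACC` or the per-class `cm.F1` dictionary can then be read from the returned object.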
-
System Config: The model was fine-tuned on an NVIDIA GeForce RTX 3060 (12 GB).
-
Link to the model: https://huggingface.co/SaraSadeghi/Sharif-WavLM
For high-quality (microphone) data, use WavLM_base_AGP; for poor-quality (telephony) data, use WavLM_base_telephony.
Loading... :hourglass_flowing_sand:
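The README does not say how the outputs of either checkpoint are scored. A common approach in speaker verification, assumed here and not confirmed by this repository, is to extract a speaker embedding from the enrollment file and from each test file, then accept the identity claim when their cosine similarity is at or above a threshold. The sketch below shows only that scoring step on plain Python lists; how embeddings are obtained from WavLM_base_AGP or WavLM_base_telephony is left out, and the default threshold of 0.5 is a hypothetical value that would need tuning.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enroll_emb, test_emb, threshold=0.5):
    """Return (accepted, score): accept if similarity >= threshold (hypothetical default)."""
    score = cosine_similarity(enroll_emb, test_emb)
    return score >= threshold, score
```

Cosine scoring ignores embedding magnitude, so loudness or length differences that only scale the embedding do not change the decision; the threshold trades false acceptances against false rejections.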
- Base Model: https://huggingface.co/microsoft/wavlm-base-plus-sd
- Base Paper: https://arxiv.org/abs/2110.13900
- PyCM
Thanks to Sadra Sabouri for his collaboration :handshake::handshake:
Thanks also to PyCM 🔥🔥
⭐ Give us a star if you found this repo useful.
🙋‍♀️ Open an issue if you have any comments.
🥰 Feel free to open a pull request adding your feature. We'll be more than happy to accept it.