This repository contains a Persian speech dataset along with the corresponding transcripts.
At this link, I have shared a 3-hour Persian dataset for the automatic speech recognition (ASR) task.
Each audio file is labeled with a full-sentence transcript, and each file is about 10 seconds long.
This dataset is not copied from any other source; it is my personal project, and I publish it freely. You are welcome to use it in your projects.
If you would like an 86-hour dataset like this one, you can contact me at hubare.ra[at]gmail.com (not free).
myaudio_tiny is a small dataset with a duration of 3 hours.
myaudio_full is a large dataset with a duration of 30 hours.
persian_v2 is a large dataset with a duration of 56 hours.
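Since each clip is about 10 seconds long, the durations above translate roughly into clip counts (for example, 3 hours × 3600 s / 10 s ≈ 1080 clips). The sketch below shows that estimate and also a way to check the actual total duration of a downloaded folder. It is only a sketch under assumptions: it assumes the audio files are uncompressed PCM WAV files (the README does not state the audio format), and the function names `wav_duration_seconds`, `total_hours` and `estimated_clip_count` are hypothetical helpers, not part of this repository.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds (assumes WAV format)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_hours(folder, pattern="*.wav"):
    """Sum the durations of all matching WAV files under `folder`, recursively."""
    seconds = sum(wav_duration_seconds(p) for p in Path(folder).rglob(pattern))
    return seconds / 3600.0


def estimated_clip_count(hours, clip_seconds=10.0):
    """Rough number of clips in a dataset of `hours`, assuming ~10 s per clip."""
    return round(hours * 3600 / clip_seconds)


if __name__ == "__main__":
    # Durations as stated in this README; clip counts are estimates only.
    for name, hours in [("myaudio_tiny", 3), ("myaudio_full", 30), ("persian_v2", 56)]:
        print(f"{name}: ~{estimated_clip_count(hours)} clips of ~10 s")
```

Because clip lengths are only approximately 10 seconds, the estimated counts are rough; `total_hours` gives the measured figure for whatever WAV files are actually present in the folder.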
Other sources:
Mozilla dataset:
Mozilla has started building a large Persian dataset. In version 7, it has published 293 hours of transcribed Persian audio for free at this link. The clips in this collection are usually short.
persianspeechcorpus:
You can also use this site. This ~2.5-hour single-speaker speech corpus was developed using the same methodologies as the PhD work carried out by Nawar Halabi at the University of Southampton.
I try to publish free Persian datasets on GitHub. Your financial support encourages me to continue.
Donation link: https://www.patreon.com/persiandataset