This is a public domain speech dataset consisting of 24018 short audio clips of a single speaker reading sentences in Polish. A transcription is provided for each clip. The clips total more than 22 hours of audio.
The texts are in the public domain. The audio was recorded in 2021-22 as part of my master's thesis and is also in the public domain.
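As a rough sanity check, the figures above imply an average clip length of a little over 3 seconds. The sketch below only does that arithmetic; the constant and function names are illustrative, and because the total is stated as "more than 22 hours", the result is a lower bound rather than a measured value.

```python
# Figures taken from the dataset description above.
NUM_CLIPS = 24018
MIN_TOTAL_HOURS = 22  # stated as "more than 22 hours", so this is a lower bound


def min_average_clip_seconds(num_clips: int, total_hours: float) -> float:
    """Return the average clip length in seconds implied by the given totals."""
    return total_hours * 3600 / num_clips


avg = min_average_clip_seconds(NUM_CLIPS, MIN_TOTAL_HOURS)
print(f"Average clip length is at least about {avg:.1f} s")
```

Short clips of this size are typical for speech synthesis corpora, but the actual per-clip durations should be read from the audio files themselves.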
The dataset is available at:
If you use this dataset, please cite:
@mastersthesis{mcspeech,
title={Analiza porównawcza korpusów nagrań mowy dla celów syntezy mowy w języku polskim},
author={Czyżnikiewicz, Mateusz},
year={2022},
month={December},
school={Warsaw University of Technology},
type={Master's thesis},
doi={10.13140/RG.2.2.26293.24800},
note={Available at \url{http://dx.doi.org/10.13140/RG.2.2.26293.24800}},
}
Also, if you find this resource helpful, kindly consider leaving a ⭐.