Audio/Speech Datasets

A list of audio and speech datasets covering speech recognition, speech synthesis, noise, audio tagging/sound event detection, speaker diarization, speaker recognition, (inverse) text normalization, speech translation, multilingual corpora, and more (continuously updated).

Overview

Task
- ASR
- TTS
- Noise
- Audio/Sound
- SD
- SR
- TN/ITN
- ST
Language
- chinese
- english
- other

Task

Speech Recognition

chinese

Name	Duration(hours)	Links	Comments
THCHS-30	30	[SLR18]	train: 30 speakers, 10893 utterances; test: 10 speakers, 2496 utterances
Aishell	179	[SLR33]	400 speakers
Aishell2	1000	[Website]	if available, 1991 speakers
Free ST Chinese Mandarin (ST-CMDS)	110	[SLR38]	855 speakers, 102600 utterances
Primewords Chinese Corpus Set 1	99	[SLR47]	296 native Chinese speakers
aidatatang_200zh	200	[SLR62]	600 speakers
aidatatang_1505zh	1505	[Github]	if available
MAGICDATA Mandarin Read	755	[SLR68]	1080 speakers
MAGICDATA Mandarin Conversational (RAMC)	180	[SLR123]	663 speakers
AliMeeting (M2MeT)	118.75 (train/dev/test 104.75/4/10)	[SLR119]	ASR, SD
WenetSpeech	10000+	[SLR121] [Github] [Website]
TAL-ASR	100	[Website]	80+ speakers
TAL-CSASR	587	[Website]	code-switching, 200+ speakers
didispeech			if available
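
Many entries in these tables are linked only by an OpenSLR tag such as [SLR18]. The sketch below shows one way to turn such tags into landing-page URLs. It assumes that a tag [SLRnn] refers to OpenSLR resource number nn and that the page lives at https://www.openslr.org/nn/; the helper name `slr_links` is hypothetical and not part of this repository.

```python
import re
from typing import List

# Assumption: "[SLRnn]" refers to OpenSLR resource number nn, whose
# landing page is https://www.openslr.org/nn/.
OPENSLR_PAGE = "https://www.openslr.org/{number}/"


def slr_links(cell: str) -> List[str]:
    """Return OpenSLR landing-page URLs for every [SLRnn] tag in a Links cell.

    Tags that are not OpenSLR tags (e.g. [Github], [Website]) are ignored.
    """
    numbers = re.findall(r"\[SLR(\d+)\]", cell)
    return [OPENSLR_PAGE.format(number=n) for n in numbers]


if __name__ == "__main__":
    # Links cells copied from the table above.
    rows = [
        ("THCHS-30", "[SLR18]"),
        ("WenetSpeech", "[SLR121] [Github] [Website]"),
        ("Aishell2", "[Website]"),
    ]
    for name, cell in rows:
        print(name, slr_links(cell))
```

Entries marked [Website] or [Github] carry no resource number, so the helper returns an empty list for them and the links have to be looked up by hand.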

english

Name	Duration(hours)	Links	Comments
LibriSpeech	1000	[SLR12] [LM]
GigaSpeech	33,000+ (unsupervised), 10,000 (supervised)	[Github]
Multilingual LibriSpeech (MLS)		[SLR94]	Multilingual
libri-light	60,000 unlabelled speech	[Github]	pretraining, unsupervised, semi-supervised
libriheavy	50,000	[Github]	casing, punctuation, context
Spgispeech
People's Speech

Speech Synthesis

chinese

Name	Duration(hours)	Links	Comments
AISHELL-3	85	[Website]	44.1 kHz, 218 native Chinese speakers, 88035 utterances
LibriTTS			English

Noise

Name	Duration(hours)	Links	Comments
MUSAN		[SLR17]
Aachen Impulse Response database (AIR)		[SLR20]
Simulated Room Impulse Response Database		[SLR26]
Room Impulse Response and Noise Database		[SLR28]

Audio Tagging/Sound Event Detection

Speaker Diarization

Name	Duration(hours)	Links	Comments
AliMeeting (M2MeT)	118.75 (train/dev/test 104.75/4/10)	[SLR119]	ASR, SD

