A list of audio/speech datasets covering speech recognition, speech synthesis, noise, audio tagging/sound event detection, speaker diarization, speaker recognition, (inverse) text normalization, speech translation, multilingual speech, etc. (continuously updated)
Table of contents generated with markdown-toc
- Task
  - ASR
  - TTS
  - Noise
  - Audio/Sound
  - SD
  - SR
  - TN/ITN
  - ST
- Language
  - chinese
  - english
  - other
Name | Duration(hours) | Links | Comments |
---|---|---|---|
THCHS-30 | 30 | [SLR18] | train: 30 speakers, 10893 utterances; test: 10 speakers, 2496 utterances |
Aishell | 179 | [SLR33] | 400 speakers |
Aishell2 | 1000 | [Website] | if available, 1991 speakers |
Free ST Chinese Mandarin (ST-CMDS) | 110 | [SLR38] | 855 speakers, 102600 utterances |
Primewords Chinese Corpus Set 1 | 99 | [SLR47] | 296 native Chinese speakers |
aidatatang_200zh | 200 | [SLR62] | 600 speakers |
aidatatang_1505zh | 1505 | [Github] | if available |
MAGICDATA Mandarin Read | 755 | [SLR68] | 1080 speakers |
MAGICDATA Mandarin Conversational (RAMC) | 180 | [SLR123] | 663 speakers |
AliMeeting (M2MeT) | 118.75 (train/dev/test 104.75/4/10) | [SLR119] | ASR, SD |
WenetSpeech | 10000+ | [SLR121] [Github] [Website] | |
TAL-ASR | 100 | [Website] | 80+ speakers |
TAL-CSASR | 587 | [Website] | code-switching, 200+ speakers |
didispeech | | | if available |
Name | Duration(hours) | Links | Comments |
---|---|---|---|
LibriSpeech | 1000 | [SLR12] [LM] | |
GigaSpeech | 33,000+ for unsupervised, 10,000 for supervised | [Github] | |
Multilingual LibriSpeech (MLS) | | [SLR94] | Multilingual |
libri-light | 60,000 unlabelled speech | [Github] | pretraining, unsupervised, semi-supervised |
libriheavy | 50,000 | [Github] | casing, punctuation, context |
SPGISpeech | | | |
People's Speech | | | |
Name | Duration(hours) | Links | Comments |
---|---|---|---|
AISHELL-3 | 85 | [Website] | 44.1 kHz, 218 native Chinese speakers, 88035 utterances |
LibriTTS | |||
Name | Duration(hours) | Links | Comments |
---|---|---|---|
MUSAN | | [SLR17] | |
Aachen Impulse Response database (AIR) | | [SLR20] | |
Simulated Room Impulse Response Database | | [SLR26] | |
Room Impulse Response and Noise Database | | [SLR28] | |
Name | Duration(hours) | Links | Comments |
---|---|---|---|
AliMeeting (M2MeT) | 118.75 (train/dev/test 104.75/4/10) | [SLR119] | ASR, SD |
GigaST
GigaS2S