
The official repository of the SpeechCraft dataset, a large-scale, expressive, bilingual speech dataset with natural language descriptions.


thuhcsi/SpeechCraft


SpeechCraft

This is the official repository of the ACM Multimedia 2024 paper "SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description".

For details of the pipeline and the dataset, please refer to our paper and demo page.

Download Speech Corpus

Language  Speech Corpus  Duration  #Clips
ZH        Zhvoice        799.68 h  1,020,427
ZH        AISHELL-3       63.70 h     63,011
EN        GigaSpeech-M   739.91 h    670,070
EN        LibriTTS-R     548.88 h    352,265
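The corpus table can be summarized per language with a few lines of arithmetic. The Python sketch here only re-adds the figures listed in the table to give per-language totals and a rough average clip length; the names CORPORA, totals_by_language and mean_clip_seconds are illustrative and are not part of this repository, and the derived numbers are not official dataset statistics.

```python
# Per-corpus figures copied from the "Download Speech Corpus" table.
# (language, corpus, hours, clips); names here are illustrative only.
CORPORA = [
    ("ZH", "Zhvoice", 799.68, 1_020_427),
    ("ZH", "AISHELL-3", 63.70, 63_011),
    ("EN", "GigaSpeech-M", 739.91, 670_070),
    ("EN", "LibriTTS-R", 548.88, 352_265),
]


def totals_by_language(rows):
    """Sum hours and clip counts for each language code."""
    totals = {}
    for lang, _corpus, hours, clips in rows:
        prev_hours, prev_clips = totals.get(lang, (0.0, 0))
        totals[lang] = (prev_hours + hours, prev_clips + clips)
    return totals


def mean_clip_seconds(hours, clips):
    """Average clip length in seconds, assuming durations are in hours."""
    return hours * 3600 / clips


if __name__ == "__main__":
    for lang, (hours, clips) in totals_by_language(CORPORA).items():
        print(f"{lang}: {hours:.2f} h, {clips:,} clips, "
              f"~{mean_clip_seconds(hours, clips):.1f} s per clip")
```

Because the averages are computed from aggregate hours and clip counts, they say nothing about the spread of clip lengths within each corpus.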

Download Speech Annotation

Language  Description  Instruction
ZH        download     download
EN        download     download

Request Access to Emphasis Speech Dataset

Since we do not own the copyright of the original audio files, we can provide our regenerated version, under certain conditions and terms, only to researchers and educators who wish to use the audio files for non-commercial research and/or educational purposes. To apply for access to AISHELL-3 and LibriTTS-R with fine-grained keyword emphasis, please fill out the EULA form (Emphasis-SpeechCraft-EULA.pdf) and send the scanned form to jinzeyu23@mails.tsinghua.edu.cn. Once your request is approved, you will receive a download link.

Before applying, please refer to the emphasis examples provided here. We are actively working on improving methods for constructing large-scale, fine-grained data that align with human perception.

Language  Speech Corpus      Duration  #Clips
ZH        AISHELL-3-stress    50.59 h  63,243
EN        LibriTTS-R-stress  148.78 h  74,496

Pipeline

To be released.

Citation

Please cite our paper if you find this work useful:

@inproceedings{jin2024speechcraft,
title={SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description},
author={Zeyu Jin and Jia Jia and Qixin Wang and Kehan Li and Shuoyi Zhou and Songtao Zhou and Xiaoyu Qin and Zhiyong Wu},
booktitle={ACM Multimedia 2024},
year={2024},
url={https://openreview.net/forum?id=rjAY1DGUWC}
}
