Help with using this tool for creating TTS training data #69

Open
weedwind opened this issue Jul 11, 2024 · 3 comments

Comments

@weedwind

Hi,

Thank you very much for building this tool. I want to use it to segment and align libri-light for TTS training. I am new to the tool; can anyone help me with the following questions?

  1. If I want to segment the books into chunks of about 10 seconds (rather than 30), which hyperparameters should I change?
  2. The output contains two sets of text: lowercase with punctuation, and uppercase without punctuation. Which one should I use as the ground truth for training TTS?

Thank you so much for any help.

@pkufool
Collaborator

pkufool commented Jul 16, 2024

If I want to segment the books into chunks of about 10 seconds (rather than 30), which hyperparameters should I change?

"min_duration": 2,
"max_duration": 30,
"expected_duration": (5, 20),
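
As a rough sketch of how these three values might be adjusted for chunks of about 10 seconds: the key names below are the ones shown above, but the numbers are illustrative assumptions, not tested recommendations. The helper `check_duration_params` is hypothetical, and it assumes `expected_duration` is a preferred range that should sit inside the hard `min_duration`/`max_duration` limits. Where the dictionary is passed to the tool is not stated in this thread.

```python
# Illustrative values only: the key names come from the reply above,
# the numbers are assumptions chosen to favor ~10 s segments.
segment_params = {
    "min_duration": 2,             # hard lower bound, in seconds
    "max_duration": 15,            # hard upper bound, lowered from 30
    "expected_duration": (5, 12),  # preferred range, lowered from (5, 20)
}


def check_duration_params(params):
    """Hypothetical sanity check: the preferred range must lie inside the hard limits."""
    low, high = params["expected_duration"]
    if not (params["min_duration"] <= low <= high <= params["max_duration"]):
        raise ValueError(f"inconsistent duration settings: {params}")
    return True


check_duration_params(segment_params)
```

Tightening both `max_duration` and the upper end of `expected_duration` is one plausible way to pull the average segment length down; the exact values would need to be tuned against the resulting segment length distribution.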

The output contains two sets of text: lowercase with punctuation, and uppercase without punctuation. Which one should I use as the ground truth for training TTS?

It's up to you, but I would suggest using the text with punctuation.

@weedwind
Author

@pkufool Thank you so much. From a quick look at the documentation, it seems that the text with punctuation is the reference and the uppercase text is the ASR output. If I want to train TTS on the uppercase text, is it as accurate as the reference?

@pkufool
Collaborator

pkufool commented Jul 24, 2024

No. If you don't want the punctuation, you can remove it from the punctuated text and convert that text to uppercase. It is not a good idea to use the ASR transcriptions to train TTS.
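
The conversion described above (strip the punctuation from the reference text, then uppercase it) could be sketched as follows. This is a minimal example, not part of the tool: `normalize_reference` is a hypothetical helper, and keeping apostrophes inside words (so "don't" becomes "DON'T") is an assumption about the desired output format.

```python
import re


def normalize_reference(text: str) -> str:
    """Strip punctuation from the punctuated reference text and uppercase it."""
    # Map curly apostrophes to straight ones so contractions survive the filter.
    text = text.replace("\u2019", "'")
    # Replace anything that is not a word character, whitespace, or apostrophe.
    # Note: hyphenated words are split ("well-known" becomes "WELL KNOWN").
    cleaned = re.sub(r"[^\w\s']", " ", text)
    # \w also matches underscores, so drop those separately.
    cleaned = cleaned.replace("_", " ")
    # Collapse repeated whitespace and uppercase the result.
    return " ".join(cleaned.upper().split())


print(normalize_reference("Hello, world! It's fine."))
```

This keeps the human-written reference as the source of the training text, so no ASR errors are carried into the TTS data.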
