Trained model 600 epochs complete flop #310
-
I think I started too big. I'm going to go over the dataset manually, pick the best 3-8 second samples, feed it about 20 minutes of audio, and see what it does.
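The manual curation described above (keep only 3-8 second clips, cap the total at ~20 minutes) can be scripted. A minimal sketch, assuming 16 kHz mono WAV clips; the function names and thresholds are illustrative, not part of any Coqui tooling:

```python
# Sketch: filter WAV clips by duration and cap the total amount of audio.
# Assumptions: clips are plain WAV files; 3-8 s window and 20 min budget
# mirror the values mentioned in the comment above.
import wave


def clip_duration(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def select_clips(paths, min_s=3.0, max_s=8.0, budget_s=20 * 60):
    """Keep clips between min_s and max_s long, stopping once the
    total collected audio would exceed budget_s seconds."""
    kept, total = [], 0.0
    for p in paths:
        d = clip_duration(p)
        if min_s <= d <= max_s and total + d <= budget_s:
            kept.append(p)
            total += d
    return kept, total
```

Pass it a list of clip paths (e.g. from `glob.glob("clips/*.wav")`) and it returns the selected subset plus the total seconds collected.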
-
So after feeding it 20 minutes of selected audio (about 300 segments), I got better results, but still not a predictable intonation or a clear voice.
-
Hi @Eyalm321 You are best reading up on the Coqui forums about training; however, 600 epochs is a lot of training. That's approaching the amount of training you would use to teach the model a new language. If you are just training a voice in a language the model already supports, e.g. English, you won't need a huge number of epochs (the defaults should work).

You can over-train a model. It's best to start low, and you can always train a model further if need be. There is no perfect recipe for training a model, however, and as Coqui note: https://docs.coqui.ai/en/latest/faq.html#how-do-i-know-when-to-stop-training

The single most important thing is the quality of the training data and dataset. Higher-quality audio, and ensuring the eval and train CSV files actually match the spoken audio, gives the best results.
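The advice above about the eval/train CSV files matching the audio can be sanity-checked before training. A minimal sketch, assuming a pipe-separated metadata layout of `audio_file|text|speaker_name`; your formatter's column order may differ, so adjust accordingly:

```python
# Sketch: flag metadata rows whose audio file is missing or whose
# transcript is empty. The audio_file|text|speaker_name layout is an
# assumption; check your dataset formatter's expected columns.
import csv
import os


def check_metadata(csv_path, audio_root="."):
    """Return a list of (line_number, problem) tuples for bad rows."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f, delimiter="|"), start=1):
            if len(row) < 2:
                problems.append((i, "too few columns"))
                continue
            audio, text = row[0], row[1]
            if not os.path.isfile(os.path.join(audio_root, audio)):
                problems.append((i, f"missing audio: {audio}"))
            if not text.strip():
                problems.append((i, "empty transcript"))
    return problems
```

Running it on both the train and eval CSVs before a long training run catches dead file references and blank transcripts early, which is cheaper than discovering them 600 epochs in.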
-
Hey guys,
I need your help with some tips for training.
I fed the model around an hour and a half of single-speaker audio, about 2,200 segments averaging 1-5 seconds each.
Did I over-segment it with pyannote?
Should I feed it more data? Less?
How many epochs does it typically take to get a good result? What's your process?
After 600 epochs it seems to be talking complete gibberish.