
Making a model for the Russian language #707

Closed
ks-sav opened this issue Mar 17, 2021 · 10 comments
ks-sav commented Mar 17, 2021

I am proceeding to create a model for the Russian language.

  1. I verified that the code works on my platform, using LibriSpeech train-clean-100.
  2. I prepared 380 hours of Russian speech (1270 speakers) in this format: #437 (comment)

Now I need your advice

  • Do I need to add English to my dataset?
  • Can I fine-tune an existing synthesizer model, or is it better to train my own from scratch?
  • Do I have any chance of doing this on a CPU?

ghost commented Mar 18, 2021

Advice

  • I do not recommend adding English, but it is something you can try if you need a model that works for both languages.
  • Train a new synthesizer model. Don't forget to edit `synthesizer/utils/symbols.py` to include all the letters of the Russian alphabet. Here is a good start for Russian: [`symbols.py`](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/blob/master/synthesizer/utils/symbols.py)
  • Realistically, CPU is too slow. The model needs to learn attention before inference will work. This usually requires 10,000 to 20,000 steps. The training speed on CPU is anywhere from 1 to 4 steps per minute. So you will be waiting 1 to 2 weeks until you know whether your settings are correct. Even after attention is learned, you will be waiting another month or longer to train the 100,000 to 200,000 steps that it takes for the model to become usable.

If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model. Note: It uses tensorflow and you will need to apply the synthesizer changes in #366 to make it work on CPU. https://github.com/vlomme/Multi-Tacotron-Voice-Cloning


ks-sav commented Mar 24, 2021

What could be the cause of

```
RuntimeWarning: invalid value encountered in true_divide
  wav = wav / np.abs(wav).max() * params.rescaling_max
```

when running the script synthesizer_preprocess_audio.py?


ks-sav commented Mar 24, 2021

Also, I want to share some fixes for those who hit similar problems:

  • `Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError` when running synthesizer_preprocess_audio.py is solved by using `--n_processes 1`
  • Instead of the webrtcvad library, I use webrtcvad-wheels on Windows 10


ghost commented Mar 24, 2021

> RuntimeWarning: invalid value encountered in true_divide

Check for audio files that are completely silent.
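The division produces NaNs when the waveform's peak amplitude is zero. A minimal guard for the rescaling step, assuming the formula from the warning above (the 0.9 default for `rescaling_max` is an assumption, and `rescale_wav` is a hypothetical helper, not a function from the repo):

```python
import numpy as np

def rescale_wav(wav, rescaling_max=0.9):
    """Rescale a waveform to a target peak amplitude. Returns None for
    completely silent audio, which would otherwise divide by zero and
    trigger 'invalid value encountered in true_divide'."""
    peak = np.abs(wav).max()
    if peak == 0:
        return None  # silent file: skip it instead of producing NaNs
    return wav / peak * rescaling_max
```

Skipping (or deleting) the silent files before preprocessing avoids the warning entirely.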


RAVANv2 commented Mar 31, 2021

@blue-fish I would like to retrain all the models. Is there any problem if I use a Google Colab GPU for training? Is it sufficient?


ghost commented Apr 1, 2021

@RAVANv2 We do not provide support for Colab. It can be done, but you'll have to figure it out on your own.

@ghost ghost closed this as completed Apr 1, 2021
@fancat-programer

> If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model: https://github.com/vlomme/Multi-Tacotron-Voice-Cloning

Unfortunately, that model's quality is quite poor.

@ghost ghost mentioned this issue Oct 8, 2021
@neonsecret

See my fork: https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang
It is adjusted to train a bilingual ru+en model and is easily extensible with new languages.


vorob1 commented May 26, 2022

> see my fork https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang it is adjusted to train the bilingual ru+en model and is easily adjustable for adding new languages

Sir, that's exactly what I'm looking for. I want to correct some wrong voiceover lines in an old game, but since I can't get in touch with the actor, I want to simulate his voice.

The original tool works, but can't do a Russian voice: https://youtu.be/lDbpoaaBJSo
Your fork gives me errors:

```
PS C:\Users\babud\Downloads\Real-Time-Voice-Cloning-Multilang-master> python demo_toolbox.py
Traceback (most recent call last):
  File "demo_toolbox.py", line 7, in <module>
    from utils.default_models import ensure_default_models
ModuleNotFoundError: No module named 'utils.default_models'
```

My knowledge of all this Python stuff is low, so I just copy and paste commands and sometimes try to understand the errors, but this looks unsolvable at my level.

I want a simple thing: launch the GUI, point the program to WAV files with the actor's voice, enter text, and get voiceover files :)

I also tried python demo_cli.py and got lots of output, but in the end it was this:

```
FileNotFoundError: [Errno 2] No such file or directory: 'saved_models\\rusmodeltweaked\\synthesizer.pt'
```


vorob1 commented May 27, 2022

Okay, I managed to start the toolbox by copying some files from the original build. Now when I add a WAV and try Synthesize + Vocode, I get this error:

```
size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([66, 512]) from checkpoint, the shape in current model is torch.Size([194, 512]).
```
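That error means the checkpoint was trained with a different symbol table: the text embedding has one row per entry in `symbols.py` (66 here versus 194 in the fork), so checkpoints can't be mixed between the two codebases. A small sketch of the check, using a hypothetical helper and the shapes taken from the error message:

```python
def embedding_mismatch(ckpt_shape, model_shape):
    """Compare the symbol-embedding shapes of a checkpoint and the current
    model. Returns an explanatory message if they disagree, else None.
    (Hypothetical helper for illustration; shape[0] is the symbol count.)"""
    ckpt_symbols, model_symbols = ckpt_shape[0], model_shape[0]
    if ckpt_symbols != model_symbols:
        return (f"checkpoint trained with {ckpt_symbols} symbols, current "
                f"symbols.py defines {model_symbols}: use a matching "
                "checkpoint or retrain")
    return None

# Shapes from the error in this thread:
print(embedding_mismatch((66, 512), (194, 512)))
```

In short, a model file from the original English repo cannot be dropped into the multilingual fork.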
