
Making a model for the Russian language #707

Closed
ks-sav opened this issue Mar 17, 2021 · 10 comments
ks-sav commented Mar 17, 2021

I am proceeding to create a model for the Russian language.

  1. I verified that the code works on my platform, using LibriSpeech train-clean-100.
  2. I prepared 380 hours of Russian speech (1270 speakers) in this format: #437 (comment)

Now I need your advice

  • Do I need to add English to my dataset?
  • Can I fine-tune an existing synthesizer model, or is it better to train my own from scratch?
  • Do I have any chance of doing this on a CPU?

ghost commented Mar 18, 2021

Advice

  • I do not recommend adding English, but it is something you can try if you need a model that works for both languages.
  • Train a new synthesizer model. Don't forget to edit `synthesizer/utils/symbols.py` to include all the letters of the Russian alphabet. Here is a good start for Russian: [`symbols.py`](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/blob/master/synthesizer/utils/symbols.py)
  • Realistically, CPU is too slow. The model needs to learn attention before inference will work. This usually requires 10,000 to 20,000 steps. The training speed on CPU is anywhere from 1 to 4 steps per minute. So you will be waiting 1 to 2 weeks until you know whether your settings are correct. Even after attention is learned, you will be waiting another month or longer to train the 100,000 to 200,000 steps that it takes for the model to become usable.

If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model. Note: It uses tensorflow and you will need to apply the synthesizer changes in #366 to make it work on CPU. https://github.com/vlomme/Multi-Tacotron-Voice-Cloning


ks-sav commented Mar 24, 2021

What could be the cause of

```
RuntimeWarning: invalid value encountered in true_divide
  wav = wav / np.abs(wav).max() * params.rescaling_max
```

when running the script synthesizer_preprocess_audio.py?


ks-sav commented Mar 24, 2021

Also, I want to share some fixes for those who hit similar problems:

  • `Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError` when running synthesizer_preprocess_audio.py is solved by using `--n_processes 1`
  • Instead of the webrtcvad library, I use webrtcvad-wheels on Windows 10


ghost commented Mar 24, 2021

> RuntimeWarning: invalid value encountered in true_divide

Check for audio files that are completely silent.
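The division produces NaNs when the waveform's peak amplitude is zero. A minimal guard for the rescaling step, assuming the formula from the warning above (the 0.9 default for `rescaling_max` is an assumption, and `rescale_wav` is a hypothetical helper, not a function from the repo):

```python
import numpy as np

def rescale_wav(wav, rescaling_max=0.9):
    """Rescale a waveform to a target peak amplitude. Returns None for
    completely silent audio, which would otherwise divide by zero and
    trigger 'invalid value encountered in true_divide'."""
    peak = np.abs(wav).max()
    if peak == 0:
        return None  # silent file: skip it instead of producing NaNs
    return wav / peak * rescaling_max
```

Skipping (or deleting) the silent files before preprocessing avoids the warning entirely.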


RAVANv2 commented Mar 31, 2021

@blue-fish I would like to retrain all the models. Is there any problem if I use a Google Colab GPU for training? Is it sufficient?


ghost commented Apr 1, 2021

@RAVANv2 We do not provide support for Colab. It can be done, but you'll have to figure it out on your own.

@ghost ghost closed this as completed Apr 1, 2021
@fancat-programer

> If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model: https://github.com/vlomme/Multi-Tacotron-Voice-Cloning

Unfortunately, that model's quality is quite poor.

@ghost ghost mentioned this issue Oct 8, 2021
@neonsecret

See my fork: https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang
It is adjusted to train a bilingual ru+en model and is easily extensible with new languages.


vorob1 commented May 26, 2022

> see my fork https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang it is adjusted to train the bilingual ru+en model and is easily adjustable for adding new languages

Sir, that's exactly what I'm looking for. I want to correct some wrong voiceover lines in an old game, but since I can't get in touch with the actor, I want to simulate his voice.

The original tool works, but can't do a Russian voice: https://youtu.be/lDbpoaaBJSo
Your fork gives me errors:

```
PS C:\Users\babud\Downloads\Real-Time-Voice-Cloning-Multilang-master> python demo_toolbox.py
Traceback (most recent call last):
  File "demo_toolbox.py", line 7, in <module>
    from utils.default_models import ensure_default_models
ModuleNotFoundError: No module named 'utils.default_models'
```

My knowledge of all this Python stuff is low, so I just copy and paste commands and sometimes try to understand the errors, but this looks unsolvable at my level.

I want a simple thing: launch the GUI, point the program to WAV files with the actor's voice, enter text, and get voiceover files :)

I also tried python demo_cli.py and got lots of output, but in the end it was this:

```
FileNotFoundError: [Errno 2] No such file or directory: 'saved_models\\rusmodeltweaked\\synthesizer.pt'
```


vorob1 commented May 27, 2022

Okay, I managed to start the toolbox by copying some files from the original build. Now when I add a WAV and try Synthesize + Vocode, I get this error:

```
size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([66, 512]) from checkpoint, the shape in current model is torch.Size([194, 512]).
```
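That error means the checkpoint was trained with a different symbol table: the text embedding has one row per entry in `symbols.py` (66 here versus 194 in the fork), so checkpoints can't be mixed between the two codebases. A small sketch of the check, using a hypothetical helper and the shapes taken from the error message:

```python
def embedding_mismatch(ckpt_shape, model_shape):
    """Compare the symbol-embedding shapes of a checkpoint and the current
    model. Returns an explanatory message if they disagree, else None.
    (Hypothetical helper for illustration; shape[0] is the symbol count.)"""
    ckpt_symbols, model_symbols = ckpt_shape[0], model_shape[0]
    if ckpt_symbols != model_symbols:
        return (f"checkpoint trained with {ckpt_symbols} symbols, current "
                f"symbols.py defines {model_symbols}: use a matching "
                "checkpoint or retrain")
    return None

# Shapes from the error in this thread:
print(embedding_mismatch((66, 512), (194, 512)))
```

In short, a model file from the original English repo cannot be dropped into the multilingual fork.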
