A TTS (Text-To-Speech) model for study and research. This repository is mainly based on ming024/FastSpeech2, with modified and added code. Before training, the sentences were converted into ARPAbet `TextGrid` files by the Montreal Forced Aligner (MFA). These files are available from the ming024/FastSpeech2 repo. You can download them here: Google Drive Folder Link. This is why this repo is named fastspeech2_a.
Additionally, I added some code from:
- 🤗 `accelerate`: multi-GPU training
  - Trained on 2 x NVIDIA GeForce RTX 4090 GPUs
- ✍🏻️ `wandb`
  - `wandb` is used instead of `Tensorboard`. `wandb` is compatible with 🤗 `accelerate` and with 🔥 `pytorch`.
- `torchmalloc.py` and 🌈 `colorama` can show your resource usage in real time (during training).
- `noisereduce` is available when you run `preprocessor.py`.
  - Non-Stationary Noise Reduction
  - `prop_decrease` can avoid data distortion. (0.0 ~ 1.0)
  - Actually, NOT USED.
- 🔥 [Pytorch-Hub] NVIDIA/HiFi-GAN: used as a vocoder.
- LJSpeech
  - Language: English 🇺🇸
  - Speaker: Single Speaker
  - sample_rate: 22.05kHz
This code was run, and the example speech was synthesized, in my VS Code environment. I moved the Jupyter Notebook file to Colab to share the synthesized examples below:
- (EXAMPLE_Jupyternotebook) Synthesis.ipynb
- (EXAMPLE_CLI) Synthesis.ipynb
- More_Examples_Synthesized.ipynb
`preprocess.py` extracts the pitch, energy, duration, and phones from the `TextGrid` files.
python preprocess.py config/LJSpeech/preprocess.yaml
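As a rough illustration of the duration part of this step, the sketch below turns phone intervals (as they might be read from a `TextGrid` file) into per-phone frame counts. The interval values, the hop length of 256 samples, and the function name `intervals_to_durations` are assumptions for illustration and are not taken from this repo; only the 22.05 kHz sample rate comes from the dataset description.

```python
SAMPLE_RATE = 22050  # LJSpeech sample rate (22.05 kHz)
HOP_LENGTH = 256     # assumed STFT hop size (not confirmed by this README)


def intervals_to_durations(intervals, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert (start_sec, end_sec, phone) intervals into phones and frame durations."""
    phones, durations = [], []
    for start, end, phone in intervals:
        # Snap both boundaries to the nearest spectrogram frame, then take the difference,
        # so that the durations always sum to the total number of frames.
        start_frame = round(start * sample_rate / hop_length)
        end_frame = round(end * sample_rate / hop_length)
        phones.append(phone)
        durations.append(end_frame - start_frame)
    return phones, durations


# Hypothetical alignment for the word "hello" (times in seconds).
example = [(0.00, 0.08, "HH"), (0.08, 0.15, "AH0"), (0.15, 0.24, "L"), (0.24, 0.40, "OW1")]
phones, durations = intervals_to_durations(example)
print(phones)     # ['HH', 'AH0', 'L', 'OW1']
print(durations)  # [7, 6, 8, 13]
```

Rounding the boundaries rather than the individual lengths keeps the sum of durations equal to the number of mel frames covered by the alignment.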
First, log in to wandb with your API token in the CLI.
wandb login --relogin '<your-wandb-api-token>'
Next, set up your training environment with the following command.
accelerate config
With this command, you can start training.
accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech
To restrict training to specific GPUs, prefix the command with `CUDA_VISIBLE_DEVICES`.
CUDA_VISIBLE_DEVICES=2,3 accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech
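The `CUDA_VISIBLE_DEVICES=2,3` prefix hides all other GPUs from the process, and CUDA renumbers the visible ones starting from zero. The sketch below only illustrates that renumbering with the standard library. The helper name `visible_device_map` is hypothetical, and it handles only plain comma-separated integer indices (not GPU UUIDs).

```python
import os


def visible_device_map(value):
    """Map in-process CUDA indices to physical GPU indices for a CUDA_VISIBLE_DEVICES value."""
    ids = [int(part) for part in value.split(",") if part.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


# Same setting as in the command above: only physical GPUs 2 and 3 are exposed.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
mapping = visible_device_map(os.environ["CUDA_VISIBLE_DEVICES"])
print(mapping)  # {0: 2, 1: 3}
```

Inside the training process, `cuda:0` then refers to physical GPU 2 and `cuda:1` to physical GPU 3.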
You can synthesize speech in the CLI with this command:
python synthesize.py --raw_texts "<Text to synthesize to speech>" --restore_step 100000
Refer to the Colab notebooks (Examples) above if you want to synthesize speech.
Also, you can check these Jupyter notebooks: