
fastspeech2_a

A TTS (Text-To-Speech) model for studying and research. This repository is mainly based on :octocat: ming024/FastSpeech2, with modified and additional code. Before training, the sentences are converted into ARPAbet TextGrid files with the Montreal Forced Aligner (MFA); these files are provided by the :octocat: ming024/FastSpeech2 repo and can be downloaded here: Google Drive Folder Link. This is why the repo is named fastspeech2_a.

Additionally, I added some code for:

  • 🤗 accelerate: multi-GPU training - trained on 2 x NVIDIA GeForce RTX 4090 GPUs
  • ✍🏻️ wandb
    • wandb is used instead of TensorBoard; it is compatible with 🤗 accelerate and 🔥 PyTorch.
    • dashboard screenshots
  • torchmalloc.py and 🌈 colorama show your resource usage in real time during training
  • noisereduce is available when you run preprocessor.py (see the sketch after this list).
    • Non-Stationary Noise Reduction
    • prop_decrease (0.0 ~ 1.0) can avoid distorting the data.
    • Actually, NOT USED in this repo.
  • 🔥 [PyTorch Hub] NVIDIA/HiFi-GAN: used as the vocoder.
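
For reference, below is a minimal sketch of how noisereduce's non-stationary mode could be applied to a wav before preprocessing. The file paths and the prop_decrease value are placeholders, and (as noted above) this step is not actually used in the pipeline.

import librosa
import noisereduce as nr
import soundfile as sf

# Load an LJSpeech wav at its native 22.05 kHz sampling rate (path is a placeholder).
wav, sr = librosa.load("LJSpeech-1.1/wavs/LJ001-0001.wav", sr=22050)

# Non-stationary noise reduction; prop_decrease < 1.0 keeps the reduction gentle
# so the speech itself is not distorted.
reduced = nr.reduce_noise(y=wav, sr=sr, stationary=False, prop_decrease=0.8)

sf.write("LJ001-0001_denoised.wav", reduced, sr)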

Dataset

  • LJSpeech
    • Language: English 🇺🇸
    • Speaker: Single Speaker
    • sample_rate: 22.05kHz

Colab notebooks (Examples):

This code was run and the example speeches were synthesized in my VS Code environment; I moved the Jupyter notebooks to Colab to share the synthesized examples below:

  • (EXAMPLE_Jupyternotebook) Synthesis.ipynb Open In Colab
  • (EXAMPLE_CLI) Synthesis.ipynb Open In Colab
  • More_Examples_Synthesized.ipynb Open In Colab

Preprocess

preprocess.py extracts the pitch, energy, duration, and phones from the TextGrid files:

python preprocess.py config/LJSpeech/preprocess.yaml 
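
Roughly, the duration extraction reads the phone tier of each MFA TextGrid and converts interval lengths into mel-frame counts, following ming024/FastSpeech2. The sketch below is only illustrative: the file path is a placeholder, and the sampling_rate / hop_length values are assumed from the LJSpeech config.

import numpy as np
import tgt

sampling_rate, hop_length = 22050, 256  # assumed config/LJSpeech/preprocess.yaml values

# Read the MFA alignment and take its phone tier (path is a placeholder).
textgrid = tgt.io.read_textgrid("TextGrid/LJSpeech/LJ001-0001.TextGrid")
phones_tier = textgrid.get_tier_by_name("phones")

phones, durations = [], []
for interval in phones_tier._objects:
    phones.append(interval.text)
    # Convert the interval length from seconds to mel-spectrogram frames.
    start = np.round(interval.start_time * sampling_rate / hop_length)
    end = np.round(interval.end_time * sampling_rate / hop_length)
    durations.append(int(end - start))

print(phones[:5], durations[:5])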

Train

First, log in to wandb with your API token in the CLI.

wandb login --relogin '<your-wandb-api-token>'

Next, you can set up your training environment with the following command.

accelerate config

With this command, you can start training.

accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech

You can also select specific GPUs with CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES=2,3 accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech
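
Inside train.py, wandb logging goes through 🤗 accelerate's tracker API. The sketch below shows the general pattern only; the project name, config keys, and metric names are placeholders, not this repo's exact code.

from accelerate import Accelerator

# log_with="wandb" makes accelerate forward logged metrics to wandb on the main process.
accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers(
    project_name="fastspeech2_a",  # placeholder project name
    config={"n_epochs": 800, "try_name": "T_01_LJSpeech"},
)

for step in range(1, 11):          # stands in for the real training loop
    loss = 1.0 / step              # placeholder metric
    accelerator.log({"train/total_loss": loss}, step=step)

accelerator.end_training()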

Synthesize

You can synthesize speech from the CLI with this command:

python synthesize.py --raw_texts <Text to synthesize to speech> --restore_step 100000
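
Under the hood, the predicted mel-spectrogram is converted to a waveform by the HiFi-GAN vocoder. The sketch below follows the public PyTorch Hub example for nvidia_hifigan; the returned objects and the dummy mel tensor are assumptions, and synthesize.py may load the vocoder differently.

import torch

# Load the pretrained HiFi-GAN generator from PyTorch Hub
# (entry point and return values taken from the NVIDIA hub example; assumed here).
hifigan, vocoder_train_setup, denoiser = torch.hub.load(
    "NVIDIA/DeepLearningExamples:torchhub", "nvidia_hifigan"
)
hifigan.eval()

# Dummy 80-bin mel-spectrogram standing in for the acoustic model's output.
mel = torch.zeros(1, 80, 200)

with torch.no_grad():
    audio = hifigan(mel).squeeze(1)  # (1, samples) waveform at 22.05 kHz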

If you want to synthesize in a notebook instead, refer to the Colab notebooks (Examples) listed above.
