
fastspeech2_a

A TTS (Text-To-Speech) model for studying and research. This repository is mainly based on :octocat: ming024/FastSpeech2, with modified and additional code. Before training, the sentences are converted into ARPAbet TextGrid files with the Montreal Forced Aligner (MFA); these files are provided by the :octocat: ming024/FastSpeech2 repo and can be downloaded here: Google Drive Folder Link. This is why the repo is named fastspeech2_a.

Additionally, I added some code for:

  • 🤗 accelerate: multi-GPU training - trained on 2 x NVIDIA GeForce RTX 4090 GPUs
  • ✍🏻️ wandb
    • wandb is used instead of TensorBoard; it is compatible with 🤗 accelerate and 🔥 PyTorch.
    • dashboard screenshots
  • torchmalloc.py and 🌈 colorama show your resource usage in real time during training
  • noisereduce is available when you run preprocessor.py (see the sketch after this list).
    • Non-Stationary Noise Reduction
    • prop_decrease (0.0 ~ 1.0) can avoid distorting the data.
    • Actually, NOT USED in this repo.
  • 🔥 [PyTorch Hub] NVIDIA/HiFi-GAN: used as the vocoder.
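
For reference, below is a minimal sketch of how noisereduce's non-stationary mode could be applied to a wav before preprocessing. The file paths and the prop_decrease value are placeholders, and (as noted above) this step is not actually used in the pipeline.

import librosa
import noisereduce as nr
import soundfile as sf

# Load an LJSpeech wav at its native 22.05 kHz sampling rate (path is a placeholder).
wav, sr = librosa.load("LJSpeech-1.1/wavs/LJ001-0001.wav", sr=22050)

# Non-stationary noise reduction; prop_decrease < 1.0 keeps the reduction gentle
# so the speech itself is not distorted.
reduced = nr.reduce_noise(y=wav, sr=sr, stationary=False, prop_decrease=0.8)

sf.write("LJ001-0001_denoised.wav", reduced, sr)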

Dataset

  • LJSpeech
    • Language: English 🇺🇸
    • Speaker: Single Speaker
    • sample_rate: 22.05kHz

Colab notebooks (Examples):

This code was run and the example speeches were synthesized in my VS Code environment; I moved the Jupyter notebooks to Colab to share the synthesized examples below:

  • (EXAMPLE_Jupyternotebook) Synthesis.ipynb Open In Colab
  • (EXAMPLE_CLI) Synthesis.ipynb Open In Colab
  • More_Examples_Synthesized.ipynb Open In Colab

Preprocess

preprocess.py extracts the pitch, energy, duration, and phones from the TextGrid files:

python preprocess.py config/LJSpeech/preprocess.yaml 
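
Roughly, the duration extraction reads the phone tier of each MFA TextGrid and converts interval lengths into mel-frame counts, following ming024/FastSpeech2. The sketch below is only illustrative: the file path is a placeholder, and the sampling_rate / hop_length values are assumed from the LJSpeech config.

import numpy as np
import tgt

sampling_rate, hop_length = 22050, 256  # assumed config/LJSpeech/preprocess.yaml values

# Read the MFA alignment and take its phone tier (path is a placeholder).
textgrid = tgt.io.read_textgrid("TextGrid/LJSpeech/LJ001-0001.TextGrid")
phones_tier = textgrid.get_tier_by_name("phones")

phones, durations = [], []
for interval in phones_tier._objects:
    phones.append(interval.text)
    # Convert the interval length from seconds to mel-spectrogram frames.
    start = np.round(interval.start_time * sampling_rate / hop_length)
    end = np.round(interval.end_time * sampling_rate / hop_length)
    durations.append(int(end - start))

print(phones[:5], durations[:5])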

Train

First, log in to wandb with your API token in the CLI.

wandb login --relogin '<your-wandb-api-token>'

Next, you can set up your training environment with the following command.

accelerate config

With this command, you can start training.

accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech

You can also select specific GPUs with CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES=2,3 accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech
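
Inside train.py, wandb logging goes through 🤗 accelerate's tracker API. The sketch below shows the general pattern only; the project name, config keys, and metric names are placeholders, not this repo's exact code.

from accelerate import Accelerator

# log_with="wandb" makes accelerate forward logged metrics to wandb on the main process.
accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers(
    project_name="fastspeech2_a",  # placeholder project name
    config={"n_epochs": 800, "try_name": "T_01_LJSpeech"},
)

for step in range(1, 11):          # stands in for the real training loop
    loss = 1.0 / step              # placeholder metric
    accelerator.log({"train/total_loss": loss}, step=step)

accelerator.end_training()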

Synthesize

You can synthesize speech from the CLI with this command:

python synthesize.py --raw_texts <Text to synthesize to speech> --restore_step 100000
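
Under the hood, the predicted mel-spectrogram is converted to a waveform by the HiFi-GAN vocoder. The sketch below follows the public PyTorch Hub example for nvidia_hifigan; the returned objects and the dummy mel tensor are assumptions, and synthesize.py may load the vocoder differently.

import torch

# Load the pretrained HiFi-GAN generator from PyTorch Hub
# (entry point and return values taken from the NVIDIA hub example; assumed here).
hifigan, vocoder_train_setup, denoiser = torch.hub.load(
    "NVIDIA/DeepLearningExamples:torchhub", "nvidia_hifigan"
)
hifigan.eval()

# Dummy 80-bin mel-spectrogram standing in for the acoustic model's output.
mel = torch.zeros(1, 80, 200)

with torch.no_grad():
    audio = hifigan(mel).squeeze(1)  # (1, samples) waveform at 22.05 kHz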

If you want to synthesize in a notebook instead, refer to the Colab notebooks (Examples) listed above.
