This repository includes two main components: a shell script for multiprocess batched inference and a Python script for single-process batched inference of the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model.
All the code in this repository is adapted from the original VITS repository.
Please follow the installation instructions from the original VITS repository before running the scripts.
```shell
./batched_vits_multiprocess_inference.sh \
    --csv_file <csv_file> \
    --gpu_ids <gpu_ids> \
    --max_process <max_process> \
    --batch_size <batch_size> \
    --vits_config <vits_config> \
    --vits_checkpoint <vits_checkpoint> \
    [--audio_save_dir <audio_save_dir>] \
    [--noise_scale <noise_scale>] \
    [--noise_scale_w <noise_scale_w>] \
    [--length_scale <length_scale>] \
    [--vits_multispeaker true]
```
- `--csv_file`: Path to the CSV or TSV file containing the input data. It must contain two columns, `text` and `filename`: the `text` column holds the text for which audio has to be generated, and the generated audio is saved to the path in `filename`. An optional `speaker_id` column can be added for multispeaker models (see the note below the multispeaker example). See the example file: `test_data.csv`.
- `--gpu_ids`: Comma-separated GPU IDs to use for multiprocessing.
- `--max_process`: Maximum number of parallel processes.
- `--batch_size`: Batch size for each process.
- `--vits_config`: Path to the VITS model configuration file.
- `--vits_checkpoint`: Path to the VITS model checkpoint file.
- `--audio_save_dir`: Directory to save generated audio (default: `./VITS_TTS_samples/`).
- `--noise_scale`: Noise scale factor (default: 0.667).
- `--noise_scale_w`: Noise scale weight (default: 0.8).
- `--length_scale`: Length scale factor (default: 1).
- `--vits_multispeaker`: Optional flag indicating whether a multispeaker model is used (default: false).
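For illustration, an input file with the column layout described above can be written with Python's standard `csv` module. The rows here are made up for the example; the repository's actual `test_data.csv` may differ:

```python
import csv

# Hypothetical example rows; the real test_data.csv in the repo may differ.
rows = [
    {"text": "Hello world.", "filename": "sample_0001.wav"},
    {"text": "VITS synthesizes speech from text.", "filename": "sample_0002.wav"},
]

with open("test_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "filename"])
    writer.writeheader()
    writer.writerows(rows)
# For a multispeaker model, add an optional "speaker_id" column as well.
```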
```shell
bash batched_vits_multiprocess_inference.sh --csv_file test_data.csv --gpu_ids 2,4 --max_process 4 --batch_size 2 --vits_config ./configs/ljs_base.json --vits_checkpoint ../pretrained_ljs.pth --audio_save_dir ./TTS_samples/test
```

Note: The `speaker_id` column is not needed in `test_data.csv` for a single-speaker model.
```shell
bash batched_vits_multiprocess_inference.sh --csv_file test_data.csv --gpu_ids 2,4 --max_process 4 --batch_size 2 --vits_config ./configs/vctk_base.json --vits_checkpoint ../pretrained_vctk.pth --audio_save_dir ./TTS_samples/test_sid --vits_multispeaker true
```

Note: If you use the `--vits_multispeaker` option but the `speaker_id` column is absent from your dataset, the script compensates by generating random speaker IDs, chosen from the range 0 to `hyperparameters.data.n_speakers - 1`.
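The random-ID fallback described above can be sketched as follows. This is an illustrative sketch, not the script's actual code; in practice `n_speakers` would be read from `hyperparameters.data.n_speakers` in the config file:

```python
import random

def assign_speaker_ids(num_rows: int, n_speakers: int, seed: int = 1) -> list[int]:
    """Sketch of the fallback: draw one random speaker ID per row
    when the input CSV lacks a speaker_id column."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [rng.randint(0, n_speakers - 1) for _ in range(num_rows)]
```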
- Log files (in the `./logs` directory) for each process: `log_1.txt`, `log_2.txt`, ..., `log_<max_process>.txt`.
- Generated audio saved in the specified directory.
Note: If audio files with the same name already exist in the output directory, they will not be regenerated.
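The skip rule above amounts to a simple existence check before synthesis. A minimal sketch (the function name is illustrative, not taken from the script):

```python
import os

def needs_generation(audio_path: str) -> bool:
    """Sketch of the skip rule described above: an output file that
    already exists in the output directory is not regenerated."""
    return not os.path.exists(audio_path)
```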
Multiple runs of this script with different arguments can be launched via `batched_vits_multiprocessing_inference.sh`.
```shell
python batched_vits_inference.py \
    --vits_config <vits_config_path> \
    --vits_checkpoint <vits_checkpoint_path> \
    --audio_saving_dir <audio_save_dir> \
    --data_file <data_file_path> \
    [--seed <seed>] \
    [--start_idx <start_index>] \
    [--end_idx <end_index>] \
    --batch_size <batch_size> \
    [--noise_scale <noise_scale>] \
    [--noise_scale_w <noise_scale_w>] \
    [--length_scale <length_scale>] \
    [--vits_multispeaker True]
```
- `--vits_config`: Path to the VITS model configuration file (default: `../configs/vctk_base.json`).
- `--vits_checkpoint`: Path to the VITS model checkpoint file (default: `./pretrained_ljs.pth`).
- `--audio_saving_dir` or `-v`: Directory to save the TTS samples generated by the VITS model (default: `./VITS_TTS_samples/`).
- `--data_file`: Path to the CSV or TSV file containing `text` and `audio_filename` columns. The `text` column contains the text for which audio will be generated, and the `audio_filename` column contains the path where the generated audio will be saved (default: `./test_data.csv`).
- `--seed`: Seed for reproducibility (default: 1).
- `--start_idx`: Start index (inclusive) into the dataframe, used for multiprocessing (default: 0).
- `--end_idx`: End index (exclusive) into the dataframe, so the processed range is `[start_idx, end_idx)` (default: None, meaning the full length of the dataset).
- `--batch_size`: Batch size for inference.
- `--noise_scale`: Noise scale used for inference (default: 0.667).
- `--noise_scale_w`: Noise scale weight used for inference (default: 0.8).
- `--length_scale`: Length (duration) scale used for inference (default: 1).
- `--vits_multispeaker`: Optional flag indicating whether a multispeaker model is used (default: False).
```shell
python batched_vits_inference.py \
    --vits_config ./configs/ljs_base.json \
    --vits_checkpoint pretrained_ljs.pth \
    --audio_saving_dir ./VITS_TTS_samples/ \
    --data_file test_data.csv \
    --seed 1 \
    --start_idx 0 \
    --end_idx 100 \
    --batch_size 4 \
    --noise_scale 0.5 \
    --noise_scale_w 0.9 \
    --length_scale 1.2 \
    --vits_multispeaker True
```
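`--batch_size` groups the input rows into fixed-size chunks before inference. A minimal sketch of that grouping (illustrative, not the script's actual code):

```python
def batches(items: list, batch_size: int):
    """Yield successive chunks of at most batch_size items,
    the grouping implied by --batch_size above."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```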
Feel free to adjust the parameters based on your specific needs. Contributions and feedback are welcome!