Using MeloTTS for Korean text
The repo is developed and tested on Ubuntu 20.04 and Python 3.9.
```
pip install -e .
python -m unidic download
```
You need to download the model weights from the MeloTTS HuggingFace repo; `config.json` must be downloaded as well.
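One way to fetch both files is with `huggingface_hub`. This is a sketch; the repo id, filenames, and local directory below are assumptions, so check the actual MeloTTS HuggingFace model card for the Korean weights:

```python
# Sketch: download the Korean checkpoint and config from HuggingFace.
# The repo id and filenames are assumptions -- verify them against the
# MeloTTS Korean model card before running.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="myshell-ai/MeloTTS-Korean",  # assumed repo id
    filename="checkpoint.pth",            # assumed filename
    local_dir="melo/weight",
)
config_path = hf_hub_download(
    repo_id="myshell-ai/MeloTTS-Korean",  # assumed repo id
    filename="config.json",
    local_dir="melo/weight",
)
print(ckpt_path, config_path)
```

Downloading through the HuggingFace web UI works just as well; the paths only need to match what you pass to `infer.py` below.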
```
cd melo
python infer.py -t "<TEXT EXAMPLES>" -m "<weight_path>" -o "<result_path>" -l 'KR'
```
You can also change the voice speed. The original infer.py does not expose a speed argument, but its default speed is too slow for Korean, so a speed argument was added for customization. A speed of 1.2 fits Korean speech well.
```
python infer.py -t "<TEXT EXAMPLES>" -m "<weight_path>" -o "<result_path>" -l 'KR' -sp 1.3
```
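The added speed flag can be sketched as an extra argparse option. The flag names below mirror the commands above (`-t`, `-m`, `-o`, `-l`, `-sp`); the real infer.py in this repo may declare its arguments differently:

```python
# Sketch of adding a -sp/--speed flag to infer.py's argument parser.
# Other flag names mirror the CLI usage above; defaults are illustrative.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MeloTTS inference")
    parser.add_argument("-t", "--text", required=True, help="text to synthesize")
    parser.add_argument("-m", "--ckpt_path", required=True, help="model checkpoint path")
    parser.add_argument("-o", "--output_dir", required=True, help="output directory")
    parser.add_argument("-l", "--language", default="KR", help="language code")
    # The added flag: playback speed multiplier; values > 1.0 speak faster.
    parser.add_argument("-sp", "--speed", type=float, default=1.2)
    return parser

args = build_parser().parse_args(
    ["-t", "hello", "-m", "ckpt.pth", "-o", "out", "-l", "KR", "-sp", "1.3"]
)
print(args.speed)
```

With the default of 1.2 the flag can simply be omitted for typical Korean input.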
```
cd test
python test_base_model_tts_package.py
```
If you use this method, you need to add config and checkpoint arguments when you define the TTS model: not just `model = TTS(language=language)`, but `model = TTS(language=language, config_path=config_path, ckpt_path=ckpt_path)`.
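A minimal sketch of that package-level usage, assuming the `melo.api` interface of upstream MeloTTS; the file paths are placeholders and the speaker-id lookup (`model.hps.data.spk2id`) follows upstream and may differ in this fork:

```python
# Sketch: constructing the TTS model with explicit config/checkpoint paths.
# Paths are placeholders for the files downloaded from HuggingFace.
from melo.api import TTS

config_path = "melo/weight/config.json"    # downloaded config
ckpt_path = "melo/weight/checkpoint.pth"   # downloaded weights

model = TTS(language="KR", config_path=config_path, ckpt_path=ckpt_path)

# Upstream MeloTTS exposes speaker ids on the loaded hyperparameters.
speaker_ids = model.hps.data.spk2id
model.tts_to_file("안녕하세요", speaker_ids["KR"], "result.wav", speed=1.2)
```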
```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "args": [
                "-t", "[TEXT]",
                "-m", "MeloTTS/melo/weight/checkpoint.pth",
                "-o", "MeloTTS/test/result",
                "-l", "KR",
                "-sp", "1.23"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
```
- inference test [2024.05.02]
- voice speed [2024.05.02]
- voice conversion (in progress)
- train code test
MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai. Supported languages include English, Spanish, French, Chinese, Japanese, and Korean.
The Python API and model cards can be found in this repo or on HuggingFace.
Citation
```bibtex
@software{zhao2024melo,
  author = {Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title  = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url    = {https://github.com/myshell-ai/MeloTTS},
  year   = {2023}
}
```
This library is under the MIT License, which means it is free for both commercial and non-commercial use.
This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.