This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 67 changed files with 6,830 additions and 1 deletion.

@@ -1 +1,185 @@

# 🗣️ ChatTTS-Forge

ChatTTS-Forge provides a powerful ChatTTS API. It supports SSML-like syntax for generating long-form speech and can efficiently manage and reuse speakers and styles.

# Features

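- **Style prompt injection**: Flexibly adjust the output style by injecting prompts.
- **Comprehensive API service**: Every feature is available through the API, which makes integration easy.
- **Friendly debugging GUI**: A playground, independent of Gradio, that simplifies debugging.
- **OpenAI-style API**: `/v1/openai/audio/speech` provides an OpenAI-like speech generation endpoint.
- **Google-style API**: `/v1/google/text:synthesize` provides a Google-like text synthesis endpoint.
- **SSML-like support**: Create rich long-form audio with SSML-like syntax.
- **Speaker management**: Efficiently reuse speakers by name or ID.
- **Style management**: Reuse speaking styles by name or ID; 34 distinct styles are built in.
- **Text normalization**: Text normalization tuned for ChatTTS that handles most unsupported tokens.
- **Standalone refine API**: A separate refine endpoint for debugging, which makes debugging faster.
- **Efficient caching**: The generation endpoint uses an LRU cache to speed up responses; only requests with fixed speaker and inference seeds are cached.
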
# Usage

## Requirements

- Python
- ffmpeg
- At least 4 GB of GPU memory (about 3.7 GB is used while running)

## Launch

```
python launch.py
```

## Command-Line Arguments

| Parameter       | Type   | Default     | Description |
| --------------- | ------ | ----------- | ----------- |
| `--host`        | `str`  | `"0.0.0.0"` | Host to run the server on |
| `--port`        | `int`  | `8000`      | Port to run the server on |
| `--reload`      | `bool` | `False`     | Enable auto-reload for development |
| `--compile`     | `bool` | `False`     | Enable model compilation |
| `--lru_size`    | `int`  | `64`        | Size of the request cache pool; set to `0` to disable `lru_cache` |
| `--cors_origin` | `str`  | `"*"`       | Allowed CORS origins; use `*` to allow all origins |

# Docker

Work in progress.

# API

After deploying, open `http://localhost:8000/docs` to view the detailed API documentation.

![api](./docs/api.png)
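
As a rough client-side sketch, the snippet below builds (but does not send) requests for the OpenAI-style and Google-style endpoints listed under Features. The JSON field names (`model`, `input`, `voice`, `audioConfig`, `audioEncoding`) are assumptions borrowed from the OpenAI and Google Cloud TTS APIs, and the model name `chattts` is hypothetical; the schema this server actually accepts is the one shown at `/docs`.

```python
import json
import urllib.request

# Assumes the server was started with the default --port 8000.
BASE_URL = "http://localhost:8000"


def _post_json(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def openai_speech_request(text: str, voice: str, model: str = "chattts") -> urllib.request.Request:
    # Hypothetical body: field names mirror OpenAI's audio/speech API.
    return _post_json("/v1/openai/audio/speech", {"model": model, "input": text, "voice": voice})


def google_synthesize_request(text: str, voice_name: str) -> urllib.request.Request:
    # Hypothetical body: field names mirror Google Cloud TTS text:synthesize.
    body = {
        "input": {"text": text},
        "voice": {"name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return _post_json("/v1/google/text:synthesize", body)


if __name__ == "__main__":
    req = openai_speech_request("Hello from ChatTTS-Forge", voice="Bob")
    print(req.get_method(), req.full_url)
    # With a running server, this would send it and read the raw response bytes:
    # with urllib.request.urlopen(req) as resp:
    #     audio_bytes = resp.read()
```

Keeping request construction separate from sending makes it easy to inspect the payload against `/docs` before hitting the server.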

# Playground

A playground front-end page for debugging the API is included. It is independent of the Python code and does not use Gradio.

![playground](./docs/playground.png)

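# ChatTTS-SSML v0.1

> This is an experimental attempt; full SSML syntax and functionality are not supported yet.

ChatTTS-SSML uses a format similar to Microsoft TTS SSML and works well together with this system's speakers and styles.

Here is a simple example:
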
```xml
<speak version="0.1">
  <voice spk="Bob" style="narration-relaxed">
    An audiobook example showing ChatTTS synthesizing multiple characters and emotions
  </voice>
  <voice spk="Bob" style="narration-relaxed">
    Daiyu said with a cold smile:
  </voice>
  <voice spk="female2" style="angry">
    I thought as much [uv_break]. Good thing you were held up there; otherwise you would have flown off long ago.
  </voice>
  <voice spk="Bob" style="narration-relaxed">
    Baoyu said:
  </voice>
  <voice spk="Alice" style="unfriendly">
    “So I'm only allowed to play with you [uv_break] and keep you amused. I drop by her place once by chance, and you say such things.”
  </voice>
  <voice spk="female2" style="angry">
    “What a pointless thing to say! [uv_break] Whether you go or not is none of my business. I never asked you to keep me amused [uv_break]; you're free to ignore me.”
  </voice>
  <voice spk="Bob" style="narration-relaxed">
    With that, she went back to her room in a huff.
  </voice>
</speak>
```
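
The markup above is ordinary XML, so it can also be generated programmatically. A minimal sketch using Python's standard `xml.etree.ElementTree`; `build_ssml` is a hypothetical helper, not part of ChatTTS-Forge, and it only reproduces the `<speak>`/`<voice>` shape of the example above.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_ssml(segments: Iterable[Tuple[str, str, str]]) -> str:
    """Build a ChatTTS-SSML v0.1 document from (spk, style, text) triples."""
    speak = ET.Element("speak", {"version": "0.1"})
    for spk, style, text in segments:
        voice = ET.SubElement(speak, "voice", {"spk": spk, "style": style})
        # ElementTree escapes &, < and > in text, so arbitrary dialogue is safe.
        voice.text = text
    return ET.tostring(speak, encoding="unicode")


doc = build_ssml([
    ("Bob", "narration-relaxed", "Daiyu said with a cold smile:"),
    ("female2", "angry", "I thought as much [uv_break]."),
])
print(doc)
```

Generating the markup this way avoids hand-escaping characters such as `&` in the spoken text.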
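
## prosody

> Due to model limitations, control over individual characters is not possible yet. Results are better when each prosody element contains a longer run of text.

Like `voice`, `prosody` accepts all of the speech-control parameters. In addition, it can control `rate`, `volume`, and `pitch` for fine-grained post-processing of the generated speech.

An example:
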
```xml
<speak version="0.1">
  <voice spk="Bob" style="narration-relaxed">
    Use prosody to control the rate, pitch, and volume of the generated speech, as shown below.

    <prosody>
      With no attributes set, generation inherits the settings of the parent voice.
    </prosody>
    <prosody rate="1.5">
      A rate greater than 1 speeds speech up; a rate less than 1 slows it down.
    </prosody>
    <prosody pitch="6">
      Set pitch to adjust the tone; a value of 6 raises it by 6 semitones.
    </prosody>
    <prosody volume="2">
      Set volume to adjust loudness; a value of 2 raises it by 2 decibels.
    </prosody>

    Text inside voice that is not wrapped in prosody is generated with the default settings.
  </voice>
</speak>
```
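
The `pitch`, `volume`, and `rate` values correspond to standard audio quantities. Assuming the post-processing follows the usual conventions (12-tone equal-temperament semitones, decibels applied to amplitude, `rate` as a speed multiplier), the factors work out as in this sketch. The helper names are hypothetical, and the project's actual post-processing code is not part of this diff.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` (12-tone equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def gain_factor(db: float) -> float:
    """Amplitude (not power) multiplier for a gain of `db` decibels."""
    return 10.0 ** (db / 20.0)


def duration_after_rate(seconds: float, rate: float) -> float:
    """Clip length after a speed change; rate > 1 shortens it, rate < 1 lengthens it."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return seconds / rate


print(round(pitch_ratio(6), 4))       # pitch="6"  -> 1.4142 (half an octave)
print(round(gain_factor(2), 4))       # volume="2" -> 1.2589
print(duration_after_rate(3.0, 1.5))  # rate="1.5" on a 3 s clip -> 2.0
```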

## break

Inserts a silent pause of fixed duration into the text.

```xml
<speak version="0.1">
  <voice spk="Bob" style="narration-relaxed">
    Using the break tag simply

    <break time="500" />

    inserts a stretch of silence into the generated output.
  </voice>
</speak>
```
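
The unit of `time` is not stated here. Assuming milliseconds and a 24 kHz output rate (ChatTTS produces 24 kHz audio), a 500 ms break corresponds to 12,000 samples of silence. A hypothetical sketch of splicing such a pause between two 16-bit PCM segments; `silence_samples` and `insert_break` are illustrative names, not project functions.

```python
import array


def silence_samples(time_ms: int, sample_rate: int = 24_000) -> array.array:
    """Return `time_ms` milliseconds of 16-bit PCM silence (all zeros)."""
    # Integer arithmetic avoids float rounding: 500 ms at 24 kHz -> 12,000 samples.
    n = sample_rate * time_ms // 1000
    return array.array("h", bytes(2 * n))


def insert_break(before: array.array, after: array.array,
                 time_ms: int, sample_rate: int = 24_000) -> array.array:
    """Concatenate two 'h'-typed PCM segments with a fixed-length pause between them."""
    return before + silence_samples(time_ms, sample_rate) + after


print(len(silence_samples(500)))  # 12000
```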
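
# Style List

> No particularly stable prompting method has been found yet, so using a style may degrade output quality.

| Style | Description |
| --- | --- |
| advertisement_upbeat | Promotes a product or service in an excited, energetic tone. |
| affectionate | Expresses a warm, affectionate tone with higher pitch and volume. The speaker is trying to hold the listener's attention, and the speaker's personality often comes across as endearing. |
| angry | Expresses an angry, disgusted tone. |
| assistant | The warm, relaxed tone of a digital assistant. |
| calm | Speaks with a composed, calm attitude. Tone, pitch, and prosody are much more uniform than in other voice types. |
| chat | Expresses a casual, relaxed tone. |
| cheerful | Expresses a positive, happy tone. |
| customerservice | Supports customers in a friendly, enthusiastic tone. |
| depressed | Expresses a melancholic, dejected tone with lower pitch and volume. |
| disgruntled | Expresses a disdainful, complaining tone. Speech in this emotion conveys displeasure and contempt. |
| documentary-narration | Narrates documentaries in a relaxed, interested, and informative style, suitable for documentary voice-overs, expert commentary, and similar content. |
| embarrassed | Expresses an uncertain, hesitant tone when the speaker feels uncomfortable. |
| empathetic | Expresses care and understanding. |
| envious | Expresses a tone of admiration when you long for something someone else has. |
| excited | Expresses an upbeat, hopeful tone. It sounds as if something good has happened and the speaker is pleased about it. |
| fearful | Expresses a scared, nervous tone with higher pitch, higher volume, and a faster rate. The speaker is tense and uneasy. |
| friendly | Expresses a pleasant, inviting, and warm tone. It sounds sincere and caring. |
| gentle | Expresses a mild, polite, and pleasant tone with lower pitch and volume. |
| hopeful | Expresses a warm, yearning tone. It sounds as if something good is about to happen to the speaker. |
| lyrical | Expresses emotion in a melodic, sentimental way. |
| narration-professional | Reads content in a professional, objective tone. |
| narration-relaxed | Expresses a soothing, melodious tone for reading content. |
| newscast | Narrates news in a formal, professional tone. |
| newscast-casual | Delivers general news in a versatile, casual tone. |
| newscast-formal | Delivers news in a formal, confident, and authoritative tone. |
| poetry-reading | Expresses an emotional, rhythmic tone while reading poetry. |
| sad | Expresses a sorrowful tone. |
| serious | Expresses a strict, commanding tone. The speaker usually sounds stiff, with a less relaxed cadence. |
| shouting | Expresses a tone that sounds as if the voice is far away or in another place, straining to be heard clearly. |
| sports_commentary | Expresses a relaxed yet interested tone for broadcasting sports events. |
| sports_commentary_excited | Broadcasts exciting sports moments in a fast, energetic tone. |
| whispering | Expresses a soft tone, trying to make a quiet, gentle sound. |
| terrified | Expresses a frightened tone with a fast pace and a trembling voice. The speaker sounds unsteady and frantic. |
| unfriendly | Expresses a cold, indifferent tone. |
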
# References

- ChatTTS: https://github.com/2noise/ChatTTS
- PaddleSpeech: https://github.com/PaddlePaddle/PaddleSpeech

@@ -0,0 +1,35 @@

```csv
id,name,desc,params
0,advertisement_upbeat,builtin style advertisement_upbeat,"{""speed"": 9, ""oral"": 7, ""laugh"": 2, ""break"": 1, ""prompt2"": ""[Ptts][Ptts][Ptts] 用兴奋和精力充沛的语气像在拍广告 [Stts][Ptts][Stts][Ptts][Stts]""}"
1,affectionate,builtin style affectionate,"{""speed"": 6, ""oral"": 8, ""laugh"": 1, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用温暖亲切的语气像在和亲密的人说话 [Stts][Ptts][Stts][Ptts][Stts]""}"
2,angry,builtin style angry,"{""speed"": 7, ""oral"": 5, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用生气厌恶的语气像在吵架 [Stts][Ptts][Stts][Ptts][Stts]""}"
3,assistant,builtin style assistant,"{""speed"": 5, ""oral"": 7, ""laugh"": 1, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用热情轻松的语气像助理 [Stts][Ptts][Stts][Ptts][Stts]""}"
4,calm,builtin style calm,"{""speed"": 0, ""oral"": 4, ""break"": 4, ""prompt2"": ""[Ptts][Ptts][Ptts] 用沉着冷静的语气像在阐述事实 [Stts][Ptts][Stts][Ptts][Stts]""}"
5,chat,builtin style chat,"{""speed"": 6, ""oral"": 9, ""laugh"": 1, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用轻松随意的语气像朋友聊天 [Stts][Ptts][Stts][Ptts][Stts]""}"
6,cheerful,builtin style cheerful,"{""speed"": 7, ""oral"": 8, ""laugh"": 2, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用积极向上的语气表达愉快心情 [Stts][Ptts][Stts][Ptts][Stts]""}"
7,customerservice,builtin style customerservice,"{""speed"": 5, ""oral"": 6, ""laugh"": 1, ""break"": 5, ""prompt2"": ""[Ptts][Ptts][Ptts] 用友好热情的语气像客服人员 [Stts][Ptts][Stts][Ptts][Stts]""}"
8,depressed,builtin style depressed,"{""speed"": 2, ""oral"": 3, ""break"": 5, ""prompt2"": ""[Ptts][Ptts][Ptts] 用低沉的语气表达忧郁情绪 [Stts][Ptts][Stts][Ptts][Stts]""}"
9,disgruntled,builtin style disgruntled,"{""speed"": 4, ""oral"": 5, ""break"": 4, ""prompt2"": ""[Ptts][Ptts][Ptts] 用轻蔑抱怨的语气表达不满 [Stts][Ptts][Stts][Ptts][Stts]""}"
10,documentary-narration,builtin style documentary-narration,"{""speed"": 4, ""oral"": 4, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用轻松感兴趣的语气像纪录片解说 [Stts][Ptts][Stts][Ptts][Stts]""}"
11,embarrassed,builtin style embarrassed,"{""speed"": 3, ""oral"": 3, ""break"": 5, ""prompt2"": ""[Ptts][Ptts][Ptts] 用犹豫吞吐的语气表达尴尬 [Stts][Ptts][Stts][Ptts][Stts]""}"
12,empathetic,builtin style empathetic,"{""speed"": 5, ""oral"": 6, ""laugh"": 1, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用温柔的语气表达关心和理解 [Stts][Ptts][Stts][Ptts][Stts]""}"
13,envious,builtin style envious,"{""speed"": 5, ""oral"": 5, ""laugh"": 1, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用羡慕的语气像看到别人拥有自己想要的东西 [Stts][Ptts][Stts][Ptts][Stts]""}"
14,excited,builtin style excited,"{""speed"": 8, ""oral"": 7, ""laugh"": 2, ""break"": 1, ""prompt2"": ""[Ptts][Ptts][Ptts] 用欢快语气表达遇到好事 [Stts][Ptts][Stts][Ptts][Stts]""}"
15,fearful,builtin style fearful,"{""speed"": 7, ""oral"": 6, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用紧张语气表达恐惧像遇到危险 [Stts][Ptts][Stts][Ptts][Stts]""}"
16,friendly,builtin style friendly,"{""speed"": 6, ""oral"": 7, ""laugh"": 1, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用亲切友好的语气像朋友 [Stts][Ptts][Stts][Ptts][Stts]""}"
17,gentle,builtin style gentle,"{""speed"": 4, ""oral"": 5, ""laugh"": 1, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用温柔语气像在哄小孩 [Stts][Ptts][Stts][Ptts][Stts]""}"
18,hopeful,builtin style hopeful,"{""speed"": 5, ""oral"": 6, ""laugh"": 1, ""break"": 4, ""prompt2"": ""[Ptts][Ptts][Ptts] 用温暖的语气带着期待 [Stts][Ptts][Stts][Ptts][Stts]""}"
19,lyrical,builtin style lyrical,"{""speed"": 3, ""oral"": 4, ""break"": 4, ""prompt2"": ""[Ptts][Ptts][Ptts] 用优美语气带着淡淡的忧伤 [Stts][Ptts][Stts][Ptts][Stts]""}"
20,narration-professional,builtin style narration-professional,"{""speed"": 5, ""oral"": 3, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用沉稳语气像专业播音员 [Stts][Ptts][Stts][Ptts][Stts]""}"
21,narration-relaxed,builtin style narration-relaxed,"{""speed"": 2, ""oral"": 4, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用舒缓语气像在听睡前故事 [Stts][Ptts][Stts][Ptts][Stts]""}"
22,newscast,builtin style newscast,"{""speed"": 2, ""oral"": 4, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用正式严肃语气播报新闻 [Stts][Ptts][Stts][Ptts][Stts]""}"
23,newscast-casual,builtin style newscast-casual,"{""speed"": 2, ""oral"": 5, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用轻松自然语气分享新闻 [Stts][Ptts][Stts][Ptts][Stts]""}"
24,newscast-formal,builtin style newscast-formal,"{""speed"": 2, ""oral"": 3, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用正式自信语气播报重要新闻 [Stts][Ptts][Stts][Ptts][Stts]""}"
25,poetry-reading,builtin style poetry-reading,"{""speed"": 4, ""oral"": 5, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用语气调整表达诗歌情感注意节奏韵律 [Stts][Ptts][Stts][Ptts][Stts]""}"
26,sad,builtin style sad,"{""speed"": 0, ""oral"": 3, ""break"": 5, ""prompt2"": ""[Ptts][Ptts][Ptts] 用悲伤低沉语气诉说悲伤事情 [Stts][Ptts][Stts][Ptts][Stts]""}"
27,serious,builtin style serious,"{""speed"": 3, ""oral"": 3, ""break"": 4, ""prompt2"": ""[Ptts][Ptts][Ptts] 用严肃坚定语气像发布命令 [Stts][Ptts][Stts][Ptts][Stts]""}"
28,shouting,builtin style shouting,"{""speed"": 0, ""oral"": 6, ""break"": 1, ""prompt2"": ""[Ptts] 用提高音量的语气像在远处喊叫 [Stts][Ptts][Stts][Ptts][Stts]""}"
29,sports_commentary,builtin style sports_commentary,"{""speed"": 6, ""oral"": 7, ""laugh"": 1, ""break"": 2, ""prompt2"": ""[Ptts][Ptts][Ptts] 用轻松激情语气解说体育比赛 [Stts][Ptts][Stts][Ptts][Stts]""}"
30,sports_commentary_excited,builtin style sports_commentary_excited,"{""speed"": 8, ""oral"": 8, ""laugh"": 2, ""break"": 1, ""prompt2"": ""[Ptts][Ptts][Ptts] 用激动活力语气解说精彩瞬间 [Stts][Ptts][Stts][Ptts][Stts]""}"
31,whispering,builtin style whispering,"{""speed"": 2, ""oral"": 3, ""break"": 4, ""prompt2"": ""[Ptts][Ptts][Ptts] 用气声低音量语气像说悄悄话 [Stts][Ptts][Stts][Ptts][Stts]""}"
32,terrified,builtin style terrified,"{""speed"": 7, ""oral"": 7, ""break"": 1, ""prompt2"": ""[Ptts][Ptts][Ptts] 用恐惧颤抖语气像遇到可怕事情 [Stts][Ptts][Stts][Ptts][Stts]""}"
33,unfriendly,builtin style unfriendly,"{""speed"": 4, ""oral"": 2, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用冷淡无感情语气像对待陌生人 [Stts][Ptts][Stts][Ptts][Stts]""}"
```
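
Each row stores its generation parameters as a JSON object inside the quoted `params` column (CSV escapes the inner quotes by doubling them). A minimal sketch of loading them with the standard `csv` and `json` modules; the CSV's file path is not shown in this diff, so the sample embeds two rows copied from it, and `load_styles` is a hypothetical helper. The keys `speed`, `oral`, `laugh`, and `break` presumably feed ChatTTS control tokens of the same names, but that mapping is an assumption.

```python
import csv
import io
import json

# Two rows copied from the CSV above; the real file's path is not shown in this diff.
SAMPLE = '''id,name,desc,params
0,advertisement_upbeat,builtin style advertisement_upbeat,"{""speed"": 9, ""oral"": 7, ""laugh"": 2, ""break"": 1, ""prompt2"": ""[Ptts][Ptts][Ptts] 用兴奋和精力充沛的语气像在拍广告 [Stts][Ptts][Stts][Ptts][Stts]""}"
2,angry,builtin style angry,"{""speed"": 7, ""oral"": 5, ""break"": 3, ""prompt2"": ""[Ptts][Ptts][Ptts] 用生气厌恶的语气像在吵架 [Stts][Ptts][Stts][Ptts][Stts]""}"
'''


def load_styles(fp) -> dict:
    """Map style name -> params dict, decoding the JSON stored in `params`."""
    styles = {}
    # csv.DictReader undoes the "" escaping, leaving plain JSON text.
    for row in csv.DictReader(fp):
        styles[row["name"]] = json.loads(row["params"])
    return styles


styles = load_styles(io.StringIO(SAMPLE))
print(styles["advertisement_upbeat"]["speed"])  # 9
print(styles["angry"].get("laugh", 0))          # 0: "angry" has no laugh key
```

Not every row defines `laugh`, so lookups should supply a default as above.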

@@ -0,0 +1,77 @@

```python
from functools import lru_cache, wraps
from typing import Callable

from modules import config
from modules import generate_audio as generate
from modules.api import api


def conditional_cache(condition: Callable, maxsize: int = 128):
    """Memoize calls to `func` only when `condition(*args, **kwargs)` is true."""

    def decorator(func):
        # Bounded by `maxsize` so that --lru_size actually limits the cache.
        @lru_cache(maxsize=maxsize)
        def cached_func(*args, **kwargs):
            return func(*args, **kwargs)

        @wraps(func)
        def wrapper(*args, **kwargs):
            if condition(*args, **kwargs):
                # lru_cache requires every argument to be hashable.
                return cached_func(*args, **kwargs)
            return func(*args, **kwargs)

        return wrapper

    return decorator


if __name__ == "__main__":
    import argparse

    import uvicorn

    parser = argparse.ArgumentParser(
        description="Start the FastAPI server with command line arguments"
    )
    parser.add_argument(
        "--host", type=str, default="0.0.0.0", help="Host to run the server on"
    )
    parser.add_argument(
        "--port", type=int, default=8000, help="Port to run the server on"
    )
    parser.add_argument(
        "--reload", action="store_true", help="Enable auto-reload for development"
    )
    parser.add_argument("--compile", action="store_true", help="Enable model compile")
    parser.add_argument(
        "--lru_size",
        type=int,
        default=64,
        help="Set the size of the request cache pool; set it to 0 to disable lru_cache",
    )
    parser.add_argument(
        "--cors_origin",
        type=str,
        default="*",
        help="Allowed CORS origins. Use '*' to allow all origins.",
    )

    args = parser.parse_args()

    config.args = args

    if args.compile:
        print("Model compile is enabled")
        config.enable_model_compile = True

    def should_cache(*_args, **kwargs):
        # -1 (the default when a seed is not passed) counts as unseeded,
        # so only calls with both seeds fixed are cached.
        spk_seed = kwargs.get("spk_seed", -1)
        infer_seed = kwargs.get("infer_seed", -1)
        return spk_seed != -1 and infer_seed != -1

    if args.lru_size > 0:
        config.lru_size = args.lru_size
        generate.generate_audio = conditional_cache(
            should_cache, maxsize=args.lru_size
        )(generate.generate_audio)

    api.set_cors()

    # Note: uvicorn only supports reload=True when the app is passed as an
    # import string ("module:attr"); with an app object it logs a warning and exits.
    uvicorn.run(api.app, host=args.host, port=args.port, reload=args.reload)
```
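
To see the caching policy of `launch.py` in isolation, here is a self-contained sketch that repeats the `conditional_cache` pattern with a bounded `lru_cache` and a hypothetical stand-in, `fake_generate`, in place of `generate_audio`. Only calls that pass both `spk_seed` and `infer_seed` as keyword arguments with values other than `-1` are memoized.

```python
from functools import lru_cache, wraps

calls = []  # records every real (uncached) invocation


def conditional_cache(condition, maxsize=64):
    def decorator(func):
        @lru_cache(maxsize=maxsize)
        def cached_func(*args, **kwargs):
            return func(*args, **kwargs)

        @wraps(func)
        def wrapper(*args, **kwargs):
            if condition(*args, **kwargs):
                return cached_func(*args, **kwargs)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def should_cache(*_args, **kwargs):
    return kwargs.get("spk_seed", -1) != -1 and kwargs.get("infer_seed", -1) != -1


@conditional_cache(should_cache, maxsize=2)
def fake_generate(text, spk_seed=-1, infer_seed=-1):
    # Hypothetical stand-in for generate_audio.
    calls.append((text, spk_seed, infer_seed))
    return f"audio<{text}>"


fake_generate("hi", spk_seed=1, infer_seed=2)
fake_generate("hi", spk_seed=1, infer_seed=2)  # served from the cache
fake_generate("hi")                            # unseeded: always regenerated
fake_generate("hi")
print(len(calls))  # 3
```

Because `should_cache` inspects only keyword arguments, seeds passed positionally bypass the cache; restricting caching to fixed seeds keeps random-seed requests from returning a stale, identical result.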