💡 TTS few-shot finetune / voice cloning issue roundup #2456

yt605155624 · 2022-09-26T10:23:18Z

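If finetuning on 12 utterances gives poor results, the usual cause is that the dataset is too small. We recommend enlarging it, typically to 300 ~ 600 utterances; the more data you have and the better its quality, the better the synthesized speech.
The recordings should have no reverberation and no background noise, and should be made at a moderate distance from the microphone. The data quality of the 标贝 (Databaker) dataset is a good reference.
How close the finetuned timbre gets to the target speaker depends on the similarity between the target speaker and the original speaker: the more similar the two are, the closer the finetuned timbre is to the target speaker.
The audio quality after finetuning depends on the audio quality of the original speaker: if the original speaker's audio is poor, the finetuned result may be poor as well.
In short, with the finetune approach both the data collection and the choice of original speaker need care.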

For the principle behind few-shot finetuning, see "About training your own TTS model" (关于训练一个自己的TTS模型).

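❣️ [TTS] MFA error: No such file or directory: "xx/xx/xx/train/mfcc/raw_mfcc.0.scp" #2437
[TTS] For few-shot finetune, batch_size must be <= the number of samples, otherwise an error is raised #2454
Can the speaking rate of a self-finetuned TTS model be changed? #2383
Preprocessing runs without problems, so why does the training stage not run? -> The epoch setting is wrong; see: Adding your own audio data to aishell3 for training #2319 (comment)
TTS Finetune / finetuning TTS3 on multi-speaker data #2442
Error when using ecapa-tdnn for voice cloning #2471 -> install the develop version of paddlespeech
Question about voice cloning: directions for improving audio quality #2245
ImportError: cannot import name 'norm' from 'paddlespeech.t2s.exps.syn_utils' (/opt/conda/envs/paddlespeech/lib/python3.7/site-packages/paddlespeech/t2s/exps/syn_utils.py) #2485 -> install the develop version of paddlespeech
Voice cloning from a single sentence gives very poor results #2583 -> the finetune approach is recommended
Why is the English speech generated from Chinese with ERNIE-SAT voice cloning completely unintelligible? #2586
[TTS] Problems with the one-click finetune feature #2607
Error in the few-shot finetune test: "This dataset has no examples" #2790
How to further finetune a self-trained single-speaker fastspeech2 model, add the new voice to the model, and select different timbres at inference by speaker id #2953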
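The batch_size constraint from #2454 can be guarded against before training starts. The following is only a minimal sketch: the function name effective_batch_size is hypothetical and is not part of PaddleSpeech; it simply clamps a configured batch size to the number of available utterances.

```python
def effective_batch_size(configured: int, num_samples: int) -> int:
    """Return a batch size that never exceeds the number of training samples.

    Hypothetical helper, not a PaddleSpeech API: it only illustrates the
    rule "batch_size must be <= the number of samples".
    """
    if num_samples <= 0:
        raise ValueError("the finetune dataset is empty")
    if configured <= 0:
        raise ValueError("batch_size must be positive")
    return min(configured, num_samples)


if __name__ == "__main__":
    # 12 utterances with a configured batch size of 64: fall back to 12.
    print(effective_batch_size(64, 12))
```

The resulting value would then be written into whatever config the finetune recipe reads its batch size from.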

UserName-wang · 2022-10-20T13:21:56Z

./run.sh --stage 0 --stop-stage 5
check oov
get mfa result
sh: 1: mfa_align: Exec format error
generate durations.txt
extract feature
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data] [Errno 111] Connection refused>
[nltk_data] Error loading cmudict: <urlopen error [Errno 111]
[nltk_data] Connection refused>
196 1
100%|███████████████████████████████████████████████████████████████████████████████████| 196/196 [00:00<00:00, 5146.26it/s]
Done
Traceback (most recent call last):
File "local/extract_feature.py", line 346, in <module>
extract_feature(
File "local/extract_feature.py", line 266, in extract_feature
normalize(speech_scaler, pitch_scaler, energy_scaler, vocab_phones,
File "local/extract_feature.py", line 155, in normalize
dataset = DataTable(
File "/home/nx/study/python/Paddle24/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 45, in __init__
assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples

The code in File "/home/nx/study/python/Paddle24/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 45:
self.data = data
assert len(data) > 0, "This dataset has no examples"

yt605155624 · 2022-10-20T14:42:15Z

@UserName-wang follow https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md to download nltk_data to your ${HOME}
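Before rerunning, it may help to confirm that the two resources named in the [nltk_data] errors are actually present. This is a minimal sketch using only the standard library; the resource names come from the log above, while the sub-folder layout (taggers/ and corpora/, unpacked or zipped) is an assumption about how NLTK stores them, and missing_nltk_resources is a hypothetical helper name.

```python
from pathlib import Path

# Resource names are taken from the [nltk_data] errors in the log.
# The sub-folders are an assumption about NLTK's on-disk layout.
EXPECTED = {
    "averaged_perceptron_tagger": "taggers",
    "cmudict": "corpora",
}


def missing_nltk_resources(root: Path) -> list:
    """Return the expected resources that are not found under root."""
    missing = []
    for name, folder in EXPECTED.items():
        unpacked = root / folder / name
        zipped = root / folder / (name + ".zip")
        # Accept either the unpacked directory or the downloaded zip.
        if not unpacked.exists() and not zipped.exists():
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_nltk_resources(Path.home() / "nltk_data"))
```

An empty list means both resources were found in the assumed locations; it does not by itself guarantee that feature extraction will succeed.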

exceedzhang · 2023-02-17T03:26:25Z

I followed PaddleSpeech/examples/other/tts_finetune/tts3 to run few-shot training.

Running run_mix.sh fails with the following error:
root@autodl-container-9db311a83c-4d0bf061:~/autodl-tmp/PaddleSpeech/examples/other/tts_finetune/tts3# ./run_mix.sh
check oov
get mfa result
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 12.0
/root/autodl-tmp/PaddleSpeech/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 12.0
Done with setup.
100%|########################################################################################################| 2/2 [00:02<00:00, 1.01s/it]
Done! Everything took 6.651328802108765 seconds
generate durations.txt
Traceback (most recent call last):
File "local/generate_duration.py", line 38, in <module>
gen_duration_from_textgrid(mfa_dir, duration_file, fs, n_shift)
File "/root/autodl-tmp/PaddleSpeech/utils/gen_duration_from_textgrid.py", line 76, in gen_duration_from_textgrid
durations_dict[name] = (speaker, readtg(
File "/root/autodl-tmp/PaddleSpeech/utils/gen_duration_from_textgrid.py", line 29, in readtg
for interval in alignment.tierDict["phones"].entryList:
AttributeError: 'Textgrid' object has no attribute 'tierDict'

I am using Python 3.8.

zhouzyc · 2023-02-17T08:49:45Z

I get the same error as in exceedzhang's report above (AttributeError: 'Textgrid' object has no attribute 'tierDict' when running run_mix.sh) with Python 3.7.9 on Ubuntu 22. Has this been solved?

maize-j · 2023-02-24T03:34:51Z

Regarding the AttributeError: 'Textgrid' object has no attribute 'tierDict' reported above by exceedzhang and zhouzyc: check whether your praatio version is 5.0.0.

yt605155624 · 2023-02-28T07:13:48Z

@zhouzyc @maize-j This is probably caused by a backward-incompatible upgrade of praatio: https://github.com/timmahrt/praatIO/blob/main/UPGRADING.md#version-5-to-6-migration
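To illustrate what such an incompatibility looks like from the calling side, here is a hedged sketch of a reader that tolerates both naming schemes. The version 5 names (tierDict, entryList) are the ones in the traceback above; the version 6 names used here (getTier, entries) are my reading of the migration notes and should be treated as assumptions to verify against the linked UPGRADING.md. The function phone_intervals is hypothetical and is not the fix that was applied in PaddleSpeech.

```python
def phone_intervals(textgrid, tier_name="phones"):
    """Return the intervals of one tier under either attribute naming scheme.

    Works on any object that exposes the old or the (assumed) new names,
    so it does not import praatio itself.
    """
    if hasattr(textgrid, "tierDict"):
        # Old style, as used in gen_duration_from_textgrid.py.
        tier = textgrid.tierDict[tier_name]
    else:
        # Assumed new-style accessor; verify against the migration notes.
        tier = textgrid.getTier(tier_name)

    if hasattr(tier, "entryList"):
        return list(tier.entryList)
    # Assumed new-style attribute name.
    return list(tier.entries)
```

Pinning the library to a known-good version is the simpler route; a shim like this only makes sense if both major versions must be supported.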

maize-j · 2023-02-28T07:26:15Z

Yes. A fresh install now pulls in praatio 6.0.0 by default, and that version is not backward compatible, which is what triggers this error. Switching back to 5.0.0 fixes it.
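A quick check of which praatio version is installed can be sketched as follows. It relies only on importlib.metadata from the standard library (Python 3.8+); the helper names are hypothetical, and the threshold of major version 6 is taken from the discussion in this thread rather than from any official compatibility statement.

```python
from importlib import metadata


def praatio_major(version: str) -> int:
    """Extract the major version number from a string such as '6.0.0'."""
    return int(version.split(".")[0])


def needs_downgrade(version: str) -> bool:
    """True when the version is 6.x or newer, which this thread reports as failing."""
    return praatio_major(version) >= 6


def installed_praatio_version():
    """Return the installed praatio version string, or None if it is absent."""
    try:
        return metadata.version("praatio")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_praatio_version()
    if found is None:
        print("praatio is not installed")
    elif needs_downgrade(found):
        print(f"praatio {found} found; this thread reports 5.0.0 as working")
    else:
        print(f"praatio {found} found; no downgrade suggested by this thread")
```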

yt605155624 · 2023-02-28T07:37:15Z

fixed by #2970

Rapheal-Madfrog · 2023-06-09T03:15:54Z

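In Docker, one step of get_frontend downloads a 589 MB file, which I guess is a BERT checkpoint. It is downloaded again every time I start the container, and I could not find the relevant code in the project. Where is this 589 MB file downloaded from, what is it for, and where is it stored? I would like to download it once locally and mount it into the container so it is not fetched every time.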

Rapheal-Madfrog · 2023-06-12T06:55:50Z

Solved: mount three folders under /root/ in the container: nltk_data, .paddlenlp and .paddlespeech.
The 589 MB file is G2PWModel_1.1.zip. Do not keep only the extracted G2PWModel_1.1/ directory and delete the zip; if the zip is deleted it will be downloaded again.
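As a sketch of the mounting step, the snippet below builds the -v arguments for the three folders. The host directory /data/paddle_cache and the image name my-paddlespeech-image are placeholders I made up for illustration; only the three folder names and the /root/ location come from the comment above.

```python
from pathlib import PurePosixPath

# Folder names as listed in the comment above.
CACHE_DIRS = ("nltk_data", ".paddlenlp", ".paddlespeech")


def volume_args(host_cache: PurePosixPath, container_home: str = "/root") -> list:
    """Build one '-v host:container' pair per cache folder."""
    args = []
    for name in CACHE_DIRS:
        args += ["-v", f"{host_cache / name}:{container_home}/{name}"]
    return args


if __name__ == "__main__":
    # Host path and image name are hypothetical placeholders.
    cmd = [
        "docker", "run", "-it",
        *volume_args(PurePosixPath("/data/paddle_cache")),
        "my-paddlespeech-image",
    ]
    print(" ".join(cmd))
```

The host folders need to be populated once (for example by copying them out of a container that has already downloaded everything) for the mounts to avoid the repeated download.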

joisonwk · 2023-06-25T16:43:41Z

./run.sh --stage 0 --stop-stage 5
check oov
get mfa result
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 688.0
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...
Done with setup.
There were 1 segments/files not aligned. Please see ./mfa_result/unaligned.txt for more details on why alignment failed for these files.
Done! Everything took 53.481459617614746 seconds
generate durations.txt
extract feature
686 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 686/686 [00:00<00:00, 8198.77it/s]
Done
Traceback (most recent call last):
File "local/extract_feature.py", line 346, in <module>
extract_feature(
File "local/extract_feature.py", line 266, in extract_feature
normalize(speech_scaler, pitch_scaler, energy_scaler, vocab_phones,
File "local/extract_feature.py", line 155, in normalize
dataset = DataTable(
File "/mnt/d/voice/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 47, in __init__
assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples

(venv) ant@DESKTOP-MEKU9AN:/mnt/d/voice/PaddleSpeech/examples/other/tts_finetune/tts3$ ls ~/nltk_data/
corpora taggers

Regarding the earlier advice to UserName-wang (follow https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md to download nltk_data to ${HOME}): I have already downloaded nltk_data into my home directory and still get this error. What could be the cause?
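One way to narrow this down is to look for the split that ended up empty. The sketch below is a diagnostic under assumptions, not a known fix: it assumes the extracted features are indexed by files named metadata.jsonl (one JSON record per line) somewhere under a dump directory. Both the file name and the directory name ./dump are assumptions, and count_examples is a hypothetical helper.

```python
from pathlib import Path


def count_examples(dump_dir: Path) -> dict:
    """Map each metadata.jsonl under dump_dir to its number of non-empty lines."""
    counts = {}
    for meta in sorted(dump_dir.rglob("metadata.jsonl")):
        with meta.open(encoding="utf-8") as handle:
            n_records = sum(1 for line in handle if line.strip())
        counts[meta.relative_to(dump_dir).as_posix()] = n_records
    return counts


if __name__ == "__main__":
    # "./dump" is an assumed location; point this at the real output folder.
    for path, n_records in count_examples(Path("./dump")).items():
        flag = "  <-- empty" if n_records == 0 else ""
        print(f"{n_records:6d}  {path}{flag}")
```

A file reported with zero records would be a candidate for the dataset that trips the assertion in data_table.py.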

yt605155624 added Bug T2S labels Sep 26, 2022

yt605155624 assigned yt605155624 and lym0302 Sep 26, 2022

yt605155624 removed the Bug label Sep 26, 2022

yt605155624 mentioned this issue Sep 27, 2022

[TTS] Error when finetuning on the small dataset from speech_web #2461

Closed

yt605155624 changed the title ~~[TTS] few-shot finetune issue roundup~~ [TTS] few-shot finetune / voice cloning issue roundup Sep 28, 2022

yt605155624 pinned this issue Sep 28, 2022

yt605155624 added this to the r1.2.0 milestone Sep 29, 2022

yt605155624 changed the title ~~[TTS] few-shot finetune / voice cloning issue roundup~~ 💡[TTS] few-shot finetune / voice cloning issue roundup Sep 30, 2022

yt605155624 changed the title ~~💡[TTS] few-shot finetune / voice cloning issue roundup~~ 💡 TTS few-shot finetune / voice cloning issue roundup Sep 30, 2022

yt605155624 mentioned this issue Oct 24, 2022

❣️❣️ [🔝 Long-term pinned] Collection of common usage questions (main entry) ❣️❣️ #2576

Open

stale bot added the Stale label Dec 21, 2022

yt605155624 removed the Stale label Dec 22, 2022

PaddlePaddle deleted a comment from stale bot Dec 22, 2022

peiqianggao mentioned this issue Jan 2, 2023

Error in the few-shot finetune test: "This dataset has no examples" #2790

Closed

yt605155624 mentioned this issue Feb 28, 2023

[Install]fix praatio's version because praatio==6.0.0 has incompatible upgrade #2970

Merged

yt605155624 closed this as completed in #2970 Feb 28, 2023

yt605155624 reopened this Mar 1, 2023

yt605155624 added the Tips label Mar 2, 2023

jzhang533 unpinned this issue Feb 28, 2024
