💡 TTS few-shot finetune / voice cloning issue roundup #2456

Open
yt605155624 opened this issue Sep 26, 2022 · 11 comments · Fixed by #2970

@yt605155624
Collaborator

yt605155624 commented Sep 26, 2022

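If finetuning on 12 sentences gives poor results, the dataset is usually too small. Increase it to roughly 300 to 600 utterances; the more data you have and the higher its quality, the better the synthesized speech.
The recordings should have no reverberation and no background noise, and should be made at a moderate distance from the microphone. The Biaobei (标贝) dataset is a good reference for the expected quality.
The timbre after finetuning depends on how similar the target speaker is to the original speaker: the more similar they are, the closer the finetuned timbre is to the target speaker.
The audio quality after finetuning depends on the audio quality of the original speaker: if the original speaker's audio is poor, the finetuned result may be poor as well.
In short, the finetune approach requires care in both data collection and the choice of original speaker.

For the principle behind few-shot finetuning, see "关于训练一个自己的TTS模型" (On training your own TTS model).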

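  1. ❣️ [TTS] MFA error: No such file or directory: "xx/xx/xx/train/mfcc/raw_mfcc.0.scp" #2437
  2. [TTS] For few-shot finetune, batch_size must be <= the number of samples, otherwise an error is raised #2454
  3. Can the speaking rate of a self-finetuned TTS model be changed? #2383
  4. Preprocessing runs without problems, so why does training not start? -> The epoch setting is wrong; see: Adding your own audio data to aishell3 for training #2319 (comment)
  5. TTS Finetune / Finetuning TTS3 on multi-speaker data #2442
  6. Error when using ecapa-tdnn for voice cloning #2471 -> install the develop version of paddlespeech
  7. Asking for directions on improving audio quality in voice cloning #2245
  8. ImportError: cannot import name 'norm' from 'paddlespeech.t2s.exps.syn_utils' (/opt/conda/envs/paddlespeech/lib/python3.7/site-packages/paddlespeech/t2s/exps/syn_utils.py) #2485 -> install the develop version of paddlespeech
  9. Single-sentence voice cloning gives very poor results #2583 -> the finetune approach is recommended
  10. Why is the English speech generated from Chinese with ERNIE-SAT voice cloning completely unintelligible? #2586
  11. [TTS] Problems with the one-click finetune feature #2607
  12. Error in few-shot finetune testing: "This dataset has no examples" #2790
  13. How to further finetune a self-trained single-speaker fastspeech2 model, add the new voice to the model, and select different timbres at inference by speaker id #2953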
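
For item 2 in the list above, one way to stay within the constraint is to cap the configured batch size at the number of finetune utterances before training starts. This is only a minimal sketch: `safe_batch_size` is a hypothetical helper, not a PaddleSpeech function, and where the value ends up in the finetune config is left open.

```python
def safe_batch_size(requested: int, num_samples: int) -> int:
    """Cap the batch size so it never exceeds the number of samples."""
    if num_samples <= 0:
        raise ValueError("the finetune dataset is empty")
    if requested <= 0:
        raise ValueError("batch_size must be positive")
    return min(requested, num_samples)


# With only 12 utterances, a requested batch size of 64 is reduced to 12.
print(safe_batch_size(64, 12))
```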
@yt605155624 yt605155624 removed the Bug label Sep 26, 2022
@yt605155624 yt605155624 changed the title from "[TTS] few-shot finetune issue roundup" to "[TTS] few-shot finetune / voice cloning issue roundup" Sep 28, 2022
@yt605155624 yt605155624 pinned this issue Sep 28, 2022
@yt605155624 yt605155624 added this to the r1.2.0 milestone Sep 29, 2022
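@yt605155624 yt605155624 changed the title from "[TTS] few-shot finetune / voice cloning issue roundup" to "💡[TTS] few-shot finetune / voice cloning issue roundup" Sep 30, 2022
@yt605155624 yt605155624 changed the title from "💡[TTS] few-shot finetune / voice cloning issue roundup" to "💡 TTS few-shot finetune / voice cloning issue roundup" Sep 30, 2022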
@UserName-wang

./run.sh --stage 0 --stop-stage 5
check oov
get mfa result
sh: 1: mfa_align: Exec format error
generate durations.txt
extract feature
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data] [Errno 111] Connection refused>
[nltk_data] Error loading cmudict: <urlopen error [Errno 111]
[nltk_data] Connection refused>
196 1
100%|███████████████████████████████████████████████████████████████████████████████████| 196/196 [00:00<00:00, 5146.26it/s]
Done
Traceback (most recent call last):
File "local/extract_feature.py", line 346, in <module>
extract_feature(
File "local/extract_feature.py", line 266, in extract_feature
normalize(speech_scaler, pitch_scaler, energy_scaler, vocab_phones,
File "local/extract_feature.py", line 155, in normalize
dataset = DataTable(
File "/home/nx/study/python/Paddle24/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 45, in __init__
assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples

The code in File "/home/nx/study/python/Paddle24/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 45:
self.data = data
assert len(data) > 0, "This dataset has no examples"

@yt605155624
Collaborator Author

@UserName-wang follow https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md to download nltk_data to your ${HOME}
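
As a quick check after following the install guide, the sketch below looks for the two resources named in the `[nltk_data] Error loading ...` lines of the log. It assumes NLTK's usual layout of `<root>/<category>/<name>` (unpacked directory or `.zip`); `missing_nltk_resources` is a hypothetical helper. An empty result only means the files are present, not that the later `This dataset has no examples` assertion is resolved.

```python
from pathlib import Path

# Resource names taken from the "[nltk_data] Error loading ..." lines in the log.
REQUIRED = {
    "taggers": "averaged_perceptron_tagger",
    "corpora": "cmudict",
}


def missing_nltk_resources(root: Path) -> list:
    """Return the required resources that are absent under `root`."""
    missing = []
    for category, name in REQUIRED.items():
        unpacked = root / category / name
        zipped = root / category / f"{name}.zip"
        if not unpacked.exists() and not zipped.exists():
            missing.append(f"{category}/{name}")
    return missing


if __name__ == "__main__":
    # ${HOME}/nltk_data is the location mentioned in the reply above.
    print(missing_nltk_resources(Path.home() / "nltk_data"))
```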

@exceedzhang

I followed PaddleSpeech/examples/other/tts_finetune/tts3 to run few-shot training.

Running run_mix.sh produces the following error:
root@autodl-container-9db311a83c-4d0bf061:~/autodl-tmp/PaddleSpeech/examples/other/tts_finetune/tts3# ./run_mix.sh
check oov
get mfa result
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 12.0
/root/autodl-tmp/PaddleSpeech/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 12.0
Done with setup.
100%|########################################################################################################| 2/2 [00:02<00:00, 1.01s/it]
Done! Everything took 6.651328802108765 seconds
generate durations.txt
Traceback (most recent call last):
File "local/generate_duration.py", line 38, in <module>
gen_duration_from_textgrid(mfa_dir, duration_file, fs, n_shift)
File "/root/autodl-tmp/PaddleSpeech/utils/gen_duration_from_textgrid.py", line 76, in gen_duration_from_textgrid
durations_dict[name] = (speaker, readtg(
File "/root/autodl-tmp/PaddleSpeech/utils/gen_duration_from_textgrid.py", line 29, in readtg
for interval in alignment.tierDict["phones"].entryList:
AttributeError: 'Textgrid' object has no attribute 'tierDict'

I am using Python 3.8.

@zhouzyc

zhouzyc commented Feb 17, 2023

I get the same problem as @exceedzhang with Python 3.7.9 on Ubuntu 22. Has it been solved?

@maize-j

maize-j commented Feb 24, 2023

@zhouzyc Check whether your praatio version is 5.0.0.

@yt605155624
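Collaborator Author

@zhouzyc @maize-j This may be caused by a backward-incompatible praatio upgrade: https://github.com/timmahrt/praatIO/blob/main/UPGRADING.md#version-5-to-6-migration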

@maize-j

maize-j commented Feb 28, 2023

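@zhouzyc @maize-j This may be caused by a backward-incompatible praatio upgrade: https://github.com/timmahrt/praatIO/blob/main/UPGRADING.md#version-5-to-6-migration

Yes. A fresh install now pulls in praatio 6.0.0 by default, and that version is not backward compatible, which causes this error. Going back to 5.0.0 fixes it.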

@yt605155624
Collaborator Author

Fixed by #2970.
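
For anyone still on an older checkout, the sketch below shows two checks that match the discussion above: reading the installed praatio major version, and reading the "phones" tier in a way that tolerates both APIs. The 5.x names (`tierDict`, `entryList`) come from the traceback; the 6.x names (`getTier`, `entries`) are an assumption based on the linked migration guide. Both helpers are hypothetical and this is not necessarily how #2970 fixed it. It needs Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata
from types import SimpleNamespace


def praatio_major_version():
    """Return the installed praatio major version, or None if not installed."""
    try:
        return int(metadata.version("praatio").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def phone_intervals(alignment, tier_name="phones"):
    """Read a tier's intervals from either a 5.x or a 6.x Textgrid object."""
    if hasattr(alignment, "tierDict"):
        # praatio 5.x attribute names, as used in the traceback above
        return list(alignment.tierDict[tier_name].entryList)
    # praatio 6.x names, assumed from the migration guide
    return list(alignment.getTier(tier_name).entries)


# Stand-in objects so the sketch runs without praatio installed.
v5_like = SimpleNamespace(
    tierDict={"phones": SimpleNamespace(entryList=[(0.0, 0.1, "a")])})
v6_like = SimpleNamespace(
    getTier=lambda name: SimpleNamespace(entries=((0.0, 0.1, "a"),)))

print(praatio_major_version())
print(phone_intervals(v5_like) == phone_intervals(v6_like))
```

If the reported major version is 6 and the code still reads `tierDict`, going back to praatio 5.0.0 is the workaround confirmed in this thread.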

@Rapheal-Madfrog

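In Docker, get_frontend has a step that downloads a 589 MB file, presumably a BERT checkpoint. It is downloaded again every time I start the container, and I could not find the relevant code in the project. Where is this 589 MB file downloaded from, what is it for, and where is it stored? I would like to download it once locally and mount it into the container instead of downloading it every time.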

@Rapheal-Madfrog

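Solved: mount the three folders under /root/ in the Docker container: nltk_data, .paddlenlp, and .paddlespeech.
The 589 MB file is G2PWModel_1.1.zip. You cannot keep only the extracted G2PWModel_1.1/ directory and delete the zip; if the zip is deleted, it is downloaded again.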
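
The mounts described above can be written as `-v host:container` arguments for `docker run`. A minimal sketch that builds them: the host directory `/data/paddle_cache` and the helper `volume_args` are hypothetical, and only the three folder names and the `/root` location come from the comment.

```python
from pathlib import PurePosixPath

# Folder names from the comment above; they live under /root/ in the container.
CACHE_DIRS = ["nltk_data", ".paddlenlp", ".paddlespeech"]


def volume_args(host_root, container_home="/root"):
    """Build `-v host:container` arguments for `docker run`."""
    host_root = PurePosixPath(host_root)
    args = []
    for name in CACHE_DIRS:
        args += ["-v", f"{host_root / name}:{container_home}/{name}"]
    return args


# "/data/paddle_cache" is a hypothetical host directory that persists between runs.
print(" ".join(volume_args("/data/paddle_cache")))
```

Keep G2PWModel_1.1.zip inside the mounted folder, since the comment notes that deleting the zip triggers a new download.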

@joisonwk

./run.sh --stage 0 --stop-stage 5
check oov
get mfa result
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 688.0
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...
Done with setup.
There were 1 segments/files not aligned. Please see ./mfa_result/unaligned.txt for more details on why alignment failed for these files.
Done! Everything took 53.481459617614746 seconds
generate durations.txt
extract feature
686 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 686/686 [00:00<00:00, 8198.77it/s]
Done
Traceback (most recent call last):
File "local/extract_feature.py", line 346, in <module>
extract_feature(
File "local/extract_feature.py", line 266, in extract_feature
normalize(speech_scaler, pitch_scaler, energy_scaler, vocab_phones,
File "local/extract_feature.py", line 155, in normalize
dataset = DataTable(
File "/mnt/d/voice/PaddleSpeech/paddlespeech/t2s/datasets/data_table.py", line 47, in __init__
assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples

(venv) ant@DESKTOP-MEKU9AN:/mnt/d/voice/PaddleSpeech/examples/other/tts_finetune/tts3$ ls ~/nltk_data/
corpora taggers

@UserName-wang follow https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md to download nltk_data to your ${HOME}

I have already downloaded nltk_data to my home directory but still get this error. What could be the cause?

@jzhang533 jzhang533 unpinned this issue Feb 28, 2024