TTS Finetune / Fine-tuning TTS3 on multi-speaker data #2442

Closed
dc3ea9f opened this issue Sep 23, 2022 · 13 comments
dc3ea9f commented Sep 23, 2022

Hello, I ran into a problem when fine-tuning my own dataset with examples/other/tts_finetune/tts3 (commit_id 863609):

The example only provides a tutorial for fine-tuning on the csmsc_mini single-speaker dataset; it is still unusable for fine-tuning on a multi-speaker dataset.

To fine-tune on a multi-speaker dataset, I tried to obtain phoneme durations via MFA alignment, but with ./tools/montreal-forced-aligner/bin/mfa_align some files produce no TextGrid result. The log shows:

WARNING (gmm-align-compiled[5.4.247~1-2148]:main():gmm-align-compiled.cc:103) No features for utterance 000xxx

I then tried the latest MFA (from conda) with the dictionary and acoustic model provided in the repo to extract phoneme durations. That produced results normally, but after training for some steps a dimension-mismatch error occurs. Is there anything wrong with my processing pipeline? Why does this bug appear during training, and what should I change to keep training running normally?
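To see which utterances MFA skipped, a small script can diff the wav inputs against the TextGrid outputs. This is a sketch; the directory layout and file extensions are assumptions about a typical MFA corpus, so adapt the paths to your setup:

```python
from pathlib import Path

def missing_textgrids(wav_dir, tg_dir):
    """Return utterance IDs that have a .wav input but no .TextGrid output."""
    wav_ids = {p.stem for p in Path(wav_dir).rglob("*.wav")}
    tg_ids = {p.stem for p in Path(tg_dir).rglob("*.TextGrid")}
    return sorted(wav_ids - tg_ids)

if __name__ == "__main__":
    # Throwaway demo layout: three wavs, only one aligned.
    import tempfile
    root = Path(tempfile.mkdtemp())
    (root / "wavs").mkdir()
    (root / "tg").mkdir()
    for uid in ("000001", "000002", "000003"):
        (root / "wavs" / f"{uid}.wav").touch()
    (root / "tg" / "000001.TextGrid").touch()
    print(missing_textgrids(root / "wavs", root / "tg"))  # → ['000002', '000003']
```

Any IDs this reports are the utterances that hit the "No features for utterance" warning and need to be excluded from (or re-aligned for) preprocessing.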

multiple speaker fastspeech2!
spk_num: 174
samplers done!
dataloaders done!
vocab_size: 306
W0923 19:56:10.536396 43391 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.4, Runtime API Version: 10.1
W0923 19:56:10.542753 43391 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
model done!
optimizer done!
/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/layer/norm.py:653: UserWarning: When training, we now always track global mean and variance.
  warnings.warn(
INFO 2022-09-23 19:56:16,280 trainer.py:167]  iter: 96401/1200, Rank: 0, l1_loss: 1.856183, duration_loss: 0.316927, pitch_loss: 0.978752, energy_loss: 6.169956, loss: 9.321817, avg_reader_cost: 0.00075 sec, avg_batch_cost: 2.49490 sec, avg_samples: 32, avg_ips: 12.82615 sequences/sec
INFO 2022-09-23 19:56:16,432 trainer.py:167]  iter: 96402/1200, Rank: 0, l1_loss: 1.837183, duration_loss: 0.331705, pitch_loss: 2.075590, energy_loss: 5.443483, loss: 9.687962, avg_reader_cost: 0.00023 sec, avg_batch_cost: 0.15020 sec, avg_samples: 32, avg_ips: 213.05078 sequences/sec
INFO 2022-09-23 19:56:16,797 trainer.py:167]  iter: 96403/1200, Rank: 0, l1_loss: 1.924996, duration_loss: 0.372018, pitch_loss: 1.120224, energy_loss: 5.148971, loss: 8.566210, avg_reader_cost: 0.00028 sec, avg_batch_cost: 0.36255 sec, avg_samples: 32, avg_ips: 88.26467 sequences/sec
INFO 2022-09-23 19:56:17,012 trainer.py:167]  iter: 96404/1200, Rank: 0, l1_loss: 1.770270, duration_loss: 0.278578, pitch_loss: 0.868206, energy_loss: 4.675483, loss: 7.592536, avg_reader_cost: 0.00017 sec, avg_batch_cost: 0.21332 sec, avg_samples: 32, avg_ips: 150.00853 sequences/sec
INFO 2022-09-23 19:56:17,226 fastspeech2_updater.py:174] Evaluate: l1_loss: 2.153552, duration_loss: 0.385811, pitch_loss: 0.574073, energy_loss: 5.581622, loss: 8.695057
INFO 2022-09-23 19:56:20,039 trainer.py:167]  iter: 96405/1200, Rank: 0, l1_loss: 1.780190, duration_loss: 0.277861, pitch_loss: 1.635613, energy_loss: 5.039042, loss: 8.732705, avg_reader_cost: 0.24950 sec, avg_batch_cost: 0.51928 sec, avg_samples: 32, avg_ips: 61.62427 sequences/sec
INFO 2022-09-23 19:56:20,205 trainer.py:167]  iter: 96406/1200, Rank: 0, l1_loss: 1.688078, duration_loss: 0.244555, pitch_loss: 0.762871, energy_loss: 4.326387, loss: 7.021891, avg_reader_cost: 0.00062 sec, avg_batch_cost: 0.16330 sec, avg_samples: 32, avg_ips: 195.95860 sequences/sec
INFO 2022-09-23 19:56:20,473 trainer.py:167]  iter: 96407/1200, Rank: 0, l1_loss: 1.639309, duration_loss: 0.291534, pitch_loss: 0.861395, energy_loss: 4.975361, loss: 7.767599, avg_reader_cost: 0.00019 sec, avg_batch_cost: 0.26607 sec, avg_samples: 32, avg_ips: 120.26807 sequences/sec
INFO 2022-09-23 19:56:20,654 trainer.py:167]  iter: 96408/1200, Rank: 0, l1_loss: 1.648149, duration_loss: 0.320221, pitch_loss: 0.792124, energy_loss: 4.360466, loss: 7.120960, avg_reader_cost: 0.00018 sec, avg_batch_cost: 0.17950 sec, avg_samples: 32, avg_ips: 178.27264 sequences/sec
INFO 2022-09-23 19:56:20,908 fastspeech2_updater.py:174] Evaluate: l1_loss: 2.084182, duration_loss: 0.419054, pitch_loss: 0.366559, energy_loss: 4.915794, loss: 7.785589
INFO 2022-09-23 19:56:23,792 trainer.py:167]  iter: 96409/1200, Rank: 0, l1_loss: 1.634000, duration_loss: 0.245067, pitch_loss: 1.825069, energy_loss: 4.637261, loss: 8.341396, avg_reader_cost: 0.27610 sec, avg_batch_cost: 0.54012 sec, avg_samples: 32, avg_ips: 59.24561 sequences/sec
INFO 2022-09-23 19:56:24,009 trainer.py:167]  iter: 96410/1200, Rank: 0, l1_loss: 1.596452, duration_loss: 0.312945, pitch_loss: 0.717812, energy_loss: 3.500055, loss: 6.127264, avg_reader_cost: 0.00023 sec, avg_batch_cost: 0.21478 sec, avg_samples: 32, avg_ips: 148.99175 sequences/sec
INFO 2022-09-23 19:56:24,189 trainer.py:167]  iter: 96411/1200, Rank: 0, l1_loss: 1.579894, duration_loss: 0.246891, pitch_loss: 0.851606, energy_loss: 4.034445, loss: 6.712835, avg_reader_cost: 0.00030 sec, avg_batch_cost: 0.17803 sec, avg_samples: 32, avg_ips: 179.74425 sequences/sec
INFO 2022-09-23 19:56:24,458 trainer.py:167]  iter: 96412/1200, Rank: 0, l1_loss: 1.544847, duration_loss: 0.266166, pitch_loss: 0.645351, energy_loss: 4.618836, loss: 7.075200, avg_reader_cost: 0.00021 sec, avg_batch_cost: 0.26733 sec, avg_samples: 32, avg_ips: 119.70131 sequences/sec
INFO 2022-09-23 19:56:24,691 fastspeech2_updater.py:174] Evaluate: l1_loss: 2.006938, duration_loss: 0.450204, pitch_loss: 0.262614, energy_loss: 4.423265, loss: 7.143022
INFO 2022-09-23 19:56:27,653 trainer.py:167]  iter: 96413/1200, Rank: 0, l1_loss: 1.563614, duration_loss: 0.288662, pitch_loss: 1.887361, energy_loss: 3.617754, loss: 7.357391, avg_reader_cost: 0.27405 sec, avg_batch_cost: 0.53012 sec, avg_samples: 32, avg_ips: 60.36321 sequences/sec
INFO 2022-09-23 19:56:27,927 trainer.py:167]  iter: 96414/1200, Rank: 0, l1_loss: 1.547978, duration_loss: 0.275924, pitch_loss: 0.660862, energy_loss: 3.920330, loss: 6.405094, avg_reader_cost: 0.00029 sec, avg_batch_cost: 0.27138 sec, avg_samples: 32, avg_ips: 117.91451 sequences/sec
INFO 2022-09-23 19:56:28,144 trainer.py:167]  iter: 96415/1200, Rank: 0, l1_loss: 1.496017, duration_loss: 0.253219, pitch_loss: 0.574959, energy_loss: 3.712186, loss: 6.036382, avg_reader_cost: 0.00030 sec, avg_batch_cost: 0.21546 sec, avg_samples: 32, avg_ips: 148.51825 sequences/sec
INFO 2022-09-23 19:56:28,325 trainer.py:167]  iter: 96416/1200, Rank: 0, l1_loss: 1.476836, duration_loss: 0.215573, pitch_loss: 0.774607, energy_loss: 4.139956, loss: 6.606973, avg_reader_cost: 0.00029 sec, avg_batch_cost: 0.17925 sec, avg_samples: 32, avg_ips: 178.51734 sequences/sec
INFO 2022-09-23 19:56:28,529 fastspeech2_updater.py:174] Evaluate: l1_loss: 1.925813, duration_loss: 0.436651, pitch_loss: 0.225853, energy_loss: 4.057889, loss: 6.646207
INFO 2022-09-23 19:56:31,268 trainer.py:167]  iter: 96417/1200, Rank: 0, l1_loss: 1.507729, duration_loss: 0.278215, pitch_loss: 0.658901, energy_loss: 3.469983, loss: 5.914828, avg_reader_cost: 0.28339 sec, avg_batch_cost: 0.48521 sec, avg_samples: 32, avg_ips: 65.95148 sequences/sec
INFO 2022-09-23 19:56:31,511 trainer.py:167]  iter: 96418/1200, Rank: 0, l1_loss: 1.498667, duration_loss: 0.226933, pitch_loss: 0.807640, energy_loss: 3.796876, loss: 6.330114, avg_reader_cost: 0.00028 sec, avg_batch_cost: 0.24016 sec, avg_samples: 32, avg_ips: 133.24319 sequences/sec
Exception in main training loop: (InvalidArgument) The value (281) of the non-singleton dimension does not match the corresponding value (289) in shape for expand_v2 op.
  [Hint: Expected vec_in_dims[i] == expand_shape[i], but received vec_in_dims[i]:281 != expand_shape[i]:289.] (at /paddle/paddle/phi/kernels/impl/expand_kernel_impl.h:61)
  [operator < expand_v2 > error]
Traceback (most recent call last):
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2_updater.py", line 63, in update_core
    before_outs, after_outs, d_outs, p_outs, e_outs, ys, olens = self.model(
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 550, in forward
    before_outs, after_outs, d_outs, p_outs, e_outs = self._forward(
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 667, in _forward
    zs, _ = self.decoder(hs, h_masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder.py", line 409, in forward
    xs, masks = self.encoders(xs, masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/repeat.py", line 25, in forward
    args = m(*args)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder_layer.py", line 99, in forward
    x = residual + self.dropout(self.self_attn(x_q, x, x, mask))
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 144, in forward
    return self.forward_attention(v, scores, mask)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 107, in forward_attention
    scores = masked_fill(scores, mask, min_value)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/masked_fill.py", line 44, in masked_fill
    mask = mask.broadcast_to(bshape)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 1917, in broadcast_to
    return _C_ops.expand_v2(x, 'shape', shape)
Trainer extensions will try to handle the extension. Then all extensions will finalize.
Traceback (most recent call last):
  File "/home/xxxx/icassp_workspace/PaddleSpeech/examples/other/tts_finetune/tts3/local/finetune.py", line 269, in <module>
    train_sp(train_args, config)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/examples/other/tts_finetune/tts3/local/finetune.py", line 202, in train_sp
    trainer.run()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 198, in run
    six.reraise(*exc_info)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2_updater.py", line 63, in update_core
    before_outs, after_outs, d_outs, p_outs, e_outs, ys, olens = self.model(
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 550, in forward
    before_outs, after_outs, d_outs, p_outs, e_outs = self._forward(
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 667, in _forward
    zs, _ = self.decoder(hs, h_masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder.py", line 409, in forward
    xs, masks = self.encoders(xs, masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/repeat.py", line 25, in forward
    args = m(*args)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder_layer.py", line 99, in forward
    x = residual + self.dropout(self.self_attn(x_q, x, x, mask))
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 144, in forward
    return self.forward_attention(v, scores, mask)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 107, in forward_attention
    scores = masked_fill(scores, mask, min_value)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/masked_fill.py", line 44, in masked_fill
    mask = mask.broadcast_to(bshape)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 1917, in broadcast_to
    return _C_ops.expand_v2(x, 'shape', shape)
ValueError: (InvalidArgument) The value (281) of the non-singleton dimension does not match the corresponding value (289) in shape for expand_v2 op.
  [Hint: Expected vec_in_dims[i] == expand_shape[i], but received vec_in_dims[i]:281 != expand_shape[i]:289.] (at /paddle/paddle/phi/kernels/impl/expand_kernel_impl.h:61)
  [operator < expand_v2 > error]
@yt605155624 yt605155624 self-assigned this Sep 23, 2022
yt605155624 (Collaborator) commented Sep 23, 2022

I haven't used the latest MFA (from conda) either, so I don't know whether the phonemes it generates are misaligned anywhere.
You could check whether the length of each preprocessed mel spectrogram matches the sum of its durations.
You could also add some logging in the code to find out which variables the lengths (281) and (289) actually belong to.
Also, since you are able to run MFA yourself, you could fine-tune directly with aishell3/tts3 instead; see #1842.
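The mel-vs-duration check suggested above can be sketched like this. The entry fields (`utt_id`, `num_frames`, `durations`) are assumptions about what each metadata record holds; map them onto whatever your metadata.jsonl actually stores:

```python
def mismatched_utterances(entries, tol=0):
    """Return (utt_id, diff) pairs where the mel frame count differs from
    the sum of the phone durations by more than `tol` frames."""
    bad = []
    for e in entries:
        diff = abs(e["num_frames"] - sum(e["durations"]))
        if diff > tol:
            bad.append((e["utt_id"], diff))
    return bad

# Demo: one consistent utterance, one that is 8 frames short.
entries = [
    {"utt_id": "a01", "num_frames": 289, "durations": [100, 100, 89]},
    {"utt_id": "a02", "num_frames": 281, "durations": [100, 100, 89]},
]
print(mismatched_utterances(entries))  # → [('a02', 8)]
```

Running this over train/dev/test would flag exactly the kind of 281-vs-289 mismatch the error message reports, if any utterance has it.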

dc3ea9f (Author) commented Sep 24, 2022

Thank you very much for your reply! I have checked the preprocessed mel lengths in dump/{data_split}/data_speech/*.npy against the durations in dump/{data_split}/metadata.jsonl, and they match for train/dev/test.

From the traceback, the crash happens while broadcasting the attention mask, inside the PaddleSpeech framework itself (paddlespeech/t2s/modules/masked_fill.py#44); I haven't located the cause yet.

Next I will try continuing fine-tuning with aishell3/tts3. I'm new to this area and unfamiliar with many things, so I was hoping tts_finetune/tts3 would let me finish with less hassle.
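For reference, the failing broadcast can be reproduced outside Paddle: NumPy's `broadcast_to` follows the same rule as Paddle's (a non-singleton dimension must match the target exactly), so a mask whose time dimension disagrees with the batch's padded length fails the same way. The shapes below are illustrative, taken from the 281-vs-289 error message:

```python
import numpy as np

# A (1, 1, 281) attention mask cannot be expanded to time length 289:
mask = np.ones((1, 1, 281), dtype=bool)
try:
    np.broadcast_to(mask, (32, 2, 289))
except ValueError as e:
    print("broadcast failed:", e)

# It only succeeds when the mask length matches the padded batch length:
expanded = np.broadcast_to(mask, (32, 2, 281))
print(expanded.shape)  # → (32, 2, 281)
```

So the bug is not in masked_fill itself; the mask and the scores simply arrive with different time lengths, which points back at the batch collation or the duration data for one batch.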

dc3ea9f (Author) commented Sep 24, 2022

I noticed that several Evaluate rounds had already run before the error appeared. Since the evaluator's trigger in the code is per-epoch, training apparently ran fine for several epochs and then suddenly failed, which is confusing.

dc3ea9f (Author) commented Sep 24, 2022

Hello, I tried fine-tuning with aishell3/tts3, but I still hit this bug. I don't know how to locate or fix the problem; do you have any suggestions?


INFO 2022-09-24 11:36:11,383 trainer.py:167]  iter: 90/200, Rank: 0, l1_loss: 1.274704, duration_loss: 0.168186, pitch_loss: 0.668763, energy_loss: 1.406364, loss: 3.518017, avg_reader_cost: 9.86195 sec, avg_batch_cost: 10.37309 sec, avg_samples: 64, avg_ips: 6.16981 sequences/sec
INFO 2022-09-24 11:36:11,383 trainer.py:167]  iter: 90/200, Rank: 1, l1_loss: 1.251410, duration_loss: 0.153488, pitch_loss: 0.876803, energy_loss: 1.262913, loss: 3.544615, avg_reader_cost: 9.75817 sec, avg_batch_cost: 23.90390 sec, avg_samples: 64, avg_ips: 2.67739 sequences/sec
/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
(the same librosa `np.complex` DeprecationWarning repeats several times)
INFO 2022-09-24 11:36:18,374 fastspeech2_updater.py:174] Evaluate: l1_loss: 1.686024, duration_loss: 0.357755, pitch_loss: 0.217953, energy_loss: 2.897542, loss: 5.159274
INFO 2022-09-24 11:36:31,459 trainer.py:167]  iter: 91/200, Rank: 0, l1_loss: 1.267544, duration_loss: 0.171551, pitch_loss: 1.186325, energy_loss: 1.460606, loss: 4.086026, avg_reader_cost: 8.73017 sec, avg_batch_cost: 9.14669 sec, avg_samples: 64, avg_ips: 6.99707 sequences/sec
INFO 2022-09-24 11:36:31,459 trainer.py:167]  iter: 91/200, Rank: 1, l1_loss: 1.259085, duration_loss: 0.155300, pitch_loss: 0.526848, energy_loss: 1.223089, loss: 3.164323, avg_reader_cost: 7.12135 sec, avg_batch_cost: 20.07384 sec, avg_samples: 64, avg_ips: 3.18823 sequences/sec
INFO 2022-09-24 11:36:37,434 fastspeech2_updater.py:174] Evaluate: l1_loss: 1.681811, duration_loss: 0.358128, pitch_loss: 0.217943, energy_loss: 2.905669, loss: 5.163549
Exception in main training loop: (InvalidArgument) The value (281) of the non-singleton dimension does not match the corresponding value (289) in shape for expand_v2 op.
  [Hint: Expected vec_in_dims[i] == expand_shape[i], but received vec_in_dims[i]:281 != expand_shape[i]:289.] (at /paddle/paddle/phi/kernels/impl/expand_kernel_impl.h:61)
  [operator < expand_v2 > error]
Traceback (most recent call last):
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2_updater.py", line 63, in update_core
    before_outs, after_outs, d_outs, p_outs, e_outs, ys, olens = self.model(
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/parallel.py", line 752, in forward
    outputs = self._layers(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 550, in forward
    before_outs, after_outs, d_outs, p_outs, e_outs = self._forward(
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 667, in _forward
    zs, _ = self.decoder(hs, h_masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder.py", line 409, in forward
    xs, masks = self.encoders(xs, masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/repeat.py", line 25, in forward
    args = m(*args)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder_layer.py", line 99, in forward
    x = residual + self.dropout(self.self_attn(x_q, x, x, mask))
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 144, in forward
    return self.forward_attention(v, scores, mask)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 107, in forward_attention
    scores = masked_fill(scores, mask, min_value)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/masked_fill.py", line 44, in masked_fill
    mask = mask.broadcast_to(bshape)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 1917, in broadcast_to
    return _C_ops.expand_v2(x, 'shape', shape)
Trainer extensions will try to handle the extension. Then all extensions will finalize.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   void paddle::memory::Copy<phi::CPUPlace, phi::Place>(phi::CPUPlace, void*, phi::Place, void const*, unsigned long, void*)
1   void paddle::memory::Copy<phi::Place, phi::Place>(phi::Place, void*, phi::Place, void const*, unsigned long, void*)
2   void paddle::memory::Copy<phi::CPUPlace, phi::GPUPlace>(phi::CPUPlace, void*, phi::GPUPlace, void const*, unsigned long, void*)
3   phi::backends::gpu::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind)

----------------------
Error Message Summary:
----------------------
FatalError: `Termination signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1663990610 (unix time) try "date -d @1663990610" if you are using GNU date ***]
  [SignalInfo: *** SIGTERM (@0x3e800069696) received by PID 431879 (TID 0x7f22d035e4c0) from PID 431766 ***]

Traceback (most recent call last):
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/train.py", line 215, in <module>
    main()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/train.py", line 209, in main
    dist.spawn(train_sp, (args, config), nprocs=args.ngpu)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/distributed/spawn.py", line 565, in spawn
    while not context.join():
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/distributed/spawn.py", line 373, in join
    self._throw_exception(error_index)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/distributed/spawn.py", line 391, in _throw_exception
    raise Exception(msg)
Exception: 

----------------------------------------------
Process 0 terminated with the following error:
----------------------------------------------

Traceback (most recent call last):
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/distributed/spawn.py", line 322, in _func_wrapper
    result = func(*args)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/train.py", line 168, in train_sp
    trainer.run()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 198, in run
    six.reraise(*exc_info)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/trainer.py", line 149, in run
    update()
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/training/updaters/standard_updater.py", line 110, in update
    self.update_core(batch)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2_updater.py", line 63, in update_core
    before_outs, after_outs, d_outs, p_outs, e_outs, ys, olens = self.model(
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/parallel.py", line 752, in forward
    outputs = self._layers(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 550, in forward
    before_outs, after_outs, d_outs, p_outs, e_outs = self._forward(
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 667, in _forward
    zs, _ = self.decoder(hs, h_masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder.py", line 409, in forward
    xs, masks = self.encoders(xs, masks)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/repeat.py", line 25, in forward
    args = m(*args)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/encoder_layer.py", line 99, in forward
    x = residual + self.dropout(self.self_attn(x_q, x, x, mask))
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 144, in forward
    return self.forward_attention(v, scores, mask)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/transformer/attention.py", line 107, in forward_attention
    scores = masked_fill(scores, mask, min_value)
  File "/home/xxxx/icassp_workspace/PaddleSpeech/paddlespeech/t2s/modules/masked_fill.py", line 44, in masked_fill
    mask = mask.broadcast_to(bshape)
  File "/home/xxxx/.custom/cuda-11.4.2-cudnn8-devel-ubuntu20.04-pytorch1.9.0_full_tensorboard/envs/paddle_env/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 1917, in broadcast_to
    return _C_ops.expand_v2(x, 'shape', shape)
ValueError: (InvalidArgument) The value (281) of the non-singleton dimension does not match the corresponding value (289) in shape for expand_v2 op.
  [Hint: Expected vec_in_dims[i] == expand_shape[i], but received vec_in_dims[i]:281 != expand_shape[i]:289.] (at /paddle/paddle/phi/kernels/impl/expand_kernel_impl.h:61)
  [operator < expand_v2 > error]
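For readers hitting the same ValueError: the failure happens because `masked_fill` tries to broadcast the padding mask over the attention score tensor, and the two were built from inconsistent lengths (here 281 vs 289 frames). A minimal NumPy sketch of the same shape conflict (illustrative shapes, not PaddleSpeech code):

```python
import numpy as np

# Attention scores shaped (batch, heads, T, T) with T = 289 mel frames,
# while the padding mask was derived from a length of only 281 frames.
scores = np.zeros((2, 4, 289, 289))
mask = np.ones((2, 1, 1, 281), dtype=bool)

try:
    np.broadcast_to(mask, scores.shape)  # mirrors mask.broadcast_to(bshape)
except ValueError as err:
    print("broadcast fails:", err)  # non-singleton dim 281 != 289
```

Any utterance whose extracted durations and mel features disagree in length will trigger this at training time when it lands in a batch.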

yt605155624 (Collaborator) commented Sep 26, 2022:

We suggest first running our example code end to end to get familiar with the data preprocessing and training pipeline before switching to your own inputs, and using a stable Paddle release (e.g. 2.3.1).
This is most likely a data problem.
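A quick sanity check consistent with this advice (a sketch, not part of the repo; `check_duration_alignment` and the example values are assumptions) is to verify, for every utterance, that the MFA phone durations sum to the mel-frame count before training:

```python
import numpy as np

def check_duration_alignment(durations, n_frames, utt_id):
    """Return True if the summed phone durations match the mel frame count.

    A mismatch here is exactly what later surfaces as the broadcast_to
    ValueError inside masked_fill during training.
    """
    total = int(np.sum(durations))
    if total != n_frames:
        print(f"{utt_id}: duration sum {total} != mel frames {n_frames}")
        return False
    return True

# Illustrative values: a well-aligned and a mismatched utterance.
check_duration_alignment([3, 5, 4], 12, "utt_ok")        # True
check_duration_alignment([3, 5, 4], 14, "utt_mismatch")  # False, prints a warning
```

Running this over the preprocessed metadata and dropping the offending utterances should remove the mismatched batches at the source.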

dc3ea9f (Author) commented Sep 26, 2022:

OK, I'll give that a try.

dc3ea9f (Author) commented Sep 27, 2022:

My Paddle installs all come from pip, and the example code runs fine, but I still haven't located the problem. As a workaround I changed t2s/training/trainer.py#149 to:

try:
    update()
except:
    print("[warning] bug")
    continue

With that change training does run, and the results sound OK to me; the warning fires only occasionally, and it never hits three or more consecutive failures.
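A slightly safer variant of this workaround (a sketch; `robust_update` and the `utt_id` field are illustrative assumptions, not PaddleSpeech API) catches only the shape error and logs which batch was skipped, so the offending utterances can be removed from the manifest instead of being dropped silently:

```python
def robust_update(update, batch, logger=print):
    """Run one training step; on a shape/broadcast error, log and skip.

    `update` is the per-batch training-step callable; only ValueError is
    caught so unrelated failures still surface.
    """
    try:
        update(batch)
        return True
    except ValueError as err:
        utt_ids = batch.get("utt_id", "<unknown>") if isinstance(batch, dict) else "<unknown>"
        logger(f"[warning] skipped batch {utt_ids}: {err}")
        return False
```

Counting how often this returns False also gives a concrete measure of how much data is being lost, instead of guessing from the warning frequency.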

graciechen commented:

> We suggest first running our example code end to end to get familiar with the data preprocessing and training pipeline before switching to your own inputs, and using a stable Paddle release (e.g. 2.3.1). This is most likely a data problem.

Which version of paddlespeech is recommended?

hello2mao commented:

+1

hello2mao commented:

> My Paddle installs all come from pip, and the example code runs fine, but I still haven't located the problem. As a workaround I changed t2s/training/trainer.py#149 to:

try:
    update()
except:
    print("[warning] bug")
    continue

> With that change training does run, and the results sound OK to me; the warning fires only occasionally, and it never hits three or more consecutive failures.


ray1a1 commented Jul 18, 2023:

I hit this problem too: on a machine with a 9900K + 3090 everything is fine, but on a 13900K + 4090 training throws a pile of these errors, and it feels like half of the corpus never gets trained on. Was this ever resolved?

Tony-xubiao commented:

I ran into exactly the same problem.

hello2mao commented:

Folks, switch to this instead, it works great: https://github.com/Plachtaa/VITS-fast-fine-tuning
