
Using Parallel WaveGan as vocoder #379

Closed
Charlottecuc opened this issue Jun 22, 2020 · 2 comments

@Charlottecuc

Hi. I trained the Tacotron 2 model and can successfully synthesize speech using WaveGlow as the vocoder. However, when I switched to Parallel WaveGAN (https://github.com/kan-bayashi/ParallelWaveGAN), the synthesized waveform was quite strange:
[Screenshots of the strange synthesized output attached]
(At training time, the hop_size, sample_rate and window_size were set to the same values for the Tacotron 2, WaveGlow and WaveGAN models.)

Previously, I successfully used Parallel WaveGAN as the vocoder to synthesize speech from a FastSpeech acoustic model. The only difference here is that for FastSpeech, the mel-spectrogram features were normalized to zero mean and unit variance before training.

My question is: in your Tacotron 2 implementation, apart from audio_norm = audio / self.max_wav_value, is there any other preprocessing of the input mel-spectrogram features? Or could you kindly give me some advice?

Thank you very much!
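For readers hitting the same mismatch: below is a minimal sketch of standardizing Tacotron 2's log-mel output with the statistics computed during the Parallel WaveGAN feature preprocessing, assuming those statistics are available as per-bin mean and standard deviation arrays. The file name, key names, and helper below are illustrative assumptions, not the repositories' actual APIs.

```python
import numpy as np
import torch

# Illustrative only: per-bin statistics saved when the Parallel WaveGAN
# training features were standardized to zero mean / unit variance.
# The actual file format and key names depend on your preprocessing setup.
stats = np.load("pwg_feature_stats.npz")   # hypothetical stats file
mel_mean = stats["mean"]                   # shape: (n_mel_channels,)
mel_std = stats["std"]                     # shape: (n_mel_channels,)

def normalize_for_pwg(mel: torch.Tensor) -> torch.Tensor:
    """Standardize a Tacotron 2 log-mel spectrogram of shape (n_mels, T)
    so it matches the zero-mean / unit-variance features the Parallel
    WaveGAN vocoder was trained on."""
    mean = torch.from_numpy(mel_mean).float().unsqueeze(1)  # (n_mels, 1)
    std = torch.from_numpy(mel_std).float().unsqueeze(1)    # (n_mels, 1)
    return (mel - mean) / std

# mel_outputs_postnet: (1, n_mels, T) tensor from Tacotron 2 inference
# mel_for_vocoder = normalize_for_pwg(mel_outputs_postnet.squeeze(0))
```

Even with this standardization, the mel extraction itself (filterbank, log base, clipping, hop size) still has to match between the acoustic model and the vocoder for the output to sound reasonable.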

@rafaelvalle
Contributor

Take a look at this and this

@Charlottecuc
Author

Thank you. For anyone who tried Tacotron 2 + Parallel WaveGAN and also encountered this problem, see kan-bayashi/ParallelWaveGAN#169
