
Using Parallel WaveGan as vocoder #379

Closed
Charlottecuc opened this issue Jun 22, 2020 · 2 comments

@Charlottecuc

Hi. I trained the Tacotron 2 model and can successfully synthesize speech using WaveGlow as the vocoder. However, when I switched to Parallel WaveGAN (https://github.com/kan-bayashi/ParallelWaveGAN), the synthesized waveform was quite strange:
[Screenshots of the strange synthesized output attached]
(At training time, the hop_size, sample_rate and window_size were set to the same values for the Tacotron 2, WaveGlow and WaveGAN models.)

Previously, I successfully used Parallel WaveGAN as the vocoder to synthesize speech from a FastSpeech acoustic model. The only difference here is that for FastSpeech, the mel-spectrogram features were normalized to zero mean and unit variance before training.

My question is: in your Tacotron 2 implementation, apart from audio_norm = audio / self.max_wav_value, is there any other preprocessing of the input mel-spectrogram features? Or could you kindly give me some advice?

Thank you very much!
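For readers hitting the same mismatch: below is a minimal sketch of standardizing Tacotron 2's log-mel output with the statistics computed during the Parallel WaveGAN feature preprocessing, assuming those statistics are available as per-bin mean and standard deviation arrays. The file name, key names, and helper below are illustrative assumptions, not the repositories' actual APIs.

```python
import numpy as np
import torch

# Illustrative only: per-bin statistics saved when the Parallel WaveGAN
# training features were standardized to zero mean / unit variance.
# The actual file format and key names depend on your preprocessing setup.
stats = np.load("pwg_feature_stats.npz")   # hypothetical stats file
mel_mean = stats["mean"]                   # shape: (n_mel_channels,)
mel_std = stats["std"]                     # shape: (n_mel_channels,)

def normalize_for_pwg(mel: torch.Tensor) -> torch.Tensor:
    """Standardize a Tacotron 2 log-mel spectrogram of shape (n_mels, T)
    so it matches the zero-mean / unit-variance features the Parallel
    WaveGAN vocoder was trained on."""
    mean = torch.from_numpy(mel_mean).float().unsqueeze(1)  # (n_mels, 1)
    std = torch.from_numpy(mel_std).float().unsqueeze(1)    # (n_mels, 1)
    return (mel - mean) / std

# mel_outputs_postnet: (1, n_mels, T) tensor from Tacotron 2 inference
# mel_for_vocoder = normalize_for_pwg(mel_outputs_postnet.squeeze(0))
```

Even with this standardization, the mel extraction itself (filterbank, log base, clipping, hop size) still has to match between the acoustic model and the vocoder for the output to sound reasonable.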

@rafaelvalle
Contributor

Take a look at this and this

@Charlottecuc
Author

Thank you. For anyone who tried Tacotron 2 + Parallel WaveGAN and also encountered this problem, see kan-bayashi/ParallelWaveGAN#169
