Not compatible with nvidia-tacotron? #175

chazo1994 · 2020-06-30T12:35:06Z

I trained a Multiband-MelGAN model and integrated it with an nvidia-tacotron2 model, using this comment to make them work together. However, the resulting voice is bad, with discontinuous pitch. The mel-spectrograms below compare the output wave files of tacotron2+waveglow and tacotron2+MB_melgan (tacotron2+waveglow produces great audio). I tried replacing the preprocessing of this repo with that of the nvidia-tacotron2 repo, but the result is the same.

Tacotron2+waveglow:

Tacotron2+Mb_Melgan:

Tacotron2+Mb_Melgan (Replaced preprocess):

I have also attached the resulting audio:
results.zip

@kan-bayashi Do you have any idea how to fix this problem?

kan-bayashi · 2020-06-30T12:39:58Z

I want to clarify what the problem is.
When you synthesize the audio with natural features, how is the quality?
If the quality is still bad, we need to tune the hyperparameters of MB-MelGAN training.

chazo1994 · 2020-06-30T12:41:32Z

When you synthesize the audio with natural features, how is the quality?

The quality of the audio generated from natural features is very good.

kan-bayashi · 2020-06-30T12:45:01Z

Could you share the sample of MB-MelGAN with natural features?
If the audio sounds good, I think there is some mismatch between the models.
Please describe the feature extraction setting.

chazo1994 · 2020-06-30T14:44:28Z

@kan-bayashi OK, I will report it tomorrow. I hope you can help me.

chazo1994 · 2020-07-01T03:18:25Z

@kan-bayashi
Here is the sample of MB-MelGAN with natural features (the audio sounds very good):
sample.zip
Mel-spectrogram of the sample:

I also compared the audio and mel-spectrogram used by nvidia-tacotron2 in the training phase with the input of MB-MelGAN in the training phase. Both the audio and the mel-spectrogram are identical between nvidia-tacotron2 and MB-MelGAN if I replace the MB-MelGAN preprocessing stage.

The feature extraction setting of MB-MelGAN:

###########################################################
#                FEATURE EXTRACTION SETTING               #
###########################################################
sampling_rate: 22050     # Sampling rate.
fft_size: 1024           # FFT size.
hop_size: 256            # Hop size.
win_length: 1024         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 80                 # Minimum freq in mel basis calculation.
fmax: 7600               # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Will be multiplied to all of waveform.
trim_silence: true       # Whether to trim the start and end of silence.
trim_threshold_in_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048    # Frame size in trimming.
trim_hop_size: 512       # Hop size in trimming.
format: "hdf5"           # Feature file format. "npy" or "hdf5" is supported.

kan-bayashi · 2020-07-01T03:50:24Z

I also compared the audio and mel-spectrogram used by nvidia-tacotron2 in the training phase with the input of MB-MelGAN in the training phase. Both the audio and the mel-spectrogram are identical between nvidia-tacotron2 and MB-MelGAN if I replace the MB-MelGAN preprocessing stage.

OK. How did you perform normalization?
Did you use normalized features for both the text2mel and vocoder models?

chazo1994 · 2020-07-01T04:50:47Z

OK. How did you perform normalization?

In the case where I use this comment, I keep the normalized features in the training phase of MB-MelGAN and use the original preprocessing of nvidia-tacotron2. In the inference phase, I generate the mel-spectrogram from tacotron2, convert it with that code to make it compatible with MelGAN, and finally generate audio from the converted mel-spectrogram.

In the case where I replace the MB-MelGAN preprocessing with the nvidia-tacotron2 preprocessing, I remove the normalization procedure of MB-MelGAN in both the training and inference stages.
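The difference between these two setups is whether the vocoder is fed standardized features. As a minimal sketch, assuming the vocoder's normalization is a per-mel-bin mean and scale estimated on its training set, the step that is kept in the first case and removed in the second looks roughly like this. The function names and the three-bin statistics are hypothetical and are not taken from either repository.

```python
def standardize(frame, mean, scale):
    """Normalize one mel frame bin by bin: (x - mean) / scale."""
    return [(x - m) / s for x, m, s in zip(frame, mean, scale)]


def destandardize(frame, mean, scale):
    """Invert standardize(): x * scale + mean."""
    return [x * s + m for x, m, s in zip(frame, mean, scale)]


# Hypothetical statistics for a toy 3-bin "mel" frame (real features have 80 bins).
mean = [-2.0, -1.5, -3.0]
scale = [0.5, 0.8, 1.2]

frame = [-1.0, -1.9, -0.6]          # one frame of log-mel values
normalized = standardize(frame, mean, scale)
restored = destandardize(normalized, mean, scale)
```

If the vocoder was trained on standardized features, the text2mel output has to go through the same `standardize` step with the vocoder's own statistics before synthesis; if normalization was removed for training, it has to be removed for inference as well.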

In addition, I generated audio from one mel-spectrogram output of nvidia-tacotron2 with both WaveGlow and MB-MelGAN, and I see that the pulse of the MB-MelGAN output audio is not continuous:

kan-bayashi · 2020-07-01T05:05:16Z

In the case where I replace the MB-MelGAN preprocessing with the nvidia-tacotron2 preprocessing, I remove the normalization procedure of MB-MelGAN in both the training and inference stages.

OK. Then, did you use the same files to train the vocoder and the model?
If you just replaced the function, please try to generate audio using the exact mel-spectrogram file that was used for training tacotron2.

In addition, I generated audio from one mel-spectrogram output of nvidia-tacotron2 with both WaveGlow and MB-MelGAN, and I see that the pulse of the MB-MelGAN output audio is not continuous:

What is the difference compared to the sample you shared?
When I listened to your sample, the audio quality was clearly different between GT and generated features.
So I suspect that there is a bug in your code.
But if the quality degradation is reasonable, it may be a problem with MB-MelGAN.

chazo1994 · 2020-07-01T05:28:52Z

If you just replaced the function, please try to generate audio using the exact mel-spectrogram file that was used for training tacotron2.

I did it this way and got bad audio quality (the same quality as the tacotron2+MB-MelGAN result), so I will debug this point and report the results here. Please wait for my response.

When I listened to your sample, the audio quality was clearly different between GT and generated features.

The quality of the GT audio and of the audio generated by MB-MelGAN from natural features is the same.

kan-bayashi · 2020-07-01T08:56:50Z

I did it this way and got bad audio quality (the same quality as the tacotron2+MB-MelGAN result), so I will debug this point and report the results here. Please wait for my response.

Then, there is a bug in your code.
Please carefully check the difference (e.g., log_e vs log_10).
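One way to run this kind of check, as a rough sketch: natural-log and base-10 log features of the same spectrogram differ by a constant factor of ln(10) ≈ 2.3026, so comparing the two pipelines' features for one utterance shows quickly whether the bases differ. The helper names and the sample values below are hypothetical.

```python
import math

LN10 = math.log(10.0)  # ~2.302585


def ln_to_log10(frame):
    """Convert natural-log values to base-10 log values."""
    return [x / LN10 for x in frame]


def log10_to_ln(frame):
    """Convert base-10 log values to natural-log values."""
    return [x * LN10 for x in frame]


def looks_like_ln_vs_log10(frame_a, frame_b, tol=1e-3):
    """True if frame_a ~= ln(10) * frame_b, i.e. a is natural log and b is log10."""
    return all(
        abs(a - LN10 * b) <= tol * max(1.0, abs(a))
        for a, b in zip(frame_a, frame_b)
    )


# Hypothetical mel magnitudes for one frame, logged in both bases.
magnitudes = [1e-3, 0.25, 2.0]
ln_frame = [math.log(m) for m in magnitudes]
log10_frame = [math.log10(m) for m in magnitudes]

mismatch = looks_like_ln_vs_log10(ln_frame, log10_frame)
converted = ln_to_log10(ln_frame)
```

If `mismatch` is true for features extracted from the same audio by both pipelines, one side needs a conversion such as `ln_to_log10` before the features are passed on. The clipping floor applied before the log can also differ between pipelines and is worth comparing in the same way.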

chazo1994 · 2020-07-02T03:08:40Z

The problem is resolved. I realized the cause was that I kept the fmax and fmin from the default configuration of this repo, while the fmax and fmin of nvidia-tacotron2 are different.
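A mismatch like this can be caught before training by comparing the two feature extraction settings key by key. The sketch below is only illustrative: `diff_feature_settings` is a hypothetical helper, the vocoder values are copied from the config posted earlier in this thread, and the text2mel values are assumed nvidia-tacotron2 defaults rewritten under the vocoder's key names, so they should be checked against the hparams actually in use.

```python
def diff_feature_settings(a, b, keys):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values from the MB-MelGAN feature extraction setting shown in this thread.
vocoder = {
    "sampling_rate": 22050,
    "fft_size": 1024,
    "hop_size": 256,
    "win_length": 1024,
    "num_mels": 80,
    "fmin": 80,
    "fmax": 7600,
}

# ASSUMPTION: nvidia-tacotron2 defaults, mapped onto the vocoder's key names.
# Read the real values from your own hparams before relying on them.
text2mel = {
    "sampling_rate": 22050,
    "fft_size": 1024,
    "hop_size": 256,
    "win_length": 1024,
    "num_mels": 80,
    "fmin": 0.0,
    "fmax": 8000.0,
}

mismatches = diff_feature_settings(vocoder, text2mel, vocoder.keys())
for key, (voc_value, t2m_value) in mismatches.items():
    print(f"{key}: vocoder={voc_value} text2mel={t2m_value}")
```

Under these assumed values only `fmin` and `fmax` are reported, which matches the cause described above. Because the mel filterbank is built from these two frequencies, each of the 80 bins covers a different band in the two pipelines even though the array shapes agree.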

kan-bayashi added the "question" (Further information is requested) label Jun 30, 2020

chazo1994 closed this as completed Jul 2, 2020
