Not compatible with nvidia-tacotron? #175
Comments
I want to clarify what the problem is.
The quality of the audio generated from natural features is very good.
Could you share a sample generated by MB-MelGAN from natural features?
@kan-bayashi OK, I will report it tomorrow. I hope you can help me.
@kan-bayashi The feature extraction settings of MB-MelGAN:
###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
sampling_rate: 22050 # Sampling rate.
fft_size: 1024 # FFT size.
hop_size: 256 # Hop size.
win_length: 1024 # Window length.
# If set to null, it will be the same as fft_size.
window: "hann" # Window function.
num_mels: 80 # Number of mel basis.
fmin: 80 # Minimum freq in mel basis calculation.
fmax: 7600 # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0 # Will be multiplied to all of waveform.
trim_silence: true # Whether to trim the start and end of silence.
trim_threshold_in_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048 # Frame size in trimming.
trim_hop_size: 512 # Hop size in trimming.
format: "hdf5" # Feature file format. "npy" or "hdf5" is supported.
OK. How did you perform normalization?
When I use this comment, I keep the normalized features in the training phase of MB-MelGAN and use the original preprocessing of nvidia-tacotron2. In the inference phase, I generate a mel-spectrogram from Tacotron 2, convert it with that code to make it compatible with MelGAN, and finally generate audio from the converted mel-spectrogram. When I replace the MB-MelGAN preprocessing with the nvidia-tacotron2 preprocessing, I remove the normalization step of MB-MelGAN in both the training and inference stages. In addition, I generated audio from one mel-spectrogram output of nvidia-tacotron2 with both WaveGlow and MB-MelGAN, and I see that the pulse of the MB-MelGAN output audio is not continuous.
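For readers following along, the conversion step described above can be sketched roughly as below. This is not the code from the linked comment. It is a minimal sketch under two assumptions: that the Tacotron 2 output is a natural-log mel-spectrogram, and that the vocoder was trained on log10 mel features standardized with per-bin mean and scale statistics. The names `convert_frame`, `convert_mel`, `mean` and `scale` are hypothetical. Plain lists keep the sketch dependency-free; with NumPy the same arithmetic is a single vectorized expression.

```python
import math

LN10 = math.log(10.0)


def ln_to_log10(value):
    """Convert a natural-log value to log10 scale: log10(x) = ln(x) / ln(10)."""
    return value / LN10


def convert_frame(frame_ln, mean, scale):
    """Convert one natural-log mel frame to standardized log10 features.

    frame_ln, mean and scale each hold one entry per mel bin.
    ASSUMPTION: mean and scale are per-bin statistics computed on the
    vocoder's training features; they are hypothetical inputs here.
    """
    if not (len(frame_ln) == len(mean) == len(scale)):
        raise ValueError("frame and statistics must have the same number of mel bins")
    return [(ln_to_log10(v) - m) / s for v, m, s in zip(frame_ln, mean, scale)]


def convert_mel(mel_ln, mean, scale):
    """Apply convert_frame to every frame of a (frames x mel bins) nested list."""
    return [convert_frame(frame, mean, scale) for frame in mel_ln]
```

Note that this only aligns the log base and the normalization. It does not correct for a different mel filterbank (for example a different `fmin` or `fmax`) between the two models.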
OK. Then, did you use the same files to train the vocoder and the model?
What is the difference compared to the sample you shared?
I did it this way and got bad audio quality (the same quality as the tacotron2+MB-MelGAN result), so I will debug this point and report the results here. Please wait for my response.
The quality of the GT audio and the audio generated by MB-MelGAN from natural features is the same.
Then, there is a bug in your code.
The problem is resolved. I realized that the cause was that I kept the default fmax and fmin of this repo's configuration, while the fmax and fmin of nvidia-tacotron2 are different.
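A quick way to catch this kind of mismatch before training is to compare the mel-related settings of both models side by side. The sketch below is only an illustration: the vocoder values are copied from the feature extraction settings posted in this thread, while the Tacotron 2 values are an assumption (the upstream nvidia-tacotron2 defaults as far as I know, renamed to the vocoder's key names), so check them against your own `hparams.py`. The helper name `mel_config_mismatches` is hypothetical.

```python
# Keys that define the mel representation; names follow the vocoder config.
MEL_KEYS = ("sampling_rate", "fft_size", "hop_size", "win_length",
            "num_mels", "fmin", "fmax")

# Values from the MB-MelGAN feature extraction settings posted in this thread.
vocoder_cfg = {
    "sampling_rate": 22050, "fft_size": 1024, "hop_size": 256,
    "win_length": 1024, "num_mels": 80, "fmin": 80, "fmax": 7600,
}

# ASSUMPTION: Tacotron 2 settings mapped onto the vocoder's key names.
# Replace these with the values from the hparams you actually trained with.
tacotron2_cfg = {
    "sampling_rate": 22050, "fft_size": 1024, "hop_size": 256,
    "win_length": 1024, "num_mels": 80, "fmin": 0.0, "fmax": 8000.0,
}


def mel_config_mismatches(a, b, keys=MEL_KEYS):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for key, (voc, taco) in mel_config_mismatches(vocoder_cfg, tacotron2_cfg).items():
        print(f"{key}: vocoder={voc} tacotron2={taco}")
```

If the helper reports any difference, the vocoder is being fed features it was not trained on, and the simplest fix is to retrain or re-extract with one shared set of values.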
I trained a Multi-band MelGAN model and integrated it with an Nvidia Tacotron 2 model. I also used this comment to make it work, but the resulting voice is bad, with discontinuous pitch. The mel-spectrograms below show the difference between the output wave files of tacotron2+waveglow and tacotron2+MB_melgan (tacotron2+waveglow produces great audio). I tried replacing the preprocessing of this repo with that of the nvidia-tacotron2 repo, but the result is the same.
Tacotron2+waveglow:
Tacotron2+Mb_Melgan:
Tacotron2+Mb_Melgan (Replaced preprocess):
I have also attached the resulting audio.
results.zip
@kan-bayashi Do you have any idea how to fix this problem?