Can I remove the dropout on forward function? #481
Comments
In my opinion, you could pick a good fixed random mask to replace dropout, which makes the mel output consistent.
@ntdat017 Can you please explain this in more detail? How can I do it?
I would say this is a bug turned feature, actually, and there have been multiple attempts to get rid of dropout during inference. But honestly I just moved on; even Google now runs experiments without attention: https://arxiv.org/abs/2010.04301
@m-toman Where can I find an implementation of this paper, or a TTS project without attention, as you mentioned?
I don't think that paper from Google has been implemented yet.
In my approach, I sample a random boolean mask with ~50% probability, then replace the dropout layer in the prenet (at link) with that fixed mask during the inference phase; of course, the boolean mask should be chosen carefully. That way, I get a consistent mel during inference and can debug easily.
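A minimal sketch of that fixed-mask idea, assuming a prenet built from plain linear layers; the helper names are hypothetical, and the mask probability and 1/(1-p) scaling follow standard inverted dropout rather than anything prescribed in this thread:

```python
import torch
import torch.nn.functional as F

def make_prenet_masks(sizes, p=0.5, device='cpu'):
    # One fixed Bernoulli mask per prenet layer, sampled once per run.
    # Scaling by 1/(1-p) mimics inverted dropout, so activation
    # magnitudes stay close to what the decoder saw during training.
    return [(torch.rand(size, device=device) > p).float() / (1 - p)
            for size in sizes]

def prenet_forward_fixed(layers, x, masks):
    # Inference-time prenet forward pass: the same mask is reused at
    # every decoder step, so the mel output is deterministic.
    for linear, mask in zip(layers, masks):
        x = F.relu(linear(x)) * mask
    return x
```

Since the decoder was trained only on dropped-out prenet activations, a badly chosen mask can still trigger attention failures, which is presumably why the mask "should be chosen carefully".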
Well, I like https://github.com/as-ideas/ForwardTacotron as it's rather simple and slim: no transformers, attention, etc. But there's also https://github.com/NVIDIA/Nemo, which implements different methods, and https://github.com/espnet/espnet implements a few as well. Also https://github.com/TensorSpeech/TensorFlowTTS. Most of them have FastSpeech, though. Glow-TTS is also quite interesting. Oh, and https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch. Personally I do alignment using HTK (for example, there's a script in Merlin), but there are different options.
@m-toman Thanks for those links. I want to ask a few more questions, and I don't fully understand what
@ntdat017 Can I PM you for more details on how to do this? This is a little beyond my level xD
I think this repo expects alignments from an external source, e.g. extracted from Taco2. Not sure how the others do it. I think the ForwardTacotron repo now offers another method.
@EuphoriaCelestial Sure, you can PM me at ntdat017@gmail.com.
I think you could use FastSpeech 2; it's easy to train.
@m-toman In my experiments, almost all current non-autoregressive models perform worse than autoregressive models. What has your experience been?
I have a question: which models are non-autoregressive, and which are autoregressive?
Hi, I'm new to deep learning and I need to understand three things about this project. Please excuse my clumsy question, but I need to know the answers. Can I train on my own dataset, which is in the Hindi language with the text in Latin script (Hindi written in English)? Please answer these questions. Regards,
Can you reply to my question?
Of course; just change the character list in text/symbols.py and text/cmudict.py to make sure every character in your dataset is included, and change the cleaner and some file paths in hparams.py (just start with the basic cleaner). Create a dataset with the same folder structure as LJSpeech and you are good to go.
In hparams.py.
Just use the checkpoint file; there is no need to export a .pt file, as they are basically the same type.
Enable distributed training in hparams.py.
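As a small illustration of the first step, here is a sketch (the metadata path and LJSpeech-style "wav_name|transcript" format are assumptions) that collects every distinct character in your transcripts, so you can check they are all covered by text/symbols.py:

```python
# Hypothetical helper: list every distinct character in an LJSpeech-style
# metadata file ("wav_name|transcript" per line), so the symbol list in
# text/symbols.py can be extended to cover all of them.
with open('filelists/metadata.csv', encoding='utf-8') as f:
    chars = sorted({ch for line in f
                    for ch in line.split('|', 1)[-1].strip()})
print(''.join(chars))
```

Characters missing from the symbol list are typically dropped silently during text preprocessing, so it's worth running a check like this before training.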
Thank you so very much for the quick reply. I really appreciate that.
@EuphoriaCelestial Hi. I am using NVIDIA's Tacotron 2 to train on my own data (20 hours of Kurdish data), which is different from English. I have some questions:
As rafaelvalle mentioned in #336 (comment), the dropout causes the Tacotron model to "say the same phrase in multiple ways". In theory, this is a very interesting, innovative idea to make the voice more human-like.
But I found out it also causes some problems. Because of the randomness, the model sometimes produces errors for a single input sentence, like skipping words, being unable to end the audio, or repeating part of the sentence. It doesn't happen all the time, maybe 2-3 times out of 10 inferences, which makes it impossible to debug because I don't know when it will break.
So, the main point is that I want to remove this feature. How can I do this safely? rafaelvalle said I can't just set p=0 to remove it.
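For context, the prenet in this repo's model.py keeps dropout active even after model.eval(), because training=True is hard-coded in the forward pass, roughly like this:

```python
# Paraphrased from the Prenet in model.py: training=True keeps dropout
# active even in eval mode, so every forward pass is stochastic.
def forward(self, x):
    for linear in self.layers:
        x = F.dropout(F.relu(linear(x)), p=0.5, training=True)
    return x
```

Simply setting p=0 removes the randomness, but the decoder was trained only ever seeing dropped-out prenet activations, so feeding it clean activations creates a train/inference mismatch that presumably degrades quality; hence workarounds like the fixed-mask approach discussed above.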