Can I remove the dropout on forward function? #481

Open
EuphoriaCelestial opened this issue May 4, 2021 · 17 comments

@EuphoriaCelestial

As rafaelvalle mentioned here #336 (comment), the dropout causes the Tacotron model to "say the same phrase in multiple ways". In theory this is a very interesting, innovative idea for making the voice more human-like.
But I found that it also causes problems: because of the randomness, for a single input sentence the model sometimes produces errors like skipping words, failing to end the audio, or repeating part of the sentence. It doesn't happen all the time, maybe 2-3 times out of 10 inferences, which makes it impossible to debug because I don't know when it will break.
So, the main point is that I want to remove this feature. How can I do this safely? Because rafaelvalle said I can't just set p=0 to remove it.

@ntdat017

ntdat017 commented May 4, 2021

In my opinion, you could pick one good fixed random mask to replace the dropout, so that it produces a consistent mel.

@EuphoriaCelestial

> In my opinion, you could pick one good fixed random mask to replace the dropout, so that it produces a consistent mel.

@ntdat017 can you please explain this in more detail? How can I do it?

@m-toman

m-toman commented May 5, 2021

I would say this is actually a bug turned feature, and there have been multiple attempts to get rid of dropout during inference.
See for example here: mozilla/TTS#50 (comment)
This "dropping out the dropout" (randomizing the dropout probability during training) worked for me when I tried it back then, but the results were still not really convincing. As also shown in that thread, there seems to be a batch-norm approach that works.

But honestly, I just moved on; even Google now runs experiments without attention: https://arxiv.org/abs/2010.04301
Most others already have (DurIAN, the IBM system, FastSpeech, FastPitch, ForwardTacotron, etc.), and I feel that's much more robust than messing around with attention plots and trying all kinds of monotonic attention mechanisms with obscure tricks.

@EuphoriaCelestial

EuphoriaCelestial commented May 7, 2021

> https://arxiv.org/abs/2010.04301

@m-toman where can I find an implementation of this paper? Or a TTS project without attention, as you mentioned?

@ntdat017

ntdat017 commented May 7, 2021

> https://arxiv.org/abs/2010.04301
>
> @m-toman where can I find an implementation of this paper? Or a TTS project without attention, as you mentioned?

I don't think that paper from Google has a public implementation yet.

> In my opinion, you could pick one good fixed random mask to replace the dropout, so that it produces a consistent mel.
>
> @ntdat017 can you please explain this in more detail? How can I do it?

In my approach, I sample a random boolean mask with ~50% keep probability, then swap the dropout layer in the prenet (at link) for that mask during the inference phase; of course, the boolean mask has to be chosen carefully. That way I get a consistent mel at inference time and can debug easily.
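Roughly like this (a minimal, untested sketch of what I mean, assuming this repo's Prenet with its default two 256-unit layers; the seed and names are my own):

```python
import torch
import torch.nn.functional as F

# Sample the masks ONCE and reuse them for every inference call, so the
# prenet becomes deterministic. Rescaling by 1/keep_prob mimics the usual
# inverted-dropout scaling that F.dropout applies during training.
torch.manual_seed(1234)  # arbitrary; audition a few seeds and keep the best
fixed_masks = [(torch.rand(dim) < 0.5).float() / 0.5 for dim in (256, 256)]

def prenet_forward_fixed(prenet, x):
    # Drop-in replacement for Prenet.forward at inference time.
    for linear, mask in zip(prenet.layers, fixed_masks):
        x = F.relu(linear(x)) * mask.to(x.device)
    return x
```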

@m-toman

m-toman commented May 7, 2021

Well, I like https://github.com/as-ideas/ForwardTacotron as it's rather simple and slim: no transformers, no attention, etc.

But there's also https://github.com/NVIDIA/Nemo, implementing different methods,

https://github.com/espnet/espnet a few,

and also https://github.com/TensorSpeech/TensorFlowTTS.

Most have FastSpeech though. Glow-TTS is also quite interesting.

Oh, and https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch

Personally, I do alignment using HTK (for example, there's a script in Merlin), but there are different options.

@EuphoriaCelestial

@m-toman Thanks for those links. I want to ask a few more questions.
I've tried FastSpeech (from this repo: https://github.com/xcmyz/FastSpeech ) before, and got this error:

```
File "/FastSpeech/modules.py", line 72, in LR
    output = alignment @ x
RuntimeError: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:499
```

Also, I don't fully understand what the alignments.zip file does. How can I generate those alignments myself, since I am working with another language? And should I use FastSpeech or FastSpeech 2? What is the difference between the two?

@EuphoriaCelestial

> In my approach, I sample a random boolean mask with ~50% keep probability, then swap the dropout layer in the prenet (at link) for that mask during the inference phase; of course, the boolean mask has to be chosen carefully. That way I get a consistent mel at inference time and can debug easily.

@ntdat017 Can I PM you for more detail on how to do this? This is a little beyond my level xD

@m-toman

m-toman commented May 7, 2021

> @m-toman Thanks for those links. I want to ask a few more questions.
> I've tried FastSpeech (from this repo: https://github.com/xcmyz/FastSpeech ) before, and got this error:
>
> File "/FastSpeech/modules.py", line 72, in LR
>     output = alignment @ x
> RuntimeError: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:499
>
> Also, I don't fully understand what the alignments.zip file does. How can I generate those alignments myself, since I am working with another language? And should I use FastSpeech or FastSpeech 2? What is the difference between the two?

I think this repo expects alignments from an external source, e.g. extracted from a trained Tacotron 2. Not sure how the others do it; I think the ForwardTacotron repo now has another method.
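For context, that failing line is FastSpeech's length regulator. A toy example of the shapes it expects (illustrative only; all dimensions made up):

```python
import torch

B, T_text, T_mel, D = 1, 6, 20, 256
x = torch.randn(B, T_text, D)               # encoder outputs, one vector per input token
alignment = torch.zeros(B, T_mel, T_text)   # one-hot rows: which token each frame comes from

durations = [3, 4, 2, 5, 3, 3]              # per-token durations, must sum to T_mel
frame = 0
for tok, dur in enumerate(durations):
    alignment[0, frame:frame + dur, tok] = 1.0
    frame += dur

output = alignment @ x                       # (B, T_mel, D): token `tok` repeated `dur` times
```

So the RuntimeError usually means the precomputed alignment's T_text doesn't match the text your pipeline produced, which is why external alignments for English won't fit another language.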

@ntdat017

ntdat017 commented May 7, 2021

> In my approach, I sample a random boolean mask with ~50% keep probability, then swap the dropout layer in the prenet (at link) for that mask during the inference phase; of course, the boolean mask has to be chosen carefully. That way I get a consistent mel at inference time and can debug easily.
>
> @ntdat017 Can I PM you for more detail on how to do this? This is a little beyond my level xD

@EuphoriaCelestial Sure, you can PM me at ntdat017@gmail.com.

> And should I use FastSpeech or FastSpeech 2? What is the difference between the two?

I think you could use FastSpeech 2; it's easier to train.

> But honestly, I just moved on; even Google now runs experiments without attention: https://arxiv.org/abs/2010.04301
> Most others already have (DurIAN, the IBM system, FastSpeech, FastPitch, ForwardTacotron, etc.), and I feel that's much more robust than messing around with attention plots and trying all kinds of monotonic attention mechanisms with obscure tricks.

@m-toman In my experiments, most current non-autoregressive models perform worse than autoregressive ones. What has your experience been?

@EuphoriaCelestial

> @m-toman In my experiments, most current non-autoregressive models perform worse than autoregressive ones. What has your experience been?

I have a question: which models are non-autoregressive and which are autoregressive?

@Syed044

Syed044 commented May 21, 2021

> In my opinion, you could pick one good fixed random mask to replace the dropout, so that it produces a consistent mel.

Can you reply to my question?

Hi,

I'm new to deep learning, and I need to understand three things about this project. Please excuse my clumsy questions, but I need to know the answers.

1. Can I train on my own dataset, which is in the Hindi language with the text in Latin script (Hindi written in English)?
2. python train.py --output_directory=outdir --log_directory=logdir (what path for the dataset? Where do I define the path to my dataset?)
3. After completing the training, which I assume will give me a checkpoint file, how do I use it or get a pretrained .pt file?
4. One last question, since I'm new to this: I have 2 RTX 3090s with NVLink, and I'm using Windows 10 and Anaconda. How do I use both GPUs to train?

Please answer these questions.

Regards,
Sid


@EuphoriaCelestial

> Can I train on my own dataset, which is in the Hindi language with the text in Latin script (Hindi written in English)?

Of course. Just change the character list in text/symbols.py and text/cmudict.py to make sure every character in your dataset is included, and change the cleaner and some file paths in hparams.py (just start with the basic cleaner). Create a dataset with the same folder structure as LJSpeech and you are good to go.
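For example, the edit to text/symbols.py could look something like this (a hypothetical sketch, not the repo's actual file; _extra is a placeholder for whatever characters your transcripts use):

```python
# text/symbols.py (sketch) -- extend the character set for your language
_pad = '_'
_punctuation = "!'(),.:;? "
_letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
_extra = 'ñāīū'  # placeholder: add every extra character your dataset uses

symbols = [_pad] + list(_punctuation) + list(_letters) + list(_extra)
```

Each line of the train/validation filelists then follows the LJSpeech format: path/to/audio.wav|transcript.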

> python train.py --output_directory=outdir --log_directory=logdir (what path for the dataset? Where do I define the path to my dataset?)

In hparams.py.
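Specifically these fields (names as in this repo's hparams.py, if I recall correctly; the filelist paths are just examples):

```python
# hparams.py (excerpt, example values)
training_files='filelists/my_dataset_train_filelist.txt',
validation_files='filelists/my_dataset_val_filelist.txt',
text_cleaners=['basic_cleaners'],
```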

> After completing the training, which I assume will give me a checkpoint file, how do I use it or get a pretrained .pt file?

Just use the checkpoint file; there's no need to export a .pt file, they are basically the same format.
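For example, loading one for inference looks roughly like this (a sketch; the 'state_dict' key follows this repo's checkpoint layout, and the checkpoint path is made up):

```python
import torch
from hparams import create_hparams
from model import Tacotron2

hparams = create_hparams()
model = Tacotron2(hparams).cuda().eval()

# Checkpoints saved by train.py are plain torch dicts with the weights
# stored under 'state_dict'.
checkpoint = torch.load('outdir/checkpoint_10000', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
```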

> I have 2 RTX 3090s with NVLink, and I'm using Windows 10 and Anaconda. How do I use both GPUs to train?

Enable distributed training in hparams.py.
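That is, set distributed_run=True (the flag name in this repo's hparams.py) and launch training through the repo's multiproc wrapper, as in the README:

```
python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True
```

One caveat: as far as I know, this repo's distributed setup uses the NCCL backend, which doesn't support Windows, so multi-GPU training may require Linux (or WSL) rather than Windows 10.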

@Syed044

Syed044 commented May 21, 2021

Thank you very much for the quick reply. I really appreciate it.

@sabat84

sabat84 commented Apr 11, 2022

@EuphoriaCelestial Hi. I am using NVIDIA Tacotron 2 to train on my own data (20 hours of Kurdish speech), which is a different language from English. I have some questions:

  1. Should I use the pre-trained English model to train my model, or do I have to train from scratch? I tried 3 times to train from scratch with batch size 40, but the model didn't converge.
  2. I changed the character list in text/symbols.py but didn't change the valid_symbols list in text/cmudict.py. Is that a problem?

@EuphoriaCelestial

> Kurdish

  1. Technically, you can use the pre-trained English model to start with any language. In some rare situations you will encounter audio quality problems, word skipping, loops, ... but most of the time it has worked well for me.
  2. I'm not sure what you mean, but you only need to change the character list in text/symbols.py; the text/cmudict.py file is just an extra, only required for English in this case.
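If you do warm-start from the released English checkpoint, the repo's README uses the --warm_start flag, something like this (checkpoint filename as in the README):

```
python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
```

As far as I remember, --warm_start loads only the model weights (not the optimizer state) and skips the layers listed in ignore_layers, so the embedding can be reinitialized for a new character set.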
