confused about source speaker id in style and rhythm transfer #18
Comments
What you mentioned could have happened during training, for example when the training and validation filelists contain a different number of speakers. We circumvent this by first building the mellotron speaker ids dictionary from the training data and then reusing that same dictionary for the validation data.
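The workaround described in this reply can be illustrated with a minimal sketch. It assumes filelist lines of the form `audio|text|speaker_id`, as in the example from the question; the helper names `build_speaker_lookup` and `encode_speakers` are hypothetical and are not taken from the repository.

```python
def build_speaker_lookup(filelist_lines):
    """Map each raw speaker id to a contiguous index (0, 1, 2, ...).

    Hypothetical helper: assumes every line looks like 'audio|text|speaker_id'.
    """
    speaker_ids = sorted({int(line.strip().split("|")[2]) for line in filelist_lines})
    return {sid: idx for idx, sid in enumerate(speaker_ids)}


def encode_speakers(filelist_lines, lookup):
    """Translate the raw speaker id of each line through a given lookup table."""
    return [lookup[int(line.strip().split("|")[2])] for line in filelist_lines]


# Toy filelists (made-up entries, for illustration only).
train_lines = ["a|text_a|3", "b|text_b|7", "c|text_c|10"]
val_lines = ["d|text_d|10", "e|text_e|7"]

# Build the table once from the training data ...
train_lookup = build_speaker_lookup(train_lines)

# ... and reuse it for validation, so speaker 10 keeps the same index.
val_ids_shared = encode_speakers(val_lines, train_lookup)

# For contrast: a table built from the validation list alone shifts the ids.
val_ids_own = encode_speakers(val_lines, build_speaker_lookup(val_lines))

print(train_lookup)    # {3: 0, 7: 1, 10: 2}
print(val_ids_shared)  # [2, 1]
print(val_ids_own)     # [1, 0]
```

With a shared table, a speaker that appears in both filelists always maps to the same index, regardless of which other speakers each list contains.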
Thanks for your reply. I've noticed this part of the code in training.
In this case, the mellotron speaker ids depend on the number of speakers in the reference filelist. Then, we call mellotron.forward to get the reference rhythm as below:
where x contains ref_text, ref_mel, ref_f0 and ref_melltron_speaker_ids. As a result, for the same reference audio, the generated rhythm will change if the number of speakers in the reference filelist changes.
During experiments, we noticed that the rhythm (alignment map) we get from Tacotron seems to be independent of providing the correct speaker id. You can try, for example, providing different speaker ids while using Tacotron as a forced aligner and observe whether there is a significant difference.
Closing due to inactivity.
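One way to run the suggested comparison is to quantify how much two alignment maps differ. The sketch below is only an illustration: it assumes each alignment is a list of rows, one per decoder step, holding attention weights over the input tokens, and the helper names and toy values are hypothetical rather than model output.

```python
def argmax_path(alignment):
    """For each decoder step (row), return the index of the most-attended input token."""
    return [max(range(len(row)), key=row.__getitem__) for row in alignment]


def path_disagreement(align_a, align_b):
    """Fraction of decoder steps whose most-attended token differs between two maps."""
    path_a, path_b = argmax_path(align_a), argmax_path(align_b)
    if not path_a or len(path_a) != len(path_b):
        raise ValueError("alignments must be non-empty and have the same number of steps")
    return sum(a != b for a, b in zip(path_a, path_b)) / len(path_a)


# Toy alignment maps standing in for two runs with different speaker ids
# (made-up numbers, not produced by any model).
align_id0 = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.3, 0.7]]
align_id1 = [[0.8, 0.2, 0.0], [0.3, 0.6, 0.1], [0.1, 0.2, 0.7]]

print(path_disagreement(align_id0, align_id1))  # 0.0
```

A value near 0 would indicate that the attention path, and hence the rhythm, is essentially unchanged by the speaker id; a value near 1 would indicate the opposite.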
Hi, I'm a little confused about the speaker id in the reference audio and text. When doing the style and rhythm transfer, the given reference speaker ids will be re-ordered as 0,1,2,...
data_utils.py
and inference script
In that case, for the same audio like:
"audio_10|text_10|10"
in 2 different filelists
The reference speaker id (10) will be set to mellotron_id=1 and mellotron_id=0 respectively. This would surely cause the attention map (a.k.a. the rhythm in Mellotron) to be different.
Is this expected, or have I misunderstood something?
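The situation described in the question can be reproduced with a small sketch. It assumes the re-ordering simply sorts the unique raw speaker ids of a filelist and numbers them from 0; `remap_speaker_ids` is a hypothetical name, and the other filelist entries are made up so that speaker 10 lands on a different index in each list.

```python
def remap_speaker_ids(filelist_lines):
    """Sort the unique raw speaker ids in a filelist and number them 0, 1, 2, ...

    Hypothetical stand-in for the re-ordering discussed above; assumes
    lines of the form 'audio|text|speaker_id'.
    """
    raw_ids = sorted({int(line.strip().split("|")[2]) for line in filelist_lines})
    return {raw: new for new, raw in enumerate(raw_ids)}


# The same entry, audio_10|text_10|10, placed in two different filelists.
filelist_a = ["audio_5|text_5|5", "audio_10|text_10|10"]
filelist_b = ["audio_10|text_10|10", "audio_20|text_20|20"]

mellotron_id_a = remap_speaker_ids(filelist_a)[10]
mellotron_id_b = remap_speaker_ids(filelist_b)[10]

print(mellotron_id_a, mellotron_id_b)  # 1 0
```

The raw id 10 maps to 1 in the first list and to 0 in the second, so the speaker id handed to the model depends on the other entries in the filelist.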