add 'handle multi-speaker and GST inference' in synthesizer class #3

Closed
kirianguiller wants to merge 1 commit


kirianguiller
Contributor

… refactor/type some functions

Hi everyone!

This week, I worked on three things:

  • training a better model for Mandarin Chinese, trained for 126k epochs (here you can see the associated Google Colab)

  • handling multi-speaker and GST inference in the Synthesizer class (which is used in server.py and in the Google Colab for Chinese mentioned in the first point). You can now pass the following two optional parameters to the Synthesizer.tts() method:
    speaker_json_key and style_wav. speaker_json_key is the key of one of the speakers in the provided speakers.json. style_wav is either a path to a wav file for GST style transfer, or a dict of token weights such as {"token1": 0.25, "token2": -0.1, etc...}. The next step is to also let the user directly provide an optional speaker_embedding parameter, a speaker embedding (as a numpy array or a list?) that will be passed to Tacotron at inference time.

  • I've added type hints and refactored some functions and methods in the Synthesizer class. I've also added one abstract class for TTS models and one abstract class for Vocoder models, to get better hints from editors when working with models.
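To illustrate the third point, here is a minimal sketch of what such abstract classes could look like. The class and method names (BaseTTSModel, BaseVocoderModel, inference) and the dummy implementations are hypothetical and only serve the illustration; they are not the names used in the PR.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence

Mel = List[List[float]]  # a spectrogram as a list of frames (assumed shape)


class BaseTTSModel(ABC):
    """Hypothetical interface for a text-to-spectrogram model."""

    @abstractmethod
    def inference(
        self, text: str, speaker_embedding: Optional[Sequence[float]] = None
    ) -> Mel:
        """Turn text into a spectrogram."""


class BaseVocoderModel(ABC):
    """Hypothetical interface for a spectrogram-to-waveform model."""

    @abstractmethod
    def inference(self, mel: Mel) -> List[float]:
        """Turn a spectrogram into waveform samples."""


class DummyTTS(BaseTTSModel):
    # Toy stand-in: one single-value "frame" per character.
    def inference(self, text, speaker_embedding=None):
        return [[float(ord(char))] for char in text]


class DummyVocoder(BaseVocoderModel):
    # Toy stand-in: one sample per frame.
    def inference(self, mel):
        return [frame[0] / 255.0 for frame in mel]


def synthesize(tts: BaseTTSModel, vocoder: BaseVocoderModel, text: str) -> List[float]:
    # With typed parameters, an editor can suggest .inference() on both models.
    return vocoder.inference(tts.inference(text))
```

Because both base classes declare an abstract method, instantiating them directly raises a TypeError, so a model that forgets to implement inference is caught early.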

The Synthesizer class is now easier to use, and this Google Colab shows that it reduces the number of lines required to generate working samples.
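As a rough sketch of how the two optional parameters described above might be interpreted, assuming speakers.json maps each key to an entry holding an "embedding" list (an assumption about the file layout), and using hypothetical helper names that are not part of the PR:

```python
from typing import Dict, List, Optional, Tuple, Union

# style_wav is either a wav path, a dict of GST token weights, or None.
StyleInput = Union[str, Dict[str, float], None]


def resolve_speaker_embedding(
    speakers: Dict[str, Dict[str, List[float]]],
    speaker_json_key: Optional[str],
) -> Optional[List[float]]:
    """Look up the embedding stored under speaker_json_key (hypothetical helper)."""
    if speaker_json_key is None:
        return None  # single-speaker inference
    if speaker_json_key not in speakers:
        raise KeyError(f"unknown speaker key: {speaker_json_key}")
    # Assumed layout: {"<key>": {"embedding": [...]}}
    return speakers[speaker_json_key]["embedding"]


def describe_style(style_wav: StyleInput) -> Tuple[str, object]:
    """Classify the style_wav argument (hypothetical helper)."""
    if style_wav is None:
        return ("none", None)
    if isinstance(style_wav, str):
        return ("wav_path", style_wav)  # reference audio for GST style transfer
    if isinstance(style_wav, dict):
        weights = {str(token): float(weight) for token, weight in style_wav.items()}
        return ("token_weights", weights)
    raise TypeError("style_wav must be a path, a dict of token weights, or None")


# Example with in-memory data standing in for a loaded speakers.json.
speakers = {"speaker_a": {"embedding": [0.1, 0.2, 0.3]}}
embedding = resolve_speaker_embedding(speakers, "speaker_a")
style = describe_style({"token1": 0.25, "token2": -0.1})
```

The real Synthesizer.tts() would pass the resolved embedding and style on to the model; that part is left out here because it depends on the model code.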

I look forward to your reviews :)

@erogol
Member

erogol commented Feb 23, 2021

I think there are 3 different PRs here :)

If you don't mind, it is better to split them apart. It'd make things easier to manage.

  1. New Chinese model
  2. Synthesizer class
  3. Abstraction

@kirianguiller
Contributor Author

> I think there are 3 different PRs here :)
>
> If you don't mind, it is better to split them apart. It'd make things easier to manage.
>
> 1. New Chinese model
> 2. Synthesizer class
> 3. Abstraction

Thanks for your review, I've just opened the three PRs you mentioned.

I'm therefore closing this one :)

erogol added a commit that referenced this pull request Apr 9, 2021
erogol added a commit that referenced this pull request May 6, 2021
erogol added a commit that referenced this pull request May 11, 2021
eginhard referenced this pull request in idiap/coqui-ai-TTS Apr 2, 2024
gravityrail pushed a commit to gravityrail/TTS that referenced this pull request Jul 8, 2024
Use Python logging instead of print()