Data-driven AI voice cloning

This repository implements the main part of my master's thesis in Data Science & Engineering. It is divided into two parts:

Speaker Encoder

models: ECAPA-TDNN, WavLM series

data: VoxCeleb1, private dataset

Text-to-speech

model: FastSpeech2 (Microsoft implementation)

data: LibriTTS

These two parts are then integrated into ZeroShotFastSpeech2, a multi-speaker text-to-speech model that can clone unseen voices from about 5 seconds of audio.
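
The integration of the speaker encoder and the text-to-speech model can be pictured with a minimal sketch. It assumes one common conditioning scheme: the speaker embedding is L2-normalized, linearly projected to the TTS hidden size, and added to every phoneme-level encoder state. This README does not state that ZeroShotFastSpeech2 works exactly this way, so the function names (`l2_normalize`, `project`, `condition_on_speaker`), the toy dimensions, and the random stand-in data are all hypothetical and are not taken from the repository code.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def project(vec, weights):
    """Apply a bias-free linear layer; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def condition_on_speaker(encoder_states, speaker_embedding, weights):
    """Add the projected speaker embedding to every phoneme-level hidden state."""
    speaker_vector = project(l2_normalize(speaker_embedding), weights)
    return [[h + s for h, s in zip(state, speaker_vector)] for state in encoder_states]


# Toy dimensions, chosen only for illustration (assumption, not the real model sizes).
EMB_DIM, HIDDEN_DIM, NUM_PHONEMES = 8, 4, 3
rng = random.Random(0)

# Random stand-ins for real model outputs: a speaker-encoder embedding of the
# short reference clip, and the TTS encoder states for the input phonemes.
speaker_embedding = [rng.gauss(0, 1) for _ in range(EMB_DIM)]
encoder_states = [
    [rng.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(NUM_PHONEMES)
]
# Hypothetical projection weights; in a trained model these would be learned.
weights = [[rng.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(HIDDEN_DIM)]

conditioned = condition_on_speaker(encoder_states, speaker_embedding, weights)
print(len(conditioned), len(conditioned[0]))
```

Because the same speaker vector is added at every position, the text content still determines the sequence of hidden states, while the reference clip shifts all of them toward the target voice. In this kind of setup, swapping the reference audio changes only `speaker_embedding`, which is what would let a model handle speakers it never saw during training.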
