Official codebase for the paper:
"Can One Model Fit All? An Exploration of Wav2Lip’s Lip-Syncing Generalizability Across Culturally Distinct Languages", presented at ICCSA 2024.
🎥 Watch the presentation here.
Lip synchronization (LipSync) is a cutting-edge technology capable of generating highly realistic talking head videos by aligning lip movements precisely with spoken audio.
You can use the lipsync_gui.py script to generate lip-synced videos for any face and language (a GPU is not required 🔥).
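The script falls back to CPU execution when no GPU is detected. Below is a minimal sketch of that kind of device selection using the standard PyTorch API; it is illustrative only, and the actual logic inside lipsync_gui.py may differ:

```python
import torch

# Pick CUDA when a GPU is available, otherwise fall back to the CPU
# (hypothetical sketch; the real device handling lives in the Wav2Lip code).
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running Wav2Lip inference on: {device}")
```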
SOON!
Evaluate the quality of lip synchronization with this Colab notebook in the evaluation folder. It computes LipSync Error Distance (LSE-D) and LipSync Error Confidence (LSE-C) values. These metrics are specifically designed to quantitatively assess lip synchronization accuracy.
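For context, both metrics are derived from a pretrained SyncNet: LSE-D is the distance between audio and lip-region video embeddings at the best audio-video offset, and LSE-C is the sync confidence, i.e. the gap between the median and the minimum of those distances across offsets. Below is a rough, illustrative sketch of how the scores can be aggregated from precomputed SyncNet embeddings; the notebook itself relies on the standard evaluation code, so the function and variable names here are assumptions:

```python
import numpy as np

def lse_scores(video_emb, audio_emb, max_offset=15):
    """Illustrative aggregation of LSE-D / LSE-C from SyncNet embeddings.

    video_emb, audio_emb: (num_windows, dim) arrays of SyncNet embeddings
    for the lip crops and the audio windows. Not the official evaluation code.
    """
    mean_dists = []
    for off in range(-max_offset, max_offset + 1):
        # Shift the audio stream relative to the video stream by `off` windows.
        if off >= 0:
            v, a = video_emb[off:], audio_emb
        else:
            v, a = video_emb, audio_emb[-off:]
        n = min(len(v), len(a))
        if n == 0:
            continue
        # Mean Euclidean distance between paired embeddings at this offset.
        mean_dists.append(np.linalg.norm(v[:n] - a[:n], axis=1).mean())

    mean_dists = np.array(mean_dists)
    lse_d = mean_dists.min()                          # distance at the best offset
    lse_c = np.median(mean_dists) - mean_dists.min()  # confidence: median minus minimum
    return lse_d, lse_c
```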
To generate LipSynced videos, follow these steps:
- Create the conda environment and install the required packages:
  conda env create -f environment.yml
  conda activate LipSync_GUI
- Download the weights for the pre-trained face detection model from this link and place them in the wav2Lip\face_detection\detection\sfd folder.
- Download the weights for the LipSync models (Wav2Lip and Wav2Lip_gan) and place them in the wav2Lip\checkpoints folder.
- Run app\lipsync_gui.py and use the Gradio app to generate videos (a minimal sketch of such a wrapper is shown after this list).
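The GUI is essentially a thin Gradio front end over the Wav2Lip inference script. The sketch below shows what such a wrapper can look like; the paths, checkpoint name, and command-line flags follow the upstream Wav2Lip inference.py and this repo's folder layout, but they are assumptions here rather than the exact contents of app\lipsync_gui.py:

```python
import subprocess
import gradio as gr

# Assumed locations -- adjust to where the checkpoints and results actually live.
CHECKPOINT = "wav2Lip/checkpoints/wav2lip_gan.pth"
OUTFILE = "app/results/result_voice.mp4"

def lipsync(face_video, audio_file):
    """Run Wav2Lip inference on an uploaded face video and driving audio."""
    subprocess.run(
        [
            "python", "wav2Lip/inference.py",
            "--checkpoint_path", CHECKPOINT,
            "--face", face_video,
            "--audio", audio_file,
            "--outfile", OUTFILE,
        ],
        check=True,
    )
    return OUTFILE

demo = gr.Interface(
    fn=lipsync,
    inputs=[
        gr.Video(label="Face video"),
        gr.Audio(type="filepath", label="Driving audio"),
    ],
    outputs=gr.Video(label="Lip-synced result"),
    title="LipSync GUI",
)

if __name__ == "__main__":
    demo.launch()
```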
Tips:
- For initial testing, you can use the input audio and faces saved in the resources directory.
- The generated videos are saved in the app\results directory by default.
Check the misc folder of this repo.
We extend our sincere gratitude to the authors of this paper for their pioneering research.
Rafiei Oskooei, A., Yahsi, E., Sungur, M., Aktas, M. S. (2024). Can One Model Fit All? An Exploration of Wav2Lip’s Lip-Syncing Generalizability Across Culturally Distinct Languages. In: Computational Science and Its Applications – ICCSA 2024 Workshops. ICCSA 2024. Lecture Notes in Computer Science, vol 14819. Springer, Cham. https://doi.org/10.1007/978-3-031-65282-0_10