Official codebase for the paper:
"Can One Model Fit All? An Exploration of Wav2Lip’s Lip-Syncing Generalizability Across Culturally Distinct Languages", presented at ICCSA 2024.
🎥 Watch the presentation here.
Lip synchronization (LipSync) is a cutting-edge technology capable of generating highly realistic talking head videos by aligning lip movements precisely with spoken audio.
You can use the lipsync_gui.py script to generate lip-synced videos for any face and language (a GPU is not required 🔥).
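The script falls back to CPU execution when no GPU is detected. Below is a minimal sketch of that kind of device selection using the standard PyTorch API; it is illustrative only, and the actual logic inside lipsync_gui.py may differ:

```python
import torch

# Pick CUDA when a GPU is available, otherwise fall back to the CPU
# (hypothetical sketch; the real device handling lives in the Wav2Lip code).
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running Wav2Lip inference on: {device}")
```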
SOON!
Evaluate the quality of lip synchronization with this Colab notebook in the evaluation folder. It computes LipSync Error Distance (LSE-D) and LipSync Error Confidence (LSE-C) values. These metrics are specifically designed to quantitatively assess lip synchronization accuracy.
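For context, both metrics are derived from a pretrained SyncNet: LSE-D is the distance between audio and lip-region video embeddings at the best audio-video offset, and LSE-C is the sync confidence, i.e. the gap between the median and the minimum of those distances across offsets. Below is a rough, illustrative sketch of how the scores can be aggregated from precomputed SyncNet embeddings; the notebook itself relies on the standard evaluation code, so the function and variable names here are assumptions:

```python
import numpy as np

def lse_scores(video_emb, audio_emb, max_offset=15):
    """Illustrative aggregation of LSE-D / LSE-C from SyncNet embeddings.

    video_emb, audio_emb: (num_windows, dim) arrays of SyncNet embeddings
    for the lip crops and the audio windows. Not the official evaluation code.
    """
    mean_dists = []
    for off in range(-max_offset, max_offset + 1):
        # Shift the audio stream relative to the video stream by `off` windows.
        if off >= 0:
            v, a = video_emb[off:], audio_emb
        else:
            v, a = video_emb, audio_emb[-off:]
        n = min(len(v), len(a))
        if n == 0:
            continue
        # Mean Euclidean distance between paired embeddings at this offset.
        mean_dists.append(np.linalg.norm(v[:n] - a[:n], axis=1).mean())

    mean_dists = np.array(mean_dists)
    lse_d = mean_dists.min()                          # distance at the best offset
    lse_c = np.median(mean_dists) - mean_dists.min()  # confidence: median minus minimum
    return lse_d, lse_c
```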
To generate LipSynced videos, follow these steps:
- Create the conda environment and install the required packages:
  conda env create -f environment.yml
  conda activate LipSync_GUI
- Download the weights for the pre-trained face detection model from this link and place them in the wav2Lip\face_detection\detection\sfd folder.
- Download the weights for the LipSync models (Wav2Lip and Wav2Lip_gan) and place them in the wav2Lip\checkpoints folder.
- Run app\lipsync_gui.py and use the Gradio app to generate videos (a minimal sketch of such a wrapper is shown after this list).
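The GUI is essentially a thin Gradio front end over the Wav2Lip inference script. The sketch below shows what such a wrapper can look like; the paths, checkpoint name, and command-line flags follow the upstream Wav2Lip inference.py and this repo's folder layout, but they are assumptions here rather than the exact contents of app\lipsync_gui.py:

```python
import subprocess
import gradio as gr

# Assumed locations -- adjust to where the checkpoints and results actually live.
CHECKPOINT = "wav2Lip/checkpoints/wav2lip_gan.pth"
OUTFILE = "app/results/result_voice.mp4"

def lipsync(face_video, audio_file):
    """Run Wav2Lip inference on an uploaded face video and driving audio."""
    subprocess.run(
        [
            "python", "wav2Lip/inference.py",
            "--checkpoint_path", CHECKPOINT,
            "--face", face_video,
            "--audio", audio_file,
            "--outfile", OUTFILE,
        ],
        check=True,
    )
    return OUTFILE

demo = gr.Interface(
    fn=lipsync,
    inputs=[
        gr.Video(label="Face video"),
        gr.Audio(type="filepath", label="Driving audio"),
    ],
    outputs=gr.Video(label="Lip-synced result"),
    title="LipSync GUI",
)

if __name__ == "__main__":
    demo.launch()
```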
Tips:
- For initial testing, you can use the input audio and faces saved in the resources directory.
- The generated videos are saved in the app\results directory by default.
Check the misc folder of this repo.
We extend our sincere gratitude to the authors of this paper for their pioneering research.
Rafiei Oskooei, A., Yahsi, E., Sungur, M., Aktas, M. S. (2024). Can One Model Fit All? An Exploration of Wav2Lip’s Lip-Syncing Generalizability Across Culturally Distinct Languages. In: Computational Science and Its Applications – ICCSA 2024 Workshops. ICCSA 2024. Lecture Notes in Computer Science, vol 14819. Springer, Cham. https://doi.org/10.1007/978-3-031-65282-0_10