EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022 Conference]
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao
Given a single portrait image, EAMM synthesizes emotional talking faces whose mouth movements match the input audio and whose facial emotion dynamics follow an emotion source video.
We train and test with Python 3.6 and PyTorch. To install the dependencies, run:
pip install -r requirements.txt
-
Download the pre-trained models and data from the following link: google-drive, and put the files in their corresponding places.
-
Run the demo:
python demo.py --source_image path/to/image --driving_video path/to/emotion_video --pose_file path/to/pose --in_file path/to/audio --emotion emotion_type
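For example, a hypothetical invocation (the file paths and the emotion label below are illustrative placeholders, not files shipped with the repo):

python demo.py --source_image ./test/source_crop.png --driving_video ./test/driving_crop.mp4 --pose_file ./test/pose.npy --in_file ./test/audio.wav --emotion happy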
-
Prepare testing data (a short driver sketch follows this list):
prepare source_image -- crop_image in process_data.py
prepare driving_video -- crop_image_tem in process_data.py
prepare pose -- detect pose using 3DDFA_V2
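A minimal preprocessing sketch in Python; only the function names come from process_data.py, while their argument lists below are assumptions, so check the definitions before running:

# Hypothetical driver for the steps above; the signatures are assumptions.
from process_data import crop_image, crop_image_tem

crop_image('raw/source.png', 'test/source_crop.png')        # source image
crop_image_tem('raw/emotion.mp4', 'test/driving_crop.mp4')  # driving video
# pose: run 3DDFA_V2 on the cropped driving video to produce test/pose.npy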
-
Training data structure:
./data/<dataset_name>
├── fomm_crop
│   ├── id/file_name          # cropped images
│   │   ├── 0.png
│   │   ├── ...
├── fomm_pose_crop
│   ├── id
│   │   ├── file_name.npy     # pose of the cropped images
│   │   ├── ...
├── MFCC
│   ├── id
│   │   ├── file_name.npy     # MFCC of the audio
│   │   ├── ...
*The cropped images are generated by 'crop_image_tem' in process_data.py
*The poses of the cropped videos are generated by 3DDFA_V2/demo.py
*The MFCCs of the audio are generated by 'audio2mfcc' in process_data.py
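A minimal PyTorch Dataset sketch that walks this layout; it only illustrates the directory structure above and is not the repo's actual data loader:

# Hypothetical loader mirroring ./data/<dataset_name>; not the repo's own class.
import glob
import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class CroppedClipDataset(Dataset):  # hypothetical name
    def __init__(self, root):
        # one sample per (id, file_name) clip directory under fomm_crop
        self.root = root
        self.clips = sorted(glob.glob(os.path.join(root, 'fomm_crop', '*', '*')))

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        clip_dir = self.clips[idx]
        file_name = os.path.basename(clip_dir)
        identity = os.path.basename(os.path.dirname(clip_dir))
        # frames are named 0.png, 1.png, ... so sort numerically, not lexically
        frame_paths = sorted(glob.glob(os.path.join(clip_dir, '*.png')),
                             key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))
        frames = np.stack([np.asarray(Image.open(p)) for p in frame_paths])
        pose = np.load(os.path.join(self.root, 'fomm_pose_crop', identity, file_name + '.npy'))
        mfcc = np.load(os.path.join(self.root, 'MFCC', identity, file_name + '.npy'))
        return frames, pose, mfcc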
-
Step 1: Train the Audio2Facial-Dynamics Module on the LRW dataset
python run.py --config config/train_part1.yaml --mode train_part1 --checkpoint log/124_52000.pth.tar
-
Step 2: Fine-tune the Audio2Facial-Dynamics Module after getting stable results from Step 1
python run.py --config config/train_part1_fine_tune.yaml --mode train_part1_fine_tune --checkpoint log/124_52000.pth.tar --audio_chechpoint checkpoint/from/step_1
-
Step 3: Train the Implicit Emotion Displacement Learner
python run.py --config config/train_part2.yaml --mode train_part2 --checkpoint log/124_52000.pth.tar --audio_chechpoint checkpoint/from/step_2
-
Citation:
@inproceedings{10.1145/3528233.3530745,
author = {Ji, Xinya and Zhou, Hang and Wang, Kaisiyuan and Wu, Qianyi and Wu, Wayne and Xu, Feng and Cao, Xun},
title = {EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model},
year = {2022},
isbn = {9781450393379},
url = {https://doi.org/10.1145/3528233.3530745},
doi = {10.1145/3528233.3530745},
booktitle = {ACM SIGGRAPH 2022 Conference Proceedings},
series = {SIGGRAPH '22}
}