
ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions

Project Page | Paper | Code

Demo videos: generated_sample1.mp4 · generated_sample2.mp4 · generated_sample3.mp4

πŸ“’ News

  • Our paper has been accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)! πŸŽ‰πŸŽ‰ (Oct/2024)


πŸ› οΈ Installation

Prerequisites

  • Python 3.8+ (3.9 used below)
  • PyTorch 1.9+ (2.0.1 used below)
  • CUDA 11.8+

Setup Environment

Create and activate conda environment

conda create -n react python=3.9
conda activate react

Install PyTorch

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Install PyTorch3D

pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt201/download.html

Install other dependencies

pip install -r requirements.txt
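
To verify the environment, a minimal check (assumes the install steps above completed successfully):

# Quick environment check (run inside the activated conda environment).
import torch
import pytorch3d

print("torch:", torch.__version__)              # expected: 2.0.1+cu118
print("CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)      # wheel built for py39 / cu118 / pyt201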

πŸ‘¨β€πŸ« Getting Started

1. Data Preparation

Download and Setup Dataset

The REACT 2023/2024 Multimodal Challenge Dataset is compiled from public dyadic-interaction datasets, including NoXI and RECOLA.

Apply for data access through the REACT 2023/2024 challenge.

Data organization (data/) follows this structure:

data/partition/modality/site/chat_index/person_index/clip_index/actual_data_files

Example data structure:

data
β”œβ”€β”€ test
β”œβ”€β”€ val
β”œβ”€β”€ train
   β”œβ”€β”€ Video_files
       β”œβ”€β”€ NoXI
           β”œβ”€β”€ 010_2016-03-25_Paris
               β”œβ”€β”€ Expert_video
               β”œβ”€β”€ Novice_video
                   β”œβ”€β”€ 1
                       β”œβ”€β”€ 1.png
                       β”œβ”€β”€ ....
                       β”œβ”€β”€ 751.png
                   β”œβ”€β”€ ....
           β”œβ”€β”€ ....
       β”œβ”€β”€ RECOLA
   β”œβ”€β”€ Audio_files
       β”œβ”€β”€ NoXI
       β”œβ”€β”€ RECOLA
           β”œβ”€β”€ group-1
               β”œβ”€β”€ P25 
               β”œβ”€β”€ P26
                   β”œβ”€β”€ 1.wav
                   β”œβ”€β”€ ....
           β”œβ”€β”€ group-2
           β”œβ”€β”€ group-3
   β”œβ”€β”€ Emotion
       β”œβ”€β”€ NoXI
       β”œβ”€β”€ RECOLA
           β”œβ”€β”€ group-1
               β”œβ”€β”€ P25 
               β”œβ”€β”€ P26
                   β”œβ”€β”€ 1.csv
                   β”œβ”€β”€ ....
           β”œβ”€β”€ group-2
           β”œβ”€β”€ group-3
   β”œβ”€β”€ 3D_FV_files
       β”œβ”€β”€ NoXI
       β”œβ”€β”€ RECOLA
           β”œβ”€β”€ group-1
               β”œβ”€β”€ P25 
               β”œβ”€β”€ P26
                   β”œβ”€β”€ 1.npy
                   β”œβ”€β”€ ....
           β”œβ”€β”€ group-2
           β”œβ”€β”€ group-3

Important details:

  • Task: Predict one participant's facial reaction ('Expert' or 'Novice' in NoXI, 'P25' or 'P26' in RECOLA) to the other participant's behaviour
  • 3D_FV_files contain 3DMM coefficients (expression: 52 dim, angle: 3 dim, translation: 3 dim)
  • Video specifications:
    • Frame rate: 25 fps
    • Resolution: 256x256
    • Clip length: 751 frames (~30s)
    • Audio sampling rate: 44100 Hz
  • CSV files defining the train/val/test splits are available at 'data/train.csv', 'data/val.csv', and 'data/test.csv'
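
As a quick sanity check on the 3DMM files, a minimal loading sketch is shown below; the example path and the coefficient ordering (expression, then angle, then translation) are assumptions based on the description above:

import numpy as np

# Hypothetical clip path following the layout above (replace with a real clip).
coeff_path = "data/train/3D_FV_files/RECOLA/group-1/P25/1.npy"

coeffs = np.load(coeff_path)            # expected shape: (751, 58) for a ~30 s clip at 25 fps
expression  = coeffs[:, :52]            # 52-dim expression coefficients
angle       = coeffs[:, 52:55]          # 3-dim head rotation
translation = coeffs[:, 55:58]          # 3-dim translation
print(expression.shape, angle.shape, translation.shape)
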
Download Additional Resources
  1. Listener Reaction Neighbors
    • Download the appropriate listener reaction neighbors dataset from here
    • Place the downloaded files in the dataset root folder
  2. Ground Truth 3DMMs
    • Download the ground truth 3DMMs (test set) for speaker-listener evaluation from here
    • Place the downloaded files in the metric/gt folder

2. External Tool Preparation

Required Models and Tools

We use 3DMM coefficients for 3D listener/speaker representation and 3D-to-2D frame rendering.

  1. 3DMM Model Setup

  2. PIRender Setup

    • We use PIRender for 3D-to-2D rendering
    • Download our retrained checkpoint (cur_model_fold.pth)
    • Place in external/PIRender/

3. Training

Training Options

Training with rendering during validation (slower):

python train.py \
  --batch-size 8 \
  --window-size 64 \
  --momentum 0.1 \
  --gpu-ids 0 \
  -lr 0.00002 \
  -e 200 \
  -j 4 \
  --sm-p 10 \
  --kl-p 0.00001 \
  --div-p 100 \
  --rendering \
  --outdir results/train-reactface

Training without rendering during validation (faster):

python train.py \
  --batch-size 8 \
  --window-size 64 \
  --momentum 0.1 \
  --gpu-ids 0 \
  -lr 0.00002 \
  -e 200 \
  -j 4 \
  --sm-p 10 \
  --kl-p 0.00001 \
  --div-p 100 \
  --outdir results/train-reactface
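
For orientation only: -lr sets the learning rate, and --sm-p, --kl-p and --div-p appear to weight smoothness, KL, and diversity terms. The sketch below shows how such a weighted objective is typically assembled; the term names and structure are assumptions, not the repository's actual code:

import torch

def weighted_objective(rec_loss: torch.Tensor,
                       smooth_loss: torch.Tensor,
                       kl_loss: torch.Tensor,
                       div_loss: torch.Tensor,
                       sm_p: float = 10.0,
                       kl_p: float = 1e-5,
                       div_p: float = 100.0) -> torch.Tensor:
    # Each term is assumed to be a scalar tensor computed elsewhere;
    # the default weights mirror the flag values in the commands above.
    return rec_loss + sm_p * smooth_loss + kl_p * kl_loss + div_p * div_loss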

4. Evaluation

Generate Results

To generate listener reactions using a trained ReactFace model, run:

python evaluate.py \
  --split test \
  --batch-size 16 \
  --window-size 8 \
  --momentum 0.9 \
  --gpu-ids 0 \
  -j 4 \
  --rendering \
  --outdir results/eval \
  --resume results/train-reactface/best_checkpoint.pth

Metric-based Evaluations

Our evaluation methodology is based on established research on Multiple Appropriate Listener Reaction generation:

Paper 1 | Paper 2 | Paper 3

Metrics Overview

Diversity Metrics
  • FRDvs: Measures diversity across speaker behavior conditions
  • FRVar: Evaluates diversity within a single generated facial reaction sequence (see the sketch after the metrics overview)
  • FRDiv: Assesses diversity of different generated listener reactions to the same speaker behavior
Quality Metrics
  • FRRea: Uses FrΓ©chet Video Distance (FVD) to evaluate realism of generated video sequences
  • FRCorr: Measures appropriateness by correlating each generated facial reaction with its most similar real facial reaction
  • FRSyn: Evaluates synchronization between generated listener reactions and varying speaker sequences
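
To make the diversity metrics concrete, here is a minimal sketch of FRVar and FRDiv over 3DMM coefficient sequences; these are rough illustrations and the official metric implementations may differ:

import numpy as np

def frvar(sequence: np.ndarray) -> float:
    """Rough FRVar sketch: variance over time, averaged across feature dimensions.
    sequence: (num_frames, feature_dim) 3DMM coefficients of one generated reaction."""
    return float(np.var(sequence, axis=0).mean())

def frdiv(samples: np.ndarray) -> float:
    """Rough FRDiv sketch: mean pairwise distance between multiple reactions
    generated for the same speaker behaviour.
    samples: (num_samples, num_frames, feature_dim)."""
    n = samples.shape[0]
    dists = [np.linalg.norm(samples[i] - samples[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists)) if dists else 0.0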

Running Evaluation

Execute the following command to compute all metrics:

python evaluate_metric.py \
  --split test \
  --gt-speaker-3dmm-path ./metric/gt/tdmm_speaker.npy \
  --gt-listener-3dmm-path ./metric/gt/tdmm_listener.npy \
  --gn-listener-3dmm-path ./results/eval/test/coeffs/tdmm_10x.npy

Assessing realism with FVD:

  • Download the pretrained I3D model (rgb_imagenet.pt) from the pytorch-i3d project
  • Place the model in metric/FVD/pytorch_i3d_model/models
  • Execute the following command to compute the FVD metric:
python metric/FVD/fvd_eval.py \
  --source-dir /path/to/ground-truth/listener/videos \
  --target-dir /path/to/generated/listener/videos \
  --model-path metric/FVD/pytorch_i3d_model/models/rgb_imagenet.pt \
  --num-videos 100 \
  --frame-size 224 \
  --max-frames 750
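
For reference, FVD is the FrΓ©chet distance between Gaussians fitted to I3D features of real and generated videos; a minimal sketch of that distance is shown below (feature extraction is handled by the script above, which may differ in details):

import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FrΓ©chet distance between two Gaussian-fitted feature sets of shape (N, D)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real      # drop tiny imaginary parts from numerical error
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))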

5. Customized Inference

Generate Dyadic Reaction with Custom Speaker Video

Execute the following command to generate a listener's reaction to your speaker video:

python dyadic_reaction_inference.py \
    --speaker-video /path/to/your_video.mp4 \
    --speaker-audio /path/to/your_audio.wav \
    --listener-portrait /path/to/your_portrait.png \
    --window-size 8 \
    --momentum 0.9 \
    --output-dir results/customized_inference \
    --checkpoint results/train-reactface/best_checkpoint.pth

Required Inputs:

  • speaker-video: Path to the input speaker video file (MP4 format)
  • speaker-audio: Path to the speaker's audio file (WAV format)
  • listener-portrait: Path to the portrait photo of your custom listener (PNG format)

Optional Parameters:

  • window-size: Size of the temporal window (default: 8)
  • momentum: Momentum coefficient controlling how quickly the generated reaction changes (default: 0.9)
  • output-dir: Directory for saving generated results
  • checkpoint: Path to the trained model checkpoint
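
For intuition, one plausible reading of window-size and momentum is sliding-window generation with momentum-based blending between consecutive windows; the sketch below illustrates that idea only and is not the repository's actual implementation:

import numpy as np

def blend_windows(prev_coeffs: np.ndarray,
                  new_coeffs: np.ndarray,
                  momentum: float = 0.9) -> np.ndarray:
    """Blend the newly generated window of 3DMM coefficients with the previous one
    so online generation stays smooth; a higher momentum means slower change."""
    return momentum * prev_coeffs + (1.0 - momentum) * new_coeffs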

πŸ–ŠοΈ Citation

If this work helps in your research, please cite the following papers:

@article{10756784,
  author={Luo, Cheng and Song, Siyang and Xie, Weicheng and Spitale, Micol and Ge, Zongyuan and Shen, Linlin and Gunes, Hatice},
  journal={IEEE Transactions on Visualization and Computer Graphics}, 
  title={ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions}, 
  year={2024},
  volume={},
  number={},
  pages={1-18},
}


@article{luo2023reactface,
  title={Reactface: Multiple appropriate facial reaction generation in dyadic interactions},
  author={Luo, Cheng and Song, Siyang and Xie, Weicheng and Spitale, Micol and Shen, Linlin and Gunes, Hatice},
  journal={arXiv preprint arXiv:2305.15748},
  year={2023}
}

🀝 Acknowledgements

Thanks to the following open-source projects:
