
ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions

Project Page | Paper | Code

Demo videos: generated_sample1.mp4 · generated_sample2.mp4 · generated_sample3.mp4

πŸ“’ News

  • Our paper has been accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)! πŸŽ‰πŸŽ‰ (Oct/2024)


πŸ› οΈ Installation

Prerequisites

  • Python 3.8+ (3.9 used below)
  • PyTorch 1.9+ (2.0.1 used below)
  • CUDA 11.8+

Setup Environment

Create and activate conda environment

conda create -n react python=3.9
conda activate react

Install PyTorch

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Install PyTorch3D

pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt201/download.html

Install other dependencies

pip install -r requirements.txt
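
To verify the environment, a minimal check (assumes the install steps above completed successfully):

# Quick environment check (run inside the activated conda environment).
import torch
import pytorch3d

print("torch:", torch.__version__)              # expected: 2.0.1+cu118
print("CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)      # wheel built for py39 / cu118 / pyt201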

πŸ‘¨β€πŸ« Getting Started

1. Data Preparation

Download and Setup Dataset

The REACT 2023/2024 Multimodal Challenge Dataset is compiled from public dyadic-interaction datasets, including NoXI and RECOLA.

Apply for data access through the REACT 2023/2024 challenge.

Data organization (data/) follows this structure:

data/partition/modality/site/chat_index/person_index/clip_index/actual_data_files

Example data structure:

data
β”œβ”€β”€ test
β”œβ”€β”€ val
β”œβ”€β”€ train
   β”œβ”€β”€ Video_files
       β”œβ”€β”€ NoXI
           β”œβ”€β”€ 010_2016-03-25_Paris
               β”œβ”€β”€ Expert_video
               β”œβ”€β”€ Novice_video
                   β”œβ”€β”€ 1
                       β”œβ”€β”€ 1.png
                       β”œβ”€β”€ ....
                       β”œβ”€β”€ 751.png
                   β”œβ”€β”€ ....
           β”œβ”€β”€ ....
       β”œβ”€β”€ RECOLA
   β”œβ”€β”€ Audio_files
       β”œβ”€β”€ NoXI
       β”œβ”€β”€ RECOLA
           β”œβ”€β”€ group-1
               β”œβ”€β”€ P25 
               β”œβ”€β”€ P26
                   β”œβ”€β”€ 1.wav
                   β”œβ”€β”€ ....
           β”œβ”€β”€ group-2
           β”œβ”€β”€ group-3
   β”œβ”€β”€ Emotion
       β”œβ”€β”€ NoXI
       β”œβ”€β”€ RECOLA
           β”œβ”€β”€ group-1
               β”œβ”€β”€ P25 
               β”œβ”€β”€ P26
                   β”œβ”€β”€ 1.csv
                   β”œβ”€β”€ ....
           β”œβ”€β”€ group-2
           β”œβ”€β”€ group-3
   β”œβ”€β”€ 3D_FV_files
       β”œβ”€β”€ NoXI
       β”œβ”€β”€ RECOLA
           β”œβ”€β”€ group-1
               β”œβ”€β”€ P25 
               β”œβ”€β”€ P26
                   β”œβ”€β”€ 1.npy
                   β”œβ”€β”€ ....
           β”œβ”€β”€ group-2
           β”œβ”€β”€ group-3

Important details:

  • Task: Predict one participant's facial reaction ('Expert' or 'Novice' in NoXI, 'P25' or 'P26' in RECOLA) to the other participant's behaviour
  • 3D_FV_files contain 3DMM coefficients (expression: 52 dim, angle: 3 dim, translation: 3 dim)
  • Video specifications:
    • Frame rate: 25 fps
    • Resolution: 256x256
    • Clip length: 751 frames (~30s)
    • Audio sampling rate: 44100 Hz
  • CSV files defining the train/val/test splits are available at 'data/train.csv', 'data/val.csv', and 'data/test.csv'
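
As a quick sanity check on the 3DMM files, a minimal loading sketch is shown below; the example path and the coefficient ordering (expression, then angle, then translation) are assumptions based on the description above:

import numpy as np

# Hypothetical clip path following the layout above (replace with a real clip).
coeff_path = "data/train/3D_FV_files/RECOLA/group-1/P25/1.npy"

coeffs = np.load(coeff_path)            # expected shape: (751, 58) for a ~30 s clip at 25 fps
expression  = coeffs[:, :52]            # 52-dim expression coefficients
angle       = coeffs[:, 52:55]          # 3-dim head rotation
translation = coeffs[:, 55:58]          # 3-dim translation
print(expression.shape, angle.shape, translation.shape)
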
Download Additional Resources
  1. Listener Reaction Neighbors
    • Download the appropriate listener reaction neighbors dataset from here
    • Place the downloaded files in the dataset root folder
  2. Ground Truth 3DMMs
    • Download the ground truth 3DMMs (test set) for speaker-listener evaluation from here
    • Place the downloaded files in the metric/gt folder

2. External Tool Preparation

Required Models and Tools

We use 3DMM coefficients for 3D listener/speaker representation and 3D-to-2D frame rendering.

  1. 3DMM Model Setup

  2. PIRender Setup

    • We use PIRender for 3D-to-2D rendering
    • Download our retrained checkpoint (cur_model_fold.pth)
    • Place in external/PIRender/

3. Training

Training Options

Training with rendering during validation (slower):

python train.py \
  --batch-size 8 \
  --window-size 64 \
  --momentum 0.1 \
  --gpu-ids 0 \
  -lr 0.00002 \
  -e 200 \
  -j 4 \
  --sm-p 10 \
  --kl-p 0.00001 \
  --div-p 100 \
  --rendering \
  --outdir results/train-reactface

Training without rendering during validation (faster):

python train.py \
  --batch-size 8 \
  --window-size 64 \
  --momentum 0.1 \
  --gpu-ids 0 \
  -lr 0.00002 \
  -e 200 \
  -j 4 \
  --sm-p 10 \
  --kl-p 0.00001 \
  --div-p 100 \
  --outdir results/train-reactface
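
For orientation only: -lr sets the learning rate, and --sm-p, --kl-p and --div-p appear to weight smoothness, KL, and diversity terms. The sketch below shows how such a weighted objective is typically assembled; the term names and structure are assumptions, not the repository's actual code:

import torch

def weighted_objective(rec_loss: torch.Tensor,
                       smooth_loss: torch.Tensor,
                       kl_loss: torch.Tensor,
                       div_loss: torch.Tensor,
                       sm_p: float = 10.0,
                       kl_p: float = 1e-5,
                       div_p: float = 100.0) -> torch.Tensor:
    # Each term is assumed to be a scalar tensor computed elsewhere;
    # the default weights mirror the flag values in the commands above.
    return rec_loss + sm_p * smooth_loss + kl_p * kl_loss + div_p * div_loss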

4. Evaluation

Generate Results

To generate listener reactions using a trained ReactFace model, run:

python evaluate.py \
  --split test \
  --batch-size 16 \
  --window-size 8 \
  --momentum 0.9 \
  --gpu-ids 0 \
  -j 4 \
  --rendering \
  --outdir results/eval \
  --resume results/train-reactface/best_checkpoint.pth

Metric-based Evaluations

Our evaluation methodology is based on established research on Multiple Appropriate Listener Reaction generation:

Paper 1 | Paper 2 | Paper 3

Metrics Overview

Diversity Metrics
  • FRDvs: Measures diversity across speaker behavior conditions
  • FRVar: Evaluates diversity within a single generated facial reaction sequence (see the sketch after the metrics overview)
  • FRDiv: Assesses diversity of different generated listener reactions to the same speaker behavior
Quality Metrics
  • FRRea: Uses FrΓ©chet Video Distance (FVD) to evaluate realism of generated video sequences
  • FRCorr: Measures appropriateness by correlating each generated facial reaction with its most similar real facial reaction
  • FRSyn: Evaluates synchronization between generated listener reactions and varying speaker sequences
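
To make the diversity metrics concrete, here is a minimal sketch of FRVar and FRDiv over 3DMM coefficient sequences; these are rough illustrations and the official metric implementations may differ:

import numpy as np

def frvar(sequence: np.ndarray) -> float:
    """Rough FRVar sketch: variance over time, averaged across feature dimensions.
    sequence: (num_frames, feature_dim) 3DMM coefficients of one generated reaction."""
    return float(np.var(sequence, axis=0).mean())

def frdiv(samples: np.ndarray) -> float:
    """Rough FRDiv sketch: mean pairwise distance between multiple reactions
    generated for the same speaker behaviour.
    samples: (num_samples, num_frames, feature_dim)."""
    n = samples.shape[0]
    dists = [np.linalg.norm(samples[i] - samples[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists)) if dists else 0.0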

Running Evaluation

Execute the following command to compute all metrics:

python evaluate_metric.py \
  --split test \
  --gt-speaker-3dmm-path ./metric/gt/tdmm_speaker.npy \
  --gt-listener-3dmm-path ./metric/gt/tdmm_listener.npy \
  --gn-listener-3dmm-path ./results/eval/test/coeffs/tdmm_10x.npy

Assessing realism with FVD:

  • Download the pretrained I3D model (rgb_imagenet.pt) from the pytorch-i3d project
  • Place the model in metric/FVD/pytorch_i3d_model/models
  • Execute the following command to compute the FVD metric:
python metric/FVD/fvd_eval.py \
  --source-dir /path/to/ground-truth/listener/videos \
  --target-dir /path/to/generated/listener/videos \
  --model-path metric/FVD/pytorch_i3d_model/models/rgb_imagenet.pt \
  --num-videos 100 \
  --frame-size 224 \
  --max-frames 750
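
For reference, FVD is the FrΓ©chet distance between Gaussians fitted to I3D features of real and generated videos; a minimal sketch of that distance is shown below (feature extraction is handled by the script above, which may differ in details):

import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FrΓ©chet distance between two Gaussian-fitted feature sets of shape (N, D)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real      # drop tiny imaginary parts from numerical error
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))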

5. Customized Inference

Generate Dyadic Reaction with Custom Speaker Video

Execute the following command to generate a listener's reaction to your speaker video:

python dyadic_reaction_inference.py \
    --speaker-video /path/to/your_video.mp4 \
    --speaker-audio /path/to/your_audio.wav \
    --listener-portrait /path/to/your_portrait.png \
    --window-size 8 \
    --momentum 0.9 \
    --output-dir results/customized_inference \
    --checkpoint results/train-reactface/best_checkpoint.pth

Required Inputs:

  • speaker-video: Path to the input speaker video file (MP4 format)
  • speaker-audio: Path to the speaker's audio file (WAV format)
  • listener-portrait: Path to the portrait photo of your custom listener (PNG format)

Optional Parameters:

  • window-size: Size of the temporal window (default: 8)
  • momentum: Momentum coefficient controlling how quickly the generated reaction changes (default: 0.9)
  • output-dir: Directory for saving generated results
  • checkpoint: Path to the trained model checkpoint
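
For intuition, one plausible reading of window-size and momentum is sliding-window generation with momentum-based blending between consecutive windows; the sketch below illustrates that idea only and is not the repository's actual implementation:

import numpy as np

def blend_windows(prev_coeffs: np.ndarray,
                  new_coeffs: np.ndarray,
                  momentum: float = 0.9) -> np.ndarray:
    """Blend the newly generated window of 3DMM coefficients with the previous one
    so online generation stays smooth; a higher momentum means slower change."""
    return momentum * prev_coeffs + (1.0 - momentum) * new_coeffs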

πŸ–ŠοΈ Citation

If this work helps in your research, please cite the following papers:

@article{10756784,
  author={Luo, Cheng and Song, Siyang and Xie, Weicheng and Spitale, Micol and Ge, Zongyuan and Shen, Linlin and Gunes, Hatice},
  journal={IEEE Transactions on Visualization and Computer Graphics}, 
  title={ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions}, 
  year={2024},
  volume={},
  number={},
  pages={1-18},
}


@article{luo2023reactface,
  title={Reactface: Multiple appropriate facial reaction generation in dyadic interactions},
  author={Luo, Cheng and Song, Siyang and Xie, Weicheng and Spitale, Micol and Shen, Linlin and Gunes, Hatice},
  journal={arXiv preprint arXiv:2305.15748},
  year={2023}
}

🀝 Acknowledgements

Thanks to the following open-source projects:
