ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions

Project Page · Paper (TVCG) · Paper (arXiv) · Code

Demo videos: generated_sample1.mp4 · generated_sample2.mp4 · generated_sample3.mp4

📢 News

  • Our paper has been accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)! 🎉🎉 (Oct/2024)

📋 Table of Contents

  • 🛠️ Installation
  • 👨‍🏫 Getting Started
  • 🖊️ Citation
  • 🤝 Acknowledgements

🛠️ Installation

Prerequisites

  • Python 3.8+ (the environment below uses 3.9)
  • PyTorch 2.0.1 (pinned by the CUDA 11.8 / PyTorch3D wheels installed below)
  • CUDA 11.8+

Setup Environment

Create and activate conda environment

conda create -n react python=3.9
conda activate react

Install PyTorch

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Install PyTorch3D

pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt201/download.html

Install other dependencies

pip install -r requirements.txt
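
Optionally, a quick sanity check (illustrative; the expected version string assumes the CUDA 11.8 wheels above) confirms that PyTorch sees the GPU and that PyTorch3D imports cleanly:

# Optional sanity check for the environment set up above.
import torch
import pytorch3d  # fails here if the wheel did not match the torch/CUDA versions

print(torch.__version__)            # expected: 2.0.1+cu118
print(torch.cuda.is_available())    # True if a CUDA 11.8-capable driver is visible
print(pytorch3d.__version__)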

👨‍🏫 Getting Started

1. Data Preparation

Download and Setup Dataset

The REACT 2023/2024 Multimodal Challenge Dataset is compiled from public dyadic-interaction datasets, including NoXI and RECOLA (see the directory layout below).

Apply for data access through the REACT 2023/2024 challenge organizers.

Data organization (data/) follows this structure:

data/partition/modality/site/chat_index/person_index/clip_index/actual_data_files

Example data structure:

data
├── test
├── val
├── train
   ├── Video_files
       ├── NoXI
           ├── 010_2016-03-25_Paris
               ├── Expert_video
               ├── Novice_video
                   ├── 1
                       ├── 1.png
                       ├── ....
                       ├── 751.png
                   ├── ....
           ├── ....
       ├── RECOLA
   ├── Audio_files
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.wav
                   ├── ....
           ├── group-2
           ├── group-3
   ├── Emotion
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.csv
                   ├── ....
           ├── group-2
           ├── group-3
   ├── 3D_FV_files
       ├── NoXI
       ├── RECOLA
           ├── group-1
               ├── P25 
               ├── P26
                   ├── 1.npy
                   ├── ....
           ├── group-2
           ├── group-3

Important details:

  • Task: Predict one role's reaction ('Expert' or 'Novice', 'P25' or 'P26') to the other
  • 3D_FV_files contain 3DMM coefficients (expression: 52 dim, angle: 3 dim, translation: 3 dim); a minimal loading sketch follows this list
  • Video specifications:
    • Frame rate: 25 fps
    • Resolution: 256x256
    • Clip length: 751 frames (~30s)
    • Audio sampling rate: 44,100 Hz
  • CSV files listing the training/validation/test splits are available at 'data/train.csv', 'data/val.csv', and 'data/test.csv'
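
To illustrate the layout and the 3DMM coefficient format described above, the sketch below builds clip paths following the data/partition/modality/site/chat_index/person_index/clip_index pattern and splits a loaded coefficient array into its expression, angle, and translation parts. The helper, the example values, and the assumed column order are for illustration only and are not part of the repository's code:

import os
import numpy as np

# Illustrative only: build paths following
# data/partition/modality/site/chat_index/person_index/clip_index/...
def clip_path(root, partition, modality, site, chat, person, clip):
    return os.path.join(root, partition, modality, site, chat, person, str(clip))

frames_dir = clip_path("data", "train", "Video_files",
                       "NoXI", "010_2016-03-25_Paris", "Expert_video", 1)   # contains 1.png ... 751.png
tdmm_file = clip_path("data", "train", "3D_FV_files",
                      "RECOLA", "group-1", "P25", 1) + ".npy"               # .../P25/1.npy

# Assumed column order of the 58-dim 3DMM coefficients: expression, angle, translation.
coeffs = np.load(tdmm_file)          # shape: (751, 58) for a ~30 s clip
expression  = coeffs[:, :52]         # 52-dim expression
angle       = coeffs[:, 52:55]       # 3-dim head rotation
translation = coeffs[:, 55:58]       # 3-dim head translation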
Download Additional Resources
  1. Listener Reaction Neighbors
    • Download the appropriate listener reaction neighbors dataset from here
    • Place the downloaded files in the dataset root folder
  2. Ground Truth 3DMMs
    • Download the ground truth 3DMMs (test set) for speaker-listener evaluation from here
    • Place the downloaded files in the metric/gt folder

2. External Tool Preparation

Required Models and Tools

We use 3DMM coefficients for 3D listener/speaker representation and 3D-to-2D frame rendering.

  1. 3DMM Model Setup

  2. PIRender Setup

    • We use PIRender for 3D-to-2D rendering
    • Download our retrained checkpoint (cur_model_fold.pth)
    • Place in external/PIRender/

3. Training

Training Options

Training with rendering during validation:

python train.py \
  --batch-size 8 \
  --window-size 64 \
  --momentum 0.1 \
  --gpu-ids 0 \
  -lr 0.00002 \
  -e 200 \
  -j 4 \
  --sm-p 10 \
  --kl-p 0.00001 \
  --div-p 100 \
  --rendering \
  --outdir results/train-reactface

Training without rendering during validation (faster):

python train.py \
  --batch-size 8 \
  --window-size 64 \
  --momentum 0.1 \
  --gpu-ids 0 \
  -lr 0.00002 \
  -e 200 \
  -j 4 \
  --sm-p 10 \
  --kl-p 0.00001 \
  --div-p 100 \
  --outdir results/train-reactface
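
The --sm-p, --kl-p and --div-p flags above appear to weight smoothness, KL-divergence and diversity loss terms (an assumption based on the flag names; the repository's loss code is authoritative). A conceptual sketch of such a weighted sum, using the same weights as the commands above:

import torch

# Conceptual sketch only, NOT the repository's training code.
# Assumes --sm-p, --kl-p and --div-p weight smoothness, KL and diversity terms.
def weighted_loss(rec, smooth, kl, div, sm_p=10.0, kl_p=1e-5, div_p=100.0):
    return rec + sm_p * smooth + kl_p * kl + div_p * div

# Dummy scalar losses for illustration.
print(weighted_loss(torch.tensor(1.0), torch.tensor(0.05),
                    torch.tensor(4.0), torch.tensor(0.01)))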

4. Evaluation

Generate Results

To generate listener reactions using a trained ReactFace model, run:

python evaluate.py \
  --split test \
  --batch-size 16 \
  --window-size 8 \
  --momentum 0.9 \
  --gpu-ids 0 \
  -j 4 \
  --rendering \
  --outdir results/eval \
  --resume results/train-reactface/best_checkpoint.pth
Metric-based Evaluations

Our evaluation methodology is based on established research on multiple appropriate facial reaction generation:

Paper 1 · Paper 2 · Paper 3

Metrics Overview

Diversity Metrics
  • FRDvs: Measures diversity across speaker behavior conditions
  • FRVar: Evaluates diversity within a single generated facial reaction sequence
  • FRDiv: Assesses diversity of different generated listener reactions to the same speaker behavior (an illustrative computation sketch follows this overview)
Quality Metrics
  • FRRea: Uses Fréchet Video Distance (FVD) to evaluate realism of generated video sequences
  • FRCorr: Measures appropriateness by correlating each generated facial reaction with its most similar real facial reaction
  • FRSyn: Evaluates synchronization between generated listener reactions and varying speaker sequences
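
As a rough illustration of what the diversity metrics above capture, the sketch below computes a per-sequence temporal variance (FRVar-like) and a mean pairwise distance between alternative generations for the same speaker (FRDiv-like) over 3DMM coefficient arrays. The exact definitions and normalizations in the referenced papers differ; this is not the evaluation code in metric/:

import numpy as np

# Illustrative only: rough analogues of FRVar and FRDiv on 3DMM coefficients.
# preds has shape (num_samples, frames, dims): alternative generated reactions
# to the SAME speaker behavior.
def frvar_like(preds):
    """Mean temporal variance within each generated sequence."""
    return float(preds.var(axis=1).mean())

def frdiv_like(preds):
    """Mean pairwise squared L2 distance between alternative generations."""
    n = preds.shape[0]
    dists = [np.mean((preds[i] - preds[j]) ** 2)
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

preds = np.random.randn(10, 751, 58)   # 10 samples, 751 frames, 58-dim 3DMM
print(frvar_like(preds), frdiv_like(preds))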

Running Evaluation

Execute the following command to compute all metrics:

python evaluate_metric.py \
  --split test \
  --gt-speaker-3dmm-path ./metric/gt/tdmm_speaker.npy \
  --gt-listener-3dmm-path ./metric/gt/tdmm_listener.npy \
  --gn-listener-3dmm-path ./results/eval/test/coeffs/tdmm_10x.npy

Assessing realism with FVD:

  • Download the pretrained I3D model (rgb_imagenet.pt) from the linked library
  • Place the model in the folder metric/FVD/pytorch_i3d_model/models
  • Execute the following command to compute the FVD metric (a sketch of the underlying Fréchet distance follows the command):
python metric/FVD/fvd_eval.py \
  --source-dir /path/to/ground-truth/listener/videos \
  --target-dir /path/to/generated/listener/videos \
  --model-path metric/FVD/pytorch_i3d_model/models/rgb_imagenet.pt \
  --num-videos 100 \
  --frame-size 224 \
  --max-frames 750
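
For reference, FVD is the Fréchet distance between Gaussians fitted to I3D features of real and generated videos; the script above handles feature extraction, and the sketch below only shows the distance itself on precomputed feature matrices (shapes are illustrative):

import numpy as np
from scipy.linalg import sqrtm

# Fréchet distance between Gaussians fitted to two feature sets
# (rows = videos, columns = I3D feature dimensions). Shapes are illustrative.
def frechet_distance(feats_real, feats_gen):
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):   # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2 * covmean))

real = np.random.randn(100, 400)   # e.g. 100 real videos, 400-dim features
gen  = np.random.randn(100, 400)   # 100 generated videos
print(frechet_distance(real, gen))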

5. Customized Inference

Generate Dyadic Reaction with Custom Speaker Video

Execute the following command to generate a listener's reaction to your speaker video:

python dyadic_reaction_inference.py \
    --speaker-video /path/to/your_video.mp4 \
    --speaker-audio /path/to/your_audio.wav \
    --listener-portrait /path/to/your_portrait.png \
    --window-size 8 \
    --momentum 0.9 \
    --output-dir results/customized_inference \
    --checkpoint results/train-reactface/best_checkpoint.pth

Required Inputs:

  • speaker-video: Path to the input speaker video file (MP4 format)
  • speaker-audio: Path to the speaker's audio file (WAV format)
  • listener-portrait: Path to the portrait photo of your custom listener (PNG format)

Optional Parameters:

  • window-size: Size of the temporal window (default: 8)
  • momentum: controls how quickly the generated reaction evolves over time (default: 0.9); one plausible reading is sketched after this list
  • output-dir: Directory for saving generated results
  • checkpoint: Path to the trained model checkpoint
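
One plausible reading of window-size and momentum (an assumption for illustration, not the repository's implementation): the reaction is generated window by window, and a momentum term blends each new window with the previous one, so higher momentum yields smoother, slower-changing reactions:

import numpy as np

# Illustrative only: windowed online generation with momentum blending.
# generate_window is a stand-in for the model and is NOT part of this repo.
def generate_window(speaker_chunk, rng):
    return rng.standard_normal((speaker_chunk.shape[0], 58))   # fake 3DMM frames

def online_generate(speaker_feats, window_size=8, momentum=0.9, seed=0):
    rng = np.random.default_rng(seed)
    outputs, prev = [], None
    for start in range(0, speaker_feats.shape[0], window_size):
        chunk = speaker_feats[start:start + window_size]
        new = generate_window(chunk, rng)
        if prev is not None:
            # Higher momentum -> the output changes more slowly between windows.
            new = momentum * prev[-1] + (1 - momentum) * new
        outputs.append(new)
        prev = new
    return np.concatenate(outputs, axis=0)

listener_coeffs = online_generate(np.zeros((751, 128)))   # dummy speaker features
print(listener_coeffs.shape)                              # (751, 58)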

🖊️ Citation

If this work helps in your research, please cite the following papers:

@article{10756784,
  author={Luo, Cheng and Song, Siyang and Xie, Weicheng and Spitale, Micol and Ge, Zongyuan and Shen, Linlin and Gunes, Hatice},
  journal={IEEE Transactions on Visualization and Computer Graphics}, 
  title={ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions}, 
  year={2024},
  volume={},
  number={},
  pages={1-18},
}


@article{luo2023reactface,
  title={Reactface: Multiple appropriate facial reaction generation in dyadic interactions},
  author={Luo, Cheng and Song, Siyang and Xie, Weicheng and Spitale, Micol and Shen, Linlin and Gunes, Hatice},
  journal={arXiv preprint arXiv:2305.15748},
  year={2023}
}

🤝 Acknowledgements

Thanks to the open-source projects this work builds on, including PIRender, PyTorch3D, and the PyTorch I3D implementation used for FVD.