Real-time demo and evaluation code for our Face and Gesture 2024 paper: "CCDb-HG: Novel Annotations and Gaze-Aware Representations for Head Gesture Recognition" Link
You can download the CCDb dataset from here and our annotations from here.
- Nod is an up-down rotation along the pitch axis. It involves a slight, quick, or repetitive lowering and raising of the head.
- Shake is a left-right horizontal rotation along the yaw axis. It involves a rapid and potentially repeated side-to-side motion, typically with small or moderate amplitude.
- Tilt is a sideways rotation along the roll axis involving a shift of the head in which one ear moves closer to the shoulder while the other ear moves away.
- Turn corresponds to a left or right rotation involving the shifting of the head from its original position to another one facing a different direction. Head turns can vary in amplitude, ranging from a slight turn to a complete reorientation of the head. It differs from a shake in that it is a non-repetitive movement, often initiated by a gaze shift.
- Up/Down is similar to a turn but along the pitch direction and usually involves a gaze shift in the same direction as the head.
- Waggle usually happens when speaking and involves a rhythmic swaying motion typically performed in a repeated manner. Unlike nod, shake, and tilt, waggle involves several head axes at the same time.
Example GIFs of each gesture: Nod | Shake | Tilt | Turn | Up/Down
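For reference, here is a minimal, illustrative label mapping for the six gesture classes described above; the actual class indices used in the released annotations may differ (and may include a no-gesture class).

```python
# Illustrative label mapping for the six CCDb-HG gesture classes described above.
# The actual indices in the released annotations may differ.
GESTURE_CLASSES = {
    0: "Nod",
    1: "Shake",
    2: "Tilt",
    3: "Turn",
    4: "Up/Down",
    5: "Waggle",
}
```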
├── data <- CCDbHG folder, can be downloaded from [here](http://example.com)
│ ├── extraction <- Extracted cues from CCDb dataset
│
├── images <- Visualizations of CCDb dataset and annotation
│
├── src <- Source code
│ ├── classifier/ <- Model architectures
│ ├── model_checkpoints/ <- Model checkpoints
│ │
│ ├── dataset.py <- Dataset class
│ ├── inference.py <- Model inference
│ ├── metrics.py <- Torchmetrics methods
│ └── utils_*.py <- Utility scripts
│
├── .gitignore <- List of files ignored by git
├── .project-root <- File to identify the root of the project
├── .pre-commit-config.yaml <- Configuration of pre-commit hooks for code formatting
├── requirements.txt <- Python dependencies for the project
├── THIRDPARTY <- Third-party license file
├── Dockerfile <- Dockerfile for the project
├── demo.py <- Demo script for real-time head gesture recognition
├── app.py <- WebApp script for real-time head gesture gif generation
├── run_evaluation.py <- Run our evaluation script from the paper on CCDbHG dataset
└── README.md
First, clone this repository:
git clone https://github.com/idiap/ccdbhg-head-gesture-recognition.git
cd ccdbhg-head-gesture-recognition
Then, install the requirements using conda:
conda create --name head_gesture python=3.11
conda activate head_gesture
conda install pip
pip install -r requirements.txt
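Optionally, you can run a quick sanity check to confirm that the main dependencies import correctly. This assumes PyTorch, MediaPipe, and OpenCV are among the requirements, as used by the evaluation and demo scripts:

```python
# Quick environment sanity check (assumes torch, mediapipe, and opencv-python
# are listed in requirements.txt, as used by the evaluation and demo scripts).
import cv2
import mediapipe as mp
import torch

print("OpenCV:", cv2.__version__)
print("MediaPipe:", mp.__version__)
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```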
Run the evaluation script to evaluate the best models from the paper on the CCDbHG dataset:
CNN model with landmarks and head pose features:
python run_evaluation.py --model cnn_lmk_hp --batch_size 128
CNN model with landmarks, head pose and gaze features:
python run_evaluation.py --model cnn_lmk_hp_gaze --batch_size 128
If GPU is available, you can specify the device:
python run_evaluation.py --model cnn_lmk_hp_gaze --device cuda --batch_size 128
Note: you can increase the batch size for faster evaluation.
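For orientation, the evaluation essentially runs the selected model over the CCDbHG samples and accumulates classification metrics with torchmetrics (see `src/metrics.py`). The sketch below only illustrates this general pattern with placeholder `model` and `dataloader` objects; it is not the repository's actual code.

```python
# Illustrative evaluation loop with torchmetrics; `model`, `dataloader`, and
# `num_classes` are placeholders, not the repository's actual objects.
import torch
from torchmetrics.classification import MulticlassAccuracy, MulticlassF1Score

def evaluate(model, dataloader, num_classes, device="cpu"):
    accuracy = MulticlassAccuracy(num_classes=num_classes).to(device)
    f1 = MulticlassF1Score(num_classes=num_classes, average="macro").to(device)
    model.eval().to(device)
    with torch.no_grad():
        for features, labels in dataloader:
            logits = model(features.to(device))
            preds = logits.argmax(dim=-1)
            accuracy.update(preds, labels.to(device))
            f1.update(preds, labels.to(device))
    return accuracy.compute().item(), f1.compute().item()
```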
The demo performs real-time head gesture recognition for up to 4 people in the frame at the same time. Two demo scripts are provided in this repo; both require a webcam.
In the first demo, a window will open showing the webcam feed with the head gesture recognition, bounding boxes, and landmarks.
# make sure to have a webcam connected and conda environment activated
python demo.py --face_detector CV2
There are two face detectors available: CV2 and YUNET. The YUNET detector is more accurate (especially when the face is far from the camera) but slower than CV2.
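For reference, the YUNET option corresponds to OpenCV's YuNet face detector, exposed as `cv2.FaceDetectorYN`. The sketch below shows its general usage with a placeholder model path; the demo may load and configure it differently.

```python
# Minimal YuNet face detection with OpenCV; the ONNX model path is a placeholder.
import cv2

detector = cv2.FaceDetectorYN.create(
    model="face_detection_yunet.onnx",  # placeholder path to the YuNet model
    config="",
    input_size=(320, 320),
    score_threshold=0.6,
)

frame = cv2.imread("frame.jpg")            # any BGR image
h, w = frame.shape[:2]
detector.setInputSize((w, h))              # must match the frame size
_, faces = detector.detect(frame)          # faces: Nx15 array (box, landmarks, score)
if faces is not None:
    for x, y, bw, bh, *_ in faces:
        cv2.rectangle(frame, (int(x), int(y)), (int(x + bw), int(y + bh)), (0, 255, 0), 2)
```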
Note: please allow about 3 seconds for face detection and tracking to initialize.
In the second demo, you run the app locally and open a browser to see the webcam feed along with a GIF of the captured head gesture.
# make sure to have a webcam connected and conda environment activated
python app.py
Then, open a browser and go to http://localhost:5000/
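For context, port 5000 is Flask's default, and serving a webcam feed to the browser is commonly done with an MJPEG stream. The sketch below illustrates that pattern under the assumption that `app.py` is Flask-based; it is not the repository's actual implementation.

```python
# Minimal MJPEG webcam stream with Flask; illustrative only, not the repo's app.py.
import cv2
from flask import Flask, Response

app = Flask(__name__)

def frames():
    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpeg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n" + jpeg.tobytes() + b"\r\n")

@app.route("/")
def index():
    return Response(frames(), mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```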
Alternatively, you can run the app with Docker, but this is not recommended because it is too slow for real-time use. The Dockerfile can still be used as a template for running the evaluation code. First, build the Docker image, if not already done:
docker build -t head_gesture .
Then, run the docker container:
docker run -p 5000:5000 head_gesture
Then, open a browser and go to http://localhost:5000/
Note: please allow about 3 seconds for face detection and tracking to initialize. The app is not error-proof; if an error occurs, please restart it.
The work was co-financed by Innosuisse, the Swiss innovation agency, through the NL-CH Eureka Innovation project ePartner4ALL (a personalized and blended care solution with a virtual buddy for child health, number 57272.1 IP-ICT).
Warning: The code is released under the GPL-3.0-only license. For the model checkpoints, cnn_lmk_hp/ is under the GPL-3.0-only license, but cnn_lmk_hp_gaze/ is under the CC BY-NC-SA 4.0 license, which restricts it to non-commercial use. This is because the gaze features are extracted with a model trained on the ETH-XGaze dataset, which is distributed under CC BY-NC-SA 4.0. The demo code uses cnn_lmk_hp/ and is therefore under the GPL-3.0-only license.
Dataset:
Extracted features from the CCDb dataset:
- Landmarks and head pose: MediaPipe
- Gaze: XGaze model trained on the ETH-XGaze dataset
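As a rough illustration of the landmark extraction step, MediaPipe's FaceMesh solution returns dense, normalized facial landmarks per frame; the snippet below is a minimal sketch, not the repository's extraction code.

```python
# Minimal facial landmark extraction with MediaPipe FaceMesh (illustrative only).
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False, max_num_faces=1, refine_landmarks=True
)

frame = cv2.imread("frame.jpg")  # any BGR image
results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark
    # Normalized (x, y, z) coordinates of the detected face landmarks
    coords = [(lm.x, lm.y, lm.z) for lm in landmarks]
```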
Demo:
- Tracking of face bounding boxes: MotPy (code)
- Face detection: YUNET and CV2
- Landmarks and head pose: MediaPipe
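The face bounding-box tracking credited to MotPy above follows a detect-then-track loop; the sketch below shows motpy's basic API with hypothetical detections, not the demo's actual code.

```python
# Basic motpy usage (illustrative): feed per-frame face boxes and read back tracks.
from motpy import Detection, MultiObjectTracker

tracker = MultiObjectTracker(dt=1 / 30)  # dt ~ 1 / camera FPS

# Hypothetical face boxes for one frame: [xmin, ymin, xmax, ymax]
boxes = [[100, 80, 220, 210], [400, 90, 510, 230]]
tracker.step(detections=[Detection(box=b) for b in boxes])

for track in tracker.active_tracks():
    print(track.id, track.box)  # stable id + smoothed box
```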
If you use this dataset, please cite the following paper:
@INPROCEEDINGS{Vuillecard_FG_2024,
author = {Vuillecard, Pierre and Farkhondeh, Arya and Villamizar, Michael and Odobez, Jean-Marc},
title = {CCDb-HG: Novel Annotations and Gaze-Aware Representations for Head Gesture Recognition},
booktitle = {18th IEEE Int. Conference on Automatic Face and Gesture Recognition (FG), Istanbul},
year = {2024},
pdf = {https://publications.idiap.ch/attachments/papers/2024/Vuillecard_FG_2024.pdf}
}