This is the official PyTorch implementation of the AAAI'23 paper AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio.

Note: this codebase is an early version and is under refactoring. If you have any questions, feel free to contact the author.
Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on the different anatomies of individuals, which is essential for producing accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The rendering of spatial audio is strongly correlated with the 3D shape of human bodies, particularly ears.
To this end, we propose to achieve personalized spatial audio by reconstructing 3D human ears from single-view images. First, to benchmark the ear reconstruction task, we introduce AudioEar3D, a high-quality 3D ear dataset consisting of 112 point cloud ear scans with RGB images. To train a reconstruction model in a self-supervised manner, we further collect a 2D ear dataset composed of 2,000 images, each with manual annotation of occlusion and 55 landmarks, named AudioEar2D. To our knowledge, both datasets are the largest and highest-quality of their kind available for public use. Further, we propose AudioEarM, a reconstruction method guided by a depth estimation network trained on synthetic data, with two loss functions tailored for ear data. Lastly, to bridge the gap between the vision and acoustics communities, we develop a pipeline that integrates the reconstructed ear mesh with an off-the-shelf 3D human body and simulates a personalized Head-Related Transfer Function (HRTF), which is the core of spatial audio rendering.
Our collected datasets, AudioEar3D and AudioEar2D, can be downloaded from Zenodo. If you have any trouble downloading, you can also use this mirror link on Google Drive.
| 3D Ear Dataset | Scale | With Image | Quality | Accessibility |
|---|---|---|---|---|
| UND-J2 | 1,800 | ✔ | * | ✔ |
| York3DEar | 500 | ✘ | * | ✔ |
| SYMARE-1 | 20 | ✘ | *** | ✔ |
| SYMARE-2 | 102 | ✘ | *** | ✘ |
| Ploumpis et al. | 234 | ✘ | *** | ✘ |
| AudioEar3D | 112 | ✔ | **** | ✔ |
| 2D Ear Dataset | Scale | Source | Landmark Annotations | Usage |
|---|---|---|---|---|
| UND-E | 464 | Limited | ✘ | Biometrics |
| AMI | 700 | Limited | ✘ | Biometrics |
| IIT Delhi Ear | 754 | Limited | ✘ | Biometrics |
| WPUTEDB | 3,348 | Limited | ✘ | Biometrics |
| UBEAR | 4,410 | Limited | ✘ | Biometrics |
| IBug-B | 2,058 | In-the-wild | ✘ | Biometrics |
| AWE | 9,500 | In-the-wild | ✘ | Biometrics |
| EarVN | 28,412 | In-the-wild | ✘ | Biometrics |
| IBug-A | 605 | In-the-wild | ✔ | Reconstruction |
| AudioEar2D | 2,000 | In-the-wild | ✔ | Reconstruction |
- Prepare AudioEar2D Dataset
  - Download the AudioEar2D dataset, then modify `cfg.model.ear_dataset_path` in `config.py` to the directory of the dataset (a sketch of these config entries is given after this list).
  - Copy the train/test split file `split.json` in `./data/AudioEar2D` to the dataset folder.
- Prepare Texture Model
  - Follow the instructions for the Albedo model to get `FLAME_albedo_from_BFM.npz`, then modify `cfg.model.tex_path` in `config.py` to the path of the npz file.
- Prepare AudioEar3D Dataset
  - Download the AudioEar3D dataset, then modify `cfg.s2m.s2m_data_path` in `config.py` to the directory of the dataset.
- Prepare Synthetic Dataset
  - Download the synthetic dataset from here, then modify `cfg.s2m.sythetic_dataset_path` in `config.py` to the directory of the dataset.
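The exact contents of `config.py` depend on the repository version; as a rough, hypothetical sketch of the entries named above (assuming a yacs-style `CfgNode` and using placeholder paths), the edits look like this:

```python
# Hypothetical sketch only: the real config.py may be organized differently.
# Only the attribute names (cfg.model.ear_dataset_path, cfg.model.tex_path,
# cfg.s2m.s2m_data_path, cfg.s2m.sythetic_dataset_path) come from the steps above;
# the yacs usage and the paths are placeholders.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.model = CN()
cfg.model.ear_dataset_path = "/data/AudioEar2D"           # directory of the AudioEar2D dataset
cfg.model.tex_path = "/data/FLAME_albedo_from_BFM.npz"    # path of the albedo npz file
cfg.s2m = CN()
cfg.s2m.s2m_data_path = "/data/AudioEar3D"                # directory of the AudioEar3D dataset
cfg.s2m.sythetic_dataset_path = "/data/ear_synthetic"     # directory of the synthetic dataset
```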
The pre-trained weights of our ResNet encoder and Monocular Depth Estimation (MDE) model can be downloaded through this link.
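If you want to sanity-check a downloaded checkpoint before training, here is a minimal sketch (assuming the files are ordinary PyTorch checkpoints; the file name below is a placeholder):

```python
import torch

# Placeholder file name: replace with the path of the downloaded checkpoint.
ckpt = torch.load("mde_pretrain.pth", map_location="cpu")

# Depending on how the weights were saved, this is either a plain state_dict
# or a dict that wraps one; printing the top-level keys shows which.
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
```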
- Python 3.7
- PyTorch>=1.6
- PyTorch3D>=0.7.0
- CUDA Toolkit>10.2
- Trimesh>=3.9
- numpy>=1.18.5
- scipy>=1.4.1
- chumpy>=0.69
- scikit-image>=0.15
- opencv-python>=4.1.1
- PyYAML>=5.1.1
You can install them either manually or through the command:

```
pip install -r requirements.txt
```

PyTorch3D might need manual installation. Follow the official instructions to install it.
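After installation, a quick sanity check (a minimal sketch; nothing in it is specific to this repository) confirms that PyTorch, PyTorch3D, and CUDA are visible:

```python
import torch
import pytorch3d

# Report versions and GPU availability to verify the environment.
print("PyTorch:", torch.__version__)
print("PyTorch3D:", pytorch3d.__version__)
print("CUDA available:", torch.cuda.is_available())
```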
- Training and validating the MDE model on the synthetic dataset:

  ```
  python train_depth.py
  ```

- Training and validating on the AudioEar2D dataset: modify `cfg.model.depth_model_path` to the model obtained from `train_depth.py` (see the checkpoint-path sketch after this list), then run:

  ```
  python train_recon.py
  ```

- Evaluation on the AudioEar3D dataset: change `cfg.s2m.recon_model_path` in `config.py` to the model checkpoint file obtained from `train_recon.py`, then run:

  ```
  python s2mdemo.py
  ```
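As with the dataset paths, the two checkpoint entries are edited directly in `config.py`. Here is a hypothetical sketch with placeholder paths (the actual config layout may differ; only the attribute names come from the steps above):

```python
# Hypothetical sketch: point these at the checkpoints produced by the
# earlier stages before running train_recon.py / s2mdemo.py.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.model = CN()
cfg.s2m = CN()
cfg.model.depth_model_path = "checkpoints/depth/model_best.pth"  # placeholder: output of train_depth.py
cfg.s2m.recon_model_path = "checkpoints/recon/model_best.pth"    # placeholder: output of train_recon.py
```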
If you find this project useful in your research, please cite the paper as:
Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li. "AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio". arXiv preprint arXiv:2301.12613, 2023.
or using bibtex:
```bibtex
@article{huang2023audioear,
  title={AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio},
  author={Huang, Xiaoyang and Wang, Yanjun and Liu, Yang and Ni, Bingbing and Zhang, Wenjun and Liu, Jinxian and Li, Teng},
  journal={arXiv preprint arXiv:2301.12613},
  year={2023}
}
```