
# FNAC_AVL

Official PyTorch implementation of FNAC.

Our paper is accepted at CVPR 2023:

**Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning**
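
FNAC targets false negatives in audio-visual contrastive learning: unpaired samples within a batch that are nonetheless semantically related and should not be pushed apart. For orientation only, below is a minimal PyTorch sketch of this general idea, a weighted InfoNCE loss in which an intra-modal-similarity proxy down-weights likely false negatives. The function name, the weighting scheme, and the proxy are illustrative assumptions, not the paper's actual formulation; see the code in this repo for the real implementation.

```python
import torch
import torch.nn.functional as F

def fnac_style_loss(audio_emb: torch.Tensor,
                    visual_emb: torch.Tensor,
                    tau: float = 0.07) -> torch.Tensor:
    """Illustrative false-negative-aware contrastive loss (NOT the paper's
    exact formulation). audio_emb / visual_emb: (B, D) paired embeddings."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)

    # Cross-modal similarity; diagonal entries are the true positive pairs.
    logits = a @ v.t() / tau  # (B, B)

    # Hypothetical proxy: high intra-modal (audio-audio) similarity flags
    # off-diagonal pairs that are semantically close, i.e. likely false negatives.
    intra = (a @ a.t()).detach()
    neg_weight = 1.0 - intra.clamp(min=0.0)  # down-weight likely false negatives
    neg_weight.fill_diagonal_(1.0)           # positives keep full weight

    # Weighted InfoNCE in both directions (audio->visual and visual->audio).
    exp_av = torch.exp(logits) * neg_weight
    exp_va = torch.exp(logits.t()) * neg_weight
    loss_av = -torch.log(exp_av.diagonal() / exp_av.sum(dim=1))
    loss_va = -torch.log(exp_va.diagonal() / exp_va.sum(dim=1))
    return 0.5 * (loss_av.mean() + loss_va.mean())

# Example with random embeddings for a batch of 8 paired clips:
# loss = fnac_style_loss(torch.randn(8, 512), torch.randn(8, 512))
```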

## Environment

Set up the environment:

```bash
pip install -r requirements.txt
```

## Data preparation

### Flickr-SoundNet

Data can be downloaded from *Learning to localize sound sources*.

### VGG-Sound Source

Data can be downloaded from *Localizing Visual Sounds the Hard Way*.

### VGG-SS Unheard & Heard Test Data

Data can be downloaded from *Unheard and Heard*.

## Training and Evaluation

To train FNAC on Flickr, set the dataset path, train set (`flickr_10k` or `flickr_144k`), and experiment name accordingly, then run:

```bash
bash train_flickr.sh
```

To evaluate, set the dataset path, test set, and experiment name accordingly, then run:

```bash
bash test_flickr.sh
```

You can follow the same procedure to train and test on VGG-SS and on the heard/unheard splits.

## Checkpoints

| Train Set | Test Set | VGG-SS CIoU | VGG-SS AUC | URL |
|---|---|---|---|---|
| Flickr 10k | VGG-SS | 35.27 | 38.00 | checkpoints |
| Flickr 144k | VGG-SS | 33.93 | 37.29 | checkpoints |
| VGG-Sound 10k | VGG-SS | 37.29 | 38.99 | checkpoints |
| VGG-Sound 144k | VGG-SS | 39.50 | 39.66 | checkpoints |

## Citation

If you find our work useful, please cite our paper:

```bibtex
@article{sun2023learning,
  title={Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning},
  author={Sun, Weixuan and Zhang, Jiayi and Wang, Jianyuan and Liu, Zheyuan and Zhong, Yiran and Feng, Tianpeng and Guo, Yandong and Zhang, Yanhao and Barnes, Nick},
  journal={arXiv preprint arXiv:2303.11302},
  year={2023}
}
```

## Acknowledgement

We thank EZ-VSL for their great codebase.