This repository contains the audio-visual dataset proposed for the task of multi-modal zero-shot learning.
The dataset is curated from a large dataset, AudioSet. While the original dataset is multi-label, the example videos were selected such that every video in AudioSetZSL has only one label, i.e., it is a multi-class dataset. For more details on the creation of the dataset, refer to our paper.
Here, we provide the YouTube IDs for each class in the folder youtube-id.
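As a minimal sketch of how the IDs might be turned into video links, the snippet below builds the standard YouTube watch URL for each ID. The file layout is an assumption, not something this repository documents: it supposes one `.txt` file per class inside `youtube-id`, with one ID per line. The helper names `video_url`, `read_ids` and `urls_by_class` are hypothetical; adapt them to the actual files.

```python
from pathlib import Path


def video_url(youtube_id: str) -> str:
    """Return the standard YouTube watch URL for a video ID."""
    return f"https://www.youtube.com/watch?v={youtube_id.strip()}"


def read_ids(path) -> list:
    """Read IDs from a text file, skipping blank lines.

    Assumption: one YouTube ID per line.
    """
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def urls_by_class(id_dir) -> dict:
    """Map each class name to the list of its video URLs.

    Assumption: one .txt file per class, named after the class.
    """
    return {
        path.stem: [video_url(i) for i in read_ids(path)]
        for path in sorted(Path(id_dir).glob("*.txt"))
    }


if __name__ == "__main__":
    # "youtube-id" is the folder named above; its internal layout is assumed.
    for class_name, urls in urls_by_class("youtube-id").items():
        print(class_name, len(urls))
```

Downloading the videos themselves is left to the user, who remains responsible for complying with YouTube's terms and conditions.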
The dataset is split in two ways so that it can be used for both classification and zero-shot learning (ZSL).
The examples for each class have been divided into three subsets, namely train, val and test.
Similarly, for the task of ZSL, the classes in the dataset are divided into seen and unseen.
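The two kinds of split can be illustrated with a short sketch. It is not the procedure used to build this dataset (the paper describes that); the split fractions, the seed, and the function names `split_examples` and `split_classes` are all assumptions chosen for illustration. The point it shows is that the per-class subsets partition the examples, while the seen and unseen sets partition the classes.

```python
import random


def split_examples(ids, val_frac=0.1, test_frac=0.2, seed=0):
    """Partition one class's example IDs into train, val and test.

    The fractions and seed are illustrative assumptions, not the
    values used for the released dataset.
    """
    ids = sorted(ids)                  # fixed order, so the seed is reproducible
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "test": ids[:n_test],
        "val": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }


def split_classes(classes, unseen):
    """Partition class names into disjoint seen and unseen lists."""
    unseen = set(unseen)
    unknown = unseen - set(classes)
    if unknown:
        raise ValueError(f"unseen classes not in the dataset: {sorted(unknown)}")
    seen = [c for c in classes if c not in unseen]
    held_out = [c for c in classes if c in unseen]
    return seen, held_out


if __name__ == "__main__":
    # Placeholder IDs and class names, not taken from the dataset.
    parts = split_examples([f"video_{i}" for i in range(10)])
    print({name: len(items) for name, items in parts.items()})
    print(split_classes(["class_a", "class_b", "class_c"], unseen=["class_b"]))
```

In a ZSL setting, a model would be trained only on examples from the seen classes and evaluated on examples from the unseen ones.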
We also provide pre-trained features for both audio and video. The features were obtained such that they can be used for the task of ZSL, as the unseen classes do not overlap with the data used to pre-train the network (refer to our paper for the detailed dataset split process). To download the pre-trained features, follow the link: Download
Kindly contact kranti@cse.iitk.ac.in with any issues or comments.
- The dataset collection was done at IIT Kanpur.
- The dataset is intended to be used for academic research only.
- The links are YouTube links and the user is responsible for compliance with YouTube's terms and conditions.
- The videos are the property of their respective YouTube uploaders. If any video belongs to you and you would like to have it removed, kindly let us know and we will remove it from the dataset.