Official repository for the AAAI 2024 paper NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
- 2024.11.01 CenterPoint feature released.
- 2024.10.11 Training and Testing code released.
- 2023.12.09 Our paper is accepted by AAAI 2024!
- 2023.09.04 Our NuScenes-QA dataset v1.0 released.
- Release question & answer data
- Release visual feature
- Release training and testing code
We have released our question-answer annotations; please download them from HERE.
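After downloading, it can help to look at the top-level structure of an annotation file before wiring it into training. The snippet below is a minimal sketch that makes no assumption about the annotation schema: it only reports the type and size of each top-level entry. The helper name `summarize_json` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def _describe(value):
    """Describe a value by its type and, for sized containers, its length."""
    if isinstance(value, (list, dict, str)):
        return f"{type(value).__name__}[{len(value)}]"
    return type(value).__name__


def summarize_json(path):
    """Return a short description of the top-level structure of a JSON file.

    Hypothetical helper: it does not assume any particular key names.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # One entry per top-level key, e.g. {"some_key": "list[1234]"}.
        return {key: _describe(value) for key, value in data.items()}
    return _describe(data)
```

For example, `summarize_json("data/questions/NuScenes_val_questions.json")` would list the top-level keys of the validation annotations once the file is placed as shown in the folder layout.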
For the visual data, you can download the CenterPoint features that we have extracted from HERE. Alternatively, you can download the original nuScenes dataset from HERE and extract object-level features with different backbones by following this LINK. For details on feature extraction, refer to the Visual Feature Extraction and Object Embedding sections of our paper.
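The feature files are stored as `.npz` archives. Since the array names inside them are not documented here, the sketch below lists them without assuming any. It relies only on the fact that an `.npz` file is a zip archive with one `<name>.npy` member per array. The function name `list_npz_arrays` is hypothetical.

```python
import zipfile


def list_npz_arrays(path):
    """Return the names of the arrays stored in an .npz archive.

    Hypothetical helper: reads only the zip directory, not the array data.
    """
    with zipfile.ZipFile(path) as archive:
        # Each array is saved as a member called "<name>.npy".
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )
```

With NumPy installed, `numpy.load(path).files` reports the same names and lets you read the arrays themselves.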
The folder structure should be organized as follows before training.
NuScenes-QA
+-- configs/
| +-- butd.yaml
| +-- mcan_small.yaml
+-- data/
| +-- questions/ # downloaded
| | +-- NuScenes_train_questions.json
| | +-- NuScenes_val_questions.json
| +-- features/ # downloaded or extracted
| | +-- CenterPoint/
| | | +-- xxx.npz
| | | +-- ...
| | +-- BEVDet/
| | | +-- xxx.npz
| | | +-- ...
| | +-- MSMDFusion/
| | | +-- xxx.npz
| | | +-- ...
+-- src/
+-- run.py
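A quick way to confirm the layout before launching a run is to check for the expected paths. The following is a minimal sketch, not part of the repository: the helper `missing_paths` is hypothetical, and it assumes that only the feature folder matching the chosen `--VIS_FEAT` value needs to exist.

```python
from pathlib import Path

# Paths taken from the folder layout above, relative to the project root.
EXPECTED_PATHS = [
    "configs/butd.yaml",
    "configs/mcan_small.yaml",
    "data/questions/NuScenes_train_questions.json",
    "data/questions/NuScenes_val_questions.json",
    "src",
    "run.py",
]


def missing_paths(root, vis_feat="CenterPoint"):
    """Return the expected paths that do not exist under `root`.

    Assumption: only the feature folder for `vis_feat` is required.
    """
    root = Path(root)
    required = EXPECTED_PATHS + [f"data/features/{vis_feat}"]
    return [rel for rel in required if not (root / rel).exists()]
```

Calling `missing_paths(".")` from the project root returns an empty list when everything is in place, and otherwise names what still has to be downloaded or extracted.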
The following packages are required to build the project:
python >= 3.5
CUDA >= 9.0
PyTorch >= 1.4.0
SpaCy == 2.1.0
For SpaCy, the en_core_web_lg model can be installed with:
wget https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.1.0/en_core_web_lg-2.1.0.tar.gz
pip install en_core_web_lg-2.1.0.tar.gz
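To verify the environment after installation, one option is to check that the required modules can be found without importing them. This is a minimal sketch using only the standard library. The helper `missing_modules` is hypothetical, and the module names `torch`, `spacy`, and `en_core_web_lg` are the import names these packages normally install under.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    Hypothetical helper: uses find_spec, so nothing is actually imported.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 5)):
    """Check the running interpreter against the minimum version listed above."""
    return sys.version_info[:2] >= minimum
```

For instance, `missing_modules(["torch", "spacy", "en_core_web_lg"])` returns an empty list when all three are installed. It does not check the installed versions or the CUDA setup.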
The following script starts training the mcan_small model with CenterPoint features on 2 GPUs:
python3 run.py --RUN='train' --MODEL='mcan_small' --VIS_FEAT='CenterPoint' --GPU='0, 1'
All checkpoint files and the training logs will be saved to the following paths respectively:
outputs/ckpts/ckpt_<VERSION>/epoch<EPOCH_INDEX>.pkl
outputs/log/log_run_<VERSION>.txt
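Given the checkpoint naming above, the most recent checkpoint of a run can be located with a few lines of Python. This is a sketch under one assumption: that `<EPOCH_INDEX>` is a plain integer. The helper `latest_checkpoint` is hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Matches file names such as "epoch13.pkl" (assumes an integer epoch index).
_EPOCH_PATTERN = re.compile(r"^epoch(\d+)\.pkl$")


def latest_checkpoint(ckpt_dir):
    """Return the checkpoint with the highest epoch index, or None if absent."""
    best_epoch, best_path = -1, None
    for path in Path(ckpt_dir).glob("epoch*.pkl"):
        match = _EPOCH_PATTERN.match(path.name)
        if match is None:
            continue  # Skip files that do not follow the naming scheme.
        epoch = int(match.group(1))
        if epoch > best_epoch:
            # Compare numerically so that epoch10 ranks above epoch2.
            best_epoch, best_path = epoch, path
    return best_path
```

The returned path can be passed as the `--CKPT_PATH` value when testing, with `ckpt_dir` set to the `outputs/ckpts/ckpt_<VERSION>` folder of the run.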
For testing, you can use the following script:
python3 run.py --RUN='val' --MODEL='mcan_small' --VIS_FEAT='CenterPoint' --CKPT_PATH='path/to/ckpt.pkl'
The evaluation results and the answers for all questions will be saved to the following paths respectively:
outputs/log/log_run_xxx.txt
outputs/result/result_run_xxx.txt
If you have any questions about the dataset, its generation, or the object-level feature extraction, feel free to contact me at twqian19@fudan.edu.cn.
If you find our paper and project useful, please consider citing:
@article{qian2023nuscenes,
title={NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario},
author={Qian, Tianwen and Chen, Jingjing and Zhuo, Linhai and Jiao, Yang and Jiang, Yu-Gang},
journal={arXiv preprint arXiv:2305.14836},
year={2023}
}
We sincerely thank the authors of MMDetection3D and OpenVQA for open sourcing their methods.