Mobile Vision Transformer-based Visual Object Tracking [BMVC 2023]: official implementation
- 11-03-2024: The C++ implementation of our tracker is now available.
- 10-11-2023: ONNX-Runtime- and TensorRT-based inference code is released. Our MVT now runs at ~70 fps on CPU and ~300 fps on GPU ⚡⚡. Check the page for details.
- 14-09-2023: The pretrained tracker model is released.
- 13-09-2023: The paper is now available on arXiv.
- 22-08-2023: The MVT tracker training and inference code is released.
- 21-08-2023: The paper is accepted at BMVC 2023.
Install the dependency packages using the environment file `mvt_pyenv.yml`.
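Assuming `mvt_pyenv.yml` is a conda environment specification (suggested by its extension, not stated explicitly), the typical commands would be `conda env create -f mvt_pyenv.yml` followed by `conda activate` with the environment name defined inside that file.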
Generate the relevant files:
```
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
```
After running this command, modify the dataset paths by editing these files:
- `lib/train/admin/local.py` # paths for training
- `lib/test/evaluation/local.py` # paths for testing
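For illustration, the training-side `local.py` is typically an OSTrack/PyTracking-style `EnvironmentSettings` class (an assumption here; the attribute names below are representative, not exhaustive, and the test-side file is analogous):

```python
# lib/train/admin/local.py -- illustrative sketch, not the exact generated file.
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '/path/to/MVT'        # base directory for checkpoints/logs
        self.tensorboard_dir = self.workspace_dir + '/output/tensorboard'
        self.pretrained_networks = self.workspace_dir + '/pretrained_models'
        # Point these at the locations of the training datasets:
        self.got10k_dir = '/path/to/got10k/train'
        self.lasot_dir = '/path/to/lasot'
        self.trackingnet_dir = '/path/to/trackingnet'
        self.coco_dir = '/path/to/coco'
```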
- Set the paths of the training datasets in `lib/train/admin/local.py`.
- Place the pretrained backbone model under the `pretrained_models/` folder.
- For data preparation, please refer to this.
- Uncomment lines 63, 67, and 71 in the `base_backbone.py` file, replacing these lines with `self.z_dict1 = template.tensors` (see the sketch after this list).
- Run
  ```
  python tracking/train.py --script mobilevit_track --config mobilevit_256_128x1_got10k_ep100_cosine_annealing --save_dir ./output --mode single
  ```
- The training logs will be saved under the `output/logs/` folder.
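For clarity, a schematic before/after of that edit (the surrounding code at lines 63, 67, and 71 of `base_backbone.py` is omitted here and may differ; only the statement itself comes from the instructions above):

```python
# Before (released code, assignment disabled):
# self.z_dict1 = template.tensors

# After (enabled, so the template tensor is assigned):
self.z_dict1 = template.tensors
```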
The pretrained tracker model can be found here.
- Update the test dataset paths in `lib/test/evaluation/local.py`.
- Place the pretrained tracker model under the `output/checkpoints/` folder.
- Run
  ```
  python tracking/test.py --tracker_name mobilevit_track --tracker_param mobilevit_256_128x1_got10k_ep100_cosine_annealing --dataset got10k_test/trackingnet/lasot
  ```
- Change the `DEVICE` variable between `cuda` and `cpu` in the `--tracker_param` file for GPU- and CPU-based inference, respectively (see the note after this list).
- The raw results will be stored under the `output/test/` folder.
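If the `--tracker_param` file follows the usual OSTrack-style YAML layout (an assumption; locate the config named `mobilevit_256_128x1_got10k_ep100_cosine_annealing` in the repo to confirm), this switch is a one-line change such as `DEVICE: cpu`.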
- To count the model parameters, run
  ```
  python tracking/profile_model.py
  ```
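As a rough illustration of what such profiling involves (a generic sketch, not the repo's `profile_model.py`, which may also report MACs and latency):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts for a PyTorch model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Example with a stand-in module; the real script builds the MVT tracker instead.
if __name__ == "__main__":
    toy = nn.Conv2d(3, 16, kernel_size=3)
    total, trainable = count_parameters(toy)
    print(f"total: {total:,} | trainable: {trainable:,}")
```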
- We use the Separable Self-Attention Transformer implementation and the pretrained MobileViT backbone from ml-cvnets. Thank you!
- Our training code is built upon OSTrack and PyTracking.
If our work is useful for your research, please consider citing:
```
@inproceedings{Gopal_2023_BMVC,
  author    = {Goutam Yelluru Gopal and Maria Amer},
  title     = {Mobile Vision Transformer-based Visual Object Tracking},
  booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
  publisher = {BMVA},
  year      = {2023},
  url       = {https://papers.bmvc2023.org/0800.pdf}
}
```