- Low-resolution Google Drive
- High-resolution Google Drive
The data are composited following RVM practice.
We conducted all experiments on 8xA6000 GPUs. To train VMFormer with 8 GPUs, run:
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/mv3_vmformer.sh
Evaluating VMFormer on the low-resolution composited testing set:
CUDA_VISIBLE_DEVICES=0 python inference_vm.py --model_path path/to/model_weights --masks --num_frames 20 --img_path path/to/vmformer_512x288_public --query_temporal weight_sum --fpn_temporal
Evaluating VMFormer on the high-resolution composited testing set:
CUDA_VISIBLE_DEVICES=0 python inference_vm.py --model_path path/to/model_weights --masks --num_frames 5 --img_path path/to/vmformer_1920x1080_public --query_temporal weight_sum --fpn_temporal
Evaluating VMFormer on the RVM low-resolution testing set:
CUDA_VISIBLE_DEVICES=0 python inference_rvm.py --model_path path/to/model_weights --masks --num_frames 20 --img_path path/to/rvm_512x288_public --query_temporal weight_sum --fpn_temporal
Evaluating VMFormer on the RVM high-resolution testing set:
CUDA_VISIBLE_DEVICES=0 python inference_rvm.py --model_path path/to/model_weights --masks --num_frames 20 --img_path path/to/rvm_1920x1080_public --query_temporal weight_sum --fpn_temporal
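The four evaluation commands above differ only in the inference script, the dataset directory, and `--num_frames`, so they can be driven from one loop. The following is a minimal sketch, not part of the repository: the `MODEL`, `DATA_ROOT`, and `RUN` variables and the `eval_cmds` function are hypothetical names introduced here, and the script and flag values are copied unchanged from the commands above. By default it only prints the commands; setting `RUN=1` is assumed to execute them from the repository root.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VMFormer): prints or runs the four
# evaluation commands listed above. MODEL, DATA_ROOT and RUN are assumptions.
MODEL="${MODEL:-path/to/model_weights}"   # placeholder, as in the commands above
DATA_ROOT="${DATA_ROOT:-path/to}"         # placeholder parent of the test sets
RUN="${RUN:-0}"                           # 0 = dry run (print only), 1 = execute

eval_cmds() {
  # Each entry is script:dataset_directory:num_frames, taken from the commands above.
  for cfg in \
    "inference_vm.py:vmformer_512x288_public:20" \
    "inference_vm.py:vmformer_1920x1080_public:5" \
    "inference_rvm.py:rvm_512x288_public:20" \
    "inference_rvm.py:rvm_1920x1080_public:20"
  do
    # Split the entry on ':' with POSIX parameter expansion.
    script=${cfg%%:*}; rest=${cfg#*:}
    dataset=${rest%%:*}; frames=${rest#*:}

    # Build the command in the positional parameters so quoting is preserved.
    set -- python "$script" --model_path "$MODEL" --masks \
      --num_frames "$frames" --img_path "$DATA_ROOT/$dataset" \
      --query_temporal weight_sum --fpn_temporal

    if [ "$RUN" = "1" ]; then
      CUDA_VISIBLE_DEVICES=0 "$@"
    else
      echo "CUDA_VISIBLE_DEVICES=0 $*"
    fi
  done
}

eval_cmds
```

Running the script without arguments lists the commands for inspection; `MODEL=... DATA_ROOT=... RUN=1 sh script.sh` would run them one after another on GPU 0.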