GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding (ICCV 2023)

arXiv

Introduction

Welcome to the official repository of GeoMIM, a pretraining approach for multi-view camera-based 3D perception. This repository provides the pretraining and finetuning code, along with the pretrained models, to reproduce the results presented in our paper.

The pretraining implementation is based on BEVFusion. See the pretrain folder for further details.

After pretraining, we finetune the pretrained Swin Transformer for multi-view camera-based 3D perception, using BEVDet. We provide models trained with the different techniques available in BEVDet, including CBGS, 4D, Depth, and Stereo. We also provide models for occupancy prediction, using the implementation in the BEVDet repo. See the bevdet folder for further details.

Key Results

We provide the GeoMIM-pretrained Swin-Base and Swin-Large checkpoints.

Model	Download
Swin-Base	Model
Swin-Large	Model
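
Before finetuning, pretrained backbone weights usually have to be matched to the parameter names of the downstream model. The snippet below is a minimal sketch of that step, not the loading code used in this repository: the `state_dict` wrapper key, the `backbone.` prefix, the parameter names in the toy checkpoint, and the helper name `extract_backbone_weights` are all assumptions made for illustration. Plain Python lists stand in for tensors so the example runs without PyTorch.

```python
from typing import Any, Dict


def extract_backbone_weights(
    checkpoint: Dict[str, Any], prefix: str = "backbone."
) -> Dict[str, Any]:
    """Return the entries stored under `prefix`, with the prefix removed.

    Hypothetical helper: the real checkpoint layout may differ.
    """
    # Assumption: weights may be nested under a "state_dict" key;
    # otherwise the checkpoint itself is treated as the weight mapping.
    state = checkpoint.get("state_dict", checkpoint)
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }


# Toy checkpoint with made-up parameter names (not the real GeoMIM keys).
toy_checkpoint = {
    "state_dict": {
        "backbone.patch_embed.proj.weight": [0.1, 0.2],
        "backbone.layers.0.blocks.0.norm1.weight": [1.0],
        "decoder.head.weight": [0.3],
    }
}

backbone_weights = extract_backbone_weights(toy_checkpoint)
for name in sorted(backbone_weights):
    print(name)
```

With a real checkpoint, it is worth printing a few keys first to see which prefix, if any, actually applies before stripping anything.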

We have achieved strong performance on the nuScenes benchmark with GeoMIM. Here are some quantitative results on 3D detection:

Config	mAP	NDS	Download
bevdet-swinb-4d-256x704-cbgs	33.98	47.19	Model
bevdet-swinb-4d-256x704-cbgs-geomim	42.25	53.1	Model
bevdet-swinb-4d-stereo-256x704-cbgs-geomim	45.33	55.1	Model
bevdet-swinb-4d-stereo-512x1408-cbgs	47.2	57.6	Model (#)
bevdet-swinb-4d-stereo-512x1408-cbgs-geomim	52.04	60.92	Model
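
To make the comparison in the table above easier to read, the short sketch below computes the mAP and NDS differences between each baseline config and its GeoMIM counterpart. The numbers are copied from the table; the dictionary layout and the helper name `geomim_gain` are illustrative choices, and only the two configs that have both a baseline and a `-geomim` row are included. Note that the 512x1408 baseline is the original BEVDet checkpoint (#).

```python
# (mAP, NDS) values copied from the 3D detection table above.
results = {
    "bevdet-swinb-4d-256x704-cbgs": (33.98, 47.19),
    "bevdet-swinb-4d-256x704-cbgs-geomim": (42.25, 53.1),
    "bevdet-swinb-4d-stereo-512x1408-cbgs": (47.2, 57.6),
    "bevdet-swinb-4d-stereo-512x1408-cbgs-geomim": (52.04, 60.92),
}


def geomim_gain(baseline: str) -> tuple:
    """Return (mAP gain, NDS gain) of the GeoMIM config over `baseline`."""
    base_map, base_nds = results[baseline]
    geo_map, geo_nds = results[baseline + "-geomim"]
    # Round to two decimals to hide floating-point noise.
    return round(geo_map - base_map, 2), round(geo_nds - base_nds, 2)


# Compute the gain for every config that is not itself a GeoMIM run.
gains = {
    name: geomim_gain(name)
    for name in results
    if not name.endswith("-geomim")
}

for name, (d_map, d_nds) in gains.items():
    print(f"{name}: +{d_map} mAP, +{d_nds} NDS")
```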

Here are some quantitative results on occupancy prediction:

Config	mIoU	Download
bevdet-occ-swinb-4d-stereo-2x (*)	42.0	Model (#)
bevdet-occ-swinb-4d-stereo-2x-geomim	45.0	Model
bevdet-occ-swinb-4d-stereo-2x-geomim (*)	45.73	Model
bevdet-occ-swinl-4d-stereo-2x-geomim	46.27	Model

(*) Loads the 3D detection checkpoint. (#) Original BEVDet checkpoint.

Getting Started

Citation

If you find GeoMIM beneficial for your research, kindly consider citing our paper:

@inproceedings{liu2023geomim,
  title={GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding},
  author={Liu, Jihao and Wang, Tai and Liu, Boxiao and Zhang, Qihang and Liu, Yu and Li, Hongsheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}

Contact

For any questions or inquiries, please feel free to reach out to the authors: Jihao Liu (email) and Tai Wang (email).
