MFuseNet

This is the official implementation of MFuseNet. For technical details, please refer to:

MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen
ICRA 2020, RA-L
[Paper] [Project Page]

BibTeX

If you find this code useful, please consider citing:

@article{yuan2020mfusenet,
  title={MFuseNet: Robust Depth Estimation With Learned Multiscopic Fusion},
  author={Yuan, Weihao and Fan, Rui and Wang, Michael Yu and Chen, Qifeng},
  journal={IEEE Robotics and Automation Letters},
  volume={5},
  number={2},
  pages={3113--3120},
  year={2020},
  publisher={IEEE}
}

Contents

  1. Environment Setup
  2. Data Preparation
  3. Train

Environment Setup

This code has been tested on Ubuntu 16.04 with CUDA 9.0 and two GTX 1080 Ti GPUs.

Dependencies:

  • Python 2.7
  • PyTorch (0.4.0+)
  • torchvision (0.2.0+)
  • numpy, opencv-python (cv2), matplotlib, Pillow (PIL); os, time, and argparse ship with Python

Data Preparation

The inputs to the network are the cost volumes produced by the cost-computation step of a stereo matching algorithm. They can be computed with block matching, semi-global matching, graph cuts, deep-network-based methods, etc. The default costs are obtained with MC-CNN; please refer to MC-CNN for computing the cost volumes.
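For intuition only, here is a toy SAD (sum of absolute differences) cost-volume computation in NumPy/OpenCV. The grayscale float32 inputs, the shifting direction, and the window size are assumptions of this sketch; the paper's default costs come from MC-CNN, not from this code:

    import cv2
    import numpy as np

    def sad_cost_volume(center, side, max_disp, block=5):
        """Toy SAD cost volume for matching a side view against the center view.

        center, side: grayscale float32 images of equal size (an assumption).
        Returns a (max_disp, H, W) volume; pixels without a valid match at a
        given disparity keep a large placeholder cost.
        """
        h, w = center.shape
        costs = np.full((max_disp, h, w), 1e9, dtype=np.float32)
        for d in range(max_disp):
            # per-pixel absolute difference at disparity d
            diff = np.abs(center[:, d:] - side[:, :w - d])
            # aggregate over a block x block window with a box filter
            costs[d, :, d:] = cv2.blur(diff, (block, block))
        return costs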

The training data for three-view fusion are organized as follows:

dataset/
    TRAIN/
        scene1/
            view0.png
            view1.png
            view2.png
            disp1.png
            left.bin
            right.bin
    TEST/
    EVAL/

view0.png, view1.png, and view2.png are the color images of the left, center, and right views, respectively. disp1.png is the ground-truth disparity map for view1. left.bin and right.bin are the cost volumes computed by MC-CNN for matching the left and right views against the center view; a loading sketch is given below.
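The .bin files are raw binary dumps, so the layout must match whatever exported them. A minimal loading sketch, assuming float32 values stored in (disparity, row, column) order with known dimensions; both the ordering and the dimensions are assumptions to verify against your MC-CNN export:

    import numpy as np

    def load_cost_volume(path, max_disp, height, width):
        """Read a raw float32 cost volume from disk.

        The (disparity, row, column) layout and the dimensions are
        assumptions, not a documented format.
        """
        vol = np.fromfile(path, dtype=np.float32)
        return vol.reshape(max_disp, height, width)

    # Hypothetical usage with made-up dimensions:
    # left_costs = load_cost_volume("dataset/TRAIN/scene1/left.bin", 64, 384, 512)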

For five-view fusion, each scene additionally contains view3.png (bottom view) and view4.png (top view), along with their corresponding cost volumes bottom.bin and top.bin.

Example data are available here.
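Putting the pieces together, a minimal PyTorch Dataset sketch for the three-view layout shown above; the class name, default dimensions, and disparity scaling are all hypothetical and should be adapted to your data:

    import os
    import numpy as np
    from PIL import Image
    from torch.utils.data import Dataset

    class ThreeViewScenes(Dataset):
        """Hypothetical loader for the three-view scene layout."""

        def __init__(self, root, max_disp=64, height=384, width=512):
            # one subdirectory per scene, e.g. dataset/TRAIN/scene1/
            self.scenes = sorted(os.path.join(root, d) for d in os.listdir(root))
            self.shape = (max_disp, height, width)

        def __len__(self):
            return len(self.scenes)

        def __getitem__(self, idx):
            scene = self.scenes[idx]
            views = [np.asarray(Image.open(os.path.join(scene, "view%d.png" % i)),
                                dtype=np.float32) for i in range(3)]
            disp = np.asarray(Image.open(os.path.join(scene, "disp1.png")),
                              dtype=np.float32)  # disparity scale factor may differ
            costs = [np.fromfile(os.path.join(scene, name),
                                 dtype=np.float32).reshape(self.shape)
                     for name in ("left.bin", "right.bin")]
            return views, disp, costs

    # Hypothetical usage:
    # train_set = ThreeViewScenes("dataset/TRAIN")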

Train

    . train.sh
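train.sh wraps the repository's training entry point and its flags. As a rough orientation only, here is a hypothetical PyTorch training loop over the Dataset sketched above; the smooth-L1 objective, model signature, and hyperparameters are assumptions, not the paper's exact settings:

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    def train(model, dataset, epochs=10, lr=1e-3):
        """Hypothetical loop; the real hyperparameters live in train.sh."""
        loader = DataLoader(dataset, batch_size=2, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for epoch in range(epochs):
            for views, disp, costs in loader:
                optimizer.zero_grad()
                pred = model(*costs)                 # fused disparity estimate
                loss = F.smooth_l1_loss(pred, disp)  # assumed objective
                loss.backward()
                optimizer.step()
            print("epoch %d  loss %.4f" % (epoch, loss.item()))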

Pretrained Models

  • Model_5view: five views, four costs fusion
  • Model_3view: three views, two costs fusion

Results on Middlebury 2006:

| Model | AvgErr | RMS | Bad 0.5 | Bad 1 | Bad 2 |
|---|---|---|---|---|---|
| Model_3view | 0.250 | 1.036 | 4.08% | 1.83% | 1.15% |

License

Licensed under the MIT License.