This is the official implementation code for MFuseNet. For technical details, please refer to:
MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen
ICRA 2020, RA-L
[Paper] [Project Page]
If you find this code useful, please consider citing:
@article{yuan2020mfusenet,
title={MFuseNet: Robust Depth Estimation With Learned Multiscopic Fusion},
author={Yuan, Weihao and Fan, Rui and Wang, Michael Yu and Chen, Qifeng},
journal={IEEE Robotics and Automation Letters},
volume={5},
number={2},
pages={3113--3120},
year={2020},
publisher={IEEE}
}
This code has been tested on Ubuntu 16.04 with CUDA 9.0 and two GTX 1080 Ti GPUs.
Dependencies:
- Python 2.7
- PyTorch (0.4.0+)
- torchvision (0.2.0+)
- os, time, numpy, argparse, cv2, matplotlib, PIL
The inputs to the network are the cost volumes produced by the cost-computation step of a stereo matching algorithm. They can be computed with block matching, semi-global matching, graph cuts, deep-network-based methods, etc. The default costs are obtained with MC-CNN; please refer to MC-CNN for computing the cost volumes.
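To make the input concrete: a cost volume is a 3-D array of matching costs indexed by disparity and pixel position. Below is a minimal, purely illustrative block-matching (SAD) sketch in NumPy/OpenCV; it is not the MC-CNN pipeline this repository actually uses, and the `(disparity, height, width)` layout and the helper name `sad_cost_volume` are assumptions for illustration only.

```python
import cv2
import numpy as np

def sad_cost_volume(ref_gray, tgt_gray, max_disp, patch=5):
    """Toy SAD block-matching cost volume (illustrative only, not MC-CNN).

    cost[d, y, x] is the patch-averaged absolute difference between the
    reference image at (y, x) and the target image shifted by disparity d.
    """
    h, w = ref_gray.shape
    ref = ref_gray.astype(np.float32)
    tgt = tgt_gray.astype(np.float32)
    kernel = np.ones((patch, patch), np.float32) / (patch * patch)
    cost = np.zeros((max_disp, h, w), np.float32)
    for d in range(max_disp):
        # Shift the target image by d pixels before comparing.
        shifted = np.zeros_like(tgt)
        shifted[:, d:] = tgt[:, :w - d]
        cost[d] = cv2.filter2D(np.abs(ref - shifted), -1, kernel)
    return cost
```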
The training data for three-view fusion are organized as follows:

    dataset/
        TRAIN/
            scene1/
                view0.png
                view1.png
                view2.png
                disp1.png
                left.bin
                right.bin
        TEST/
        EVAL/
view0.png, view1.png, and view2.png are the color images of the left, center, and right views. disp1.png is the ground-truth disparity map for view1. left.bin and right.bin are the cost volumes obtained by MC-CNN for matching the left and right views against the center view.
For five-view fusion, there are additionally view3.png for the bottom view and view4.png for the top view, together with their corresponding cost volumes bottom.bin and top.bin.
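As a reading aid, here is a minimal sketch of how one three-view scene directory could be loaded. The helper name `load_scene`, the use of cv2 for image reading, and the .bin layout (raw float32, shape `(max_disp, height, width)`) are assumptions, not the repository's actual data loader; check the MC-CNN output format used to generate the cost volumes.

```python
import os
import numpy as np
import cv2

def load_scene(scene_dir, height, width, max_disp):
    """Sketch: load the three views, ground-truth disparity, and cost volumes
    of one TRAIN scene (the .bin layout below is an assumption)."""
    views = [cv2.imread(os.path.join(scene_dir, 'view%d.png' % i))
             for i in range(3)]
    # Ground-truth disparity for the center view (view1).
    disp = cv2.imread(os.path.join(scene_dir, 'disp1.png'), cv2.IMREAD_UNCHANGED)
    costs = {}
    for name in ('left', 'right'):
        raw = np.fromfile(os.path.join(scene_dir, name + '.bin'), dtype=np.float32)
        costs[name] = raw.reshape(max_disp, height, width)
    return views, disp, costs
```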
Example data are available here.
To train the network, run:

    . train.sh
Results on Middlebury 2006:
| Model | AvgErr (px) | RMS (px) | Bad 0.5 | Bad 1 | Bad 2 |
|---|---|---|---|---|---|
| Model_3view | 0.250 | 1.036 | 4.08% | 1.83% | 1.15% |
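For reference, these columns follow the usual Middlebury conventions: AvgErr is the mean absolute disparity error, RMS the root-mean-square error, and Bad t the percentage of pixels whose error exceeds t pixels. A small sketch of how such numbers can be computed is below; treating `gt == 0` as invalid ground truth is an assumption about how missing pixels are encoded.

```python
import numpy as np

def disparity_metrics(pred, gt, thresholds=(0.5, 1.0, 2.0)):
    """Middlebury-style disparity metrics; gt == 0 is treated as invalid
    (an assumption about how missing ground truth is encoded)."""
    valid = gt > 0
    err = np.abs(pred[valid].astype(np.float64) - gt[valid].astype(np.float64))
    results = {'AvgErr': err.mean(), 'RMS': np.sqrt((err ** 2).mean())}
    for t in thresholds:
        # Percentage of valid pixels with error above the threshold.
        results['Bad %g' % t] = 100.0 * (err > t).mean()
    return results
```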
Licensed under the MIT License.