Monocular Depth Estimation Rankings
and 2D to 3D Video Conversion Rankings

List of Rankings

2D to 3D Video Conversion Rankings

  1. 22_dogskateboarder.MOV (1 frame): Rank (human perceptual judgment)

Monocular Depth Estimation Rankings

I. New layout

  1. ScanNet++ (98 video clips with 32 frames each): TAE
  2. NYU-Depth V2: OPW<=0.37
  3. Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.075
  4. NYU-Depth V2: AbsRel<=0.045 [test: new layout]

II. Old layout [currently no longer up to date]

  1. NYU-Depth V2 (640×480): AbsRel<=0.058 [currently no longer up to date]
  2. DA-2K (mostly 1500×2000): Acc (%)>=86
  3. UnrealStereo4K (3840×2160): AbsRel<=0.04
  4. Middlebury2021 (1920×1080): SqRel<=0.5

Appendices


22_dogskateboarder.MOV (1 frame): Rank (human perceptual judgment)

📝 Note: No quantitative comparison results are available for StereoCrafter yet, so this ranking is based on my own perceptual judgment of the qualitative comparison results shown in Figure 7. One output frame (right view) is compared with one input frame (left view) from the video file 22_dogskateboarder.MOV.

| RK | Model | Venue | Repository | Rank (human perceptual judgment) ↓ (StereoCrafter, arXiv) |
|---|---|---|---|---|
| 1 | StereoCrafter | arXiv | - | 1 |
| 2-3 | Immersity AI | - | - | 2-3 |
| 2-3 | Owl3D | - | - | 2-3 |
| 4 | Deep3D | ECCV | GitHub | 4 |


ScanNet++ (98 video clips with 32 frames each): TAE

| RK | Model | Venue | Repository | TAE ↓ {Input fr.} (DAV, arXiv) |
|---|---|---|---|---|
| 1 | Depth Any Video | arXiv | GitHub | 2.1 {MF} |
| 2 | DepthCrafter | arXiv | GitHub | 2.2 {MF} |
| 3 | ChronoDepth | arXiv | GitHub | 2.3 {MF} |
| 4 | NVDS | ICCV | GitHub | 3.7 {4} |


NYU-Depth V2: OPW<=0.37

| RK | Model | Venue | Repository | OPW ↓ {Input fr.} (FD, arXiv) | OPW ↓ {Input fr.} (NVDS+, TPAMI) | OPW ↓ {Input fr.} (NVDS, ICCV) |
|---|---|---|---|---|---|---|
| 1 | FutureDepth | arXiv | - | 0.303 {4} | - | - |
| 2 | NVDS+ | TPAMI | GitHub | - | 0.339 {4} | - |
| 3 | NVDS | ICCV | GitHub | 0.364 {4} | - | 0.364 {4} |


Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.075

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (MonST3R, arXiv) | AbsRel ↓ {Input fr.} (DC, arXiv) |
|---|---|---|---|---|---|
| 1 | MonST3R | arXiv | GitHub | 0.063 {MF} | - |
| 2 | DepthCrafter | arXiv | GitHub | 0.075 {MF} | 0.075 {MF} |


NYU-Depth V2: AbsRel<=0.045 [test: new layout]

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (BD, arXiv) | AbsRel ↓ {Input fr.} (M3D v2, TPAMI) | AbsRel ↓ {Input fr.} (DA, CVPR) | AbsRel ↓ {Input fr.} (DA V2, NeurIPS) |
|---|---|---|---|---|---|---|---|
| 1-2 | BetterDepth | arXiv | - | 0.042 {1} | - | - | - |
| 1-2 | Metric3D v2 ViT-Large | TPAMI | GitHub | - | 0.042 {1} | - | - |
| 3 | Depth Anything Large | CVPR | GitHub | 0.043 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} |
| 4 | Depth Anything V2 Large | NeurIPS | GitHub | - | - | - | 0.045 {1} |


NYU-Depth V2 (640×480): AbsRel<=0.058 [currently no longer up to date]

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1-2 | BetterDepth (arXiv). Backbone: Depth Anything & Marigold | 0.042 {1} (arXiv) | Hypersim & Virtual KITTI | - | - | - |
| 1-2 | Metric3D v2 CSTM_label (ICCV; ENH: arXiv). Backbone: DINOv2 with registers (ViT-L/14) | 0.042 {1} (arXiv) | DDAD & Lyft & DrivingStereo & DIML & Argoverse2 & Cityscapes & DSEC & Mapillary PSD & Pandaset & UASOL & Virtual KITTI & Waymo & Matterport3D & Taskonomy & Replica & ScanNet & HM3D & Hypersim | GitHub | - | - |
| 3 | Depth Anything Large (CVPR). Backbone: DINOv2 (ViT-L/14) | 0.043 {1} (CVPR) | Pretraining: BlendedMVS & DIML & HR-WSI & IRS & MegaDepth & TartanAir. Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub | - | - |
| 4 | MiDaS v3.1 BEiTL-512 (TPAMI; ENH: arXiv). Backbone: BEiT512-L (ViT-L/16) | 0.048 {1} (CVPR) | Pretraining: ReDWeb & HR-WSI & BlendedMVS & NYU-Depth V2 & KITTI. Training: ReDWeb & DIML & 3D Movies & MegaDepth & WSVD & TartanAir & HR-WSI & ApolloScape & BlendedMVS & IRS & NYU-Depth V2 & KITTI | GitHub | - | PyTorch (GitHub) |
| 5 | GeoWizard (arXiv). Backbone: Stable Diffusion v2 | 0.052 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub | - | - |
| 6 | Marigold (CVPR). Backbone: Stable Diffusion v2 | 0.055 {1} (CVPR) | Hypersim & Virtual KITTI | GitHub | - | - |
| 7 | GenPercept (arXiv). Backbone: Stable Diffusion v2.1 | 0.056 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub | - | - |
| 8 | NeWCRFs + LightedDepth (CVPR; ENH: CVPR) | 0.057 {2} (CVPR) | ENH: NYU-Depth V2 | GitHub; ENH: GitHub | - | - |
| 9 | UniDepth-V (CVPR). Backbone: DINOv2 (ViT-L/14) | 0.0578 {1} (CVPR) | A2D2 & Argoverse2 & BDD100K & Cityscapes & DrivingStereo & Mapillary PSD & ScanNet & Taskonomy & Waymo | GitHub | - | - |


DA-2K (mostly 1500×2000): Acc (%)>=86

| RK | Model | Acc (%) ↑ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | Depth Anything V2 Giant (CVPR; ENH: arXiv). Backbone: DINOv2 (ViT-G/14) | 97.4 {1} (arXiv) | Pretraining: BlendedMVS & Hypersim & IRS & TartanAir & VKITTI 2. Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub; ENH: GitHub | - | - |
| 2 | GeoWizard (arXiv). Backbone: Stable Diffusion v2 | 88.1 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub | - | - |
| 3 | Marigold (CVPR). Backbone: Stable Diffusion v2 | 86.8 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub | - | - |


UnrealStereo4K (3840×2160): AbsRel<=0.04

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | ZoeDepth +PFR=128 (arXiv; ENH: CVPR) | 0.0388 {1} (CVPR) | ENH: UnrealStereo4K | GitHub; ENH: GitHub | - | - |


Middlebury2021 (1920×1080): SqRel<=0.5

| RK | Model | SqRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | LeReS-GBDMF (CVPR; ENH: AAAI) | 0.444 {1} (AAAI) | ENH: HR-WSI | GitHub; ENH: GitHub | - | - |
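
Several of the rankings above are sorted by AbsRel or SqRel. As a rough illustration only, the sketch below computes both from their commonly used definitions: the mean of |d − d*| / d* and the mean of (d − d*)² / d* over valid pixels, where d is the predicted depth and d* the ground truth. The function names, the flat-list inputs, and the rule of skipping pixels with non-positive ground truth are assumptions made for this example. The papers listed here may also align scale and shift, crop, or cap depth before evaluating, so values from this sketch are not directly comparable to the tables.

```python
from typing import Sequence


def _valid_pairs(pred: Sequence[float], gt: Sequence[float]):
    """Pair predictions with ground truth, skipping pixels without valid depth.

    Assumption for this sketch: ground-truth values <= 0 mark missing depth.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute relative error: mean of |d - d*| / d* (lower is better)."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Squared relative error: mean of (d - d*)^2 / d* (lower is better)."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy values, not taken from any dataset in the rankings.
    predicted = [3.0, 4.0]
    ground_truth = [1.0, 4.0]
    print(abs_rel(predicted, ground_truth))  # 1.0
    print(sq_rel(predicted, ground_truth))   # 2.0
```

Both metrics divide by the ground-truth depth, so an error of a given size counts more on near surfaces than on distant ones; SqRel additionally squares the error and therefore penalizes large outliers more heavily.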


Appendix 3: List of all research papers from the above rankings

| Method | Paper | Venue |
|---|---|---|
| BetterDepth | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | arXiv |
| ChronoDepth | Learning Temporally Consistent Video Depth from Video Diffusion Priors | arXiv |
| Deep3D | Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks | ECCV |
| Depth Any Video | Depth Any Video with Scalable Synthetic Data | arXiv |
| Depth Anything | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | CVPR |
| Depth Anything V2 | Depth Anything V2 | NeurIPS |
| DepthCrafter | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | arXiv |
| DPT | Vision Transformers for Dense Prediction | ICCV |
| FutureDepth | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | arXiv |
| GBDMF | Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition | AAAI |
| GenPercept | Diffusion Models Trained with Large Data Are Transferable Visual Models | arXiv |
| GeoWizard | GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image | arXiv |
| LeReS | Learning to Recover 3D Scene Shape from a Single Image | CVPR |
| LightedDepth | LightedDepth: Video Depth Estimation in light of Limited Inference View Angles | CVPR |
| Marigold | Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | CVPR |
| Metric3D | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | ICCV |
| Metric3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | TPAMI |
| MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer | TPAMI |
| MiDaS v3.1 | MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation | arXiv |
| MonST3R | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | arXiv |
| NeWCRFs | Neural Window Fully-connected CRFs for Monocular Depth Estimation | CVPR |
| NVDS | Neural Video Depth Stabilizer | ICCV |
| NVDS+ | NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation | TPAMI |
| PatchFusion | PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | CVPR |
| StereoCrafter | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | arXiv |
| UniDepth | UniDepth: Universal Monocular Metric Depth Estimation | CVPR |
| ZoeDepth | ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | arXiv |
