Depth Any Video with Scalable Synthetic Data

Depth Any Video introduces a scalable synthetic data pipeline, capturing 40,000 video clips from diverse games, and leverages powerful priors of generative video diffusion models to advance video depth estimation. By incorporating rotary position encoding, flow matching, and a mixed-duration training strategy, it robustly handles varying video lengths and frame rates. Additionally, a novel depth interpolation method enables high-resolution depth inference, achieving superior spatial accuracy and temporal consistency over previous models.

This repository is the official implementation of the paper Depth Any Video with Scalable Synthetic Data.

News

[2024-10-20] The Replicate demo and API are available here.

[2024-10-20] The Hugging Face online demo is live here.

[2024-10-15] The arXiv submission is available here.

Installation

Set up the environment with conda (including support for the Gradio app):

git clone https://github.com/Nightmare-n/DepthAnyVideo
cd DepthAnyVideo

# create env using conda
conda create -n dav python=3.10
conda activate dav
pip install -r requirements.txt
pip install gradio
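
After installing, you can sanity-check that key packages are importable before running inference. This is a minimal sketch, not part of the repository: gradio comes from the install step above, while torch and diffusers are assumed typical dependencies of a video diffusion model — check requirements.txt for the real list.

```python
import importlib.util

def missing_packages(packages):
    """Return the subset of packages that are not importable in the current env."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# torch and diffusers are assumptions; see requirements.txt for the actual dependencies.
print(missing_packages(["gradio", "torch", "diffusers"]))
```

An empty list means the environment is ready; otherwise, rerun `pip install -r requirements.txt` inside the `dav` environment.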

Inference

  • To run inference on an image, use the following command:
python run_infer.py --data_path ./demos/arch_2.jpg --output_dir ./outputs/ --max_resolution 2048
  • To run inference on a video, use the following command:
python run_infer.py --data_path ./demos/wooly_mammoth.mp4 --output_dir ./outputs/ --max_resolution 960
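
To process many inputs, the two commands above can be wrapped in a small batch script. This is a hypothetical convenience sketch, not part of the repository; it only assumes the `run_infer.py` flags shown above, and mirrors the example resolutions (2048 for images, 960 for videos).

```python
import subprocess
from pathlib import Path

def build_command(data_path, output_dir, max_resolution):
    """Assemble a run_infer.py invocation for a single input file."""
    return [
        "python", "run_infer.py",
        "--data_path", str(data_path),
        "--output_dir", str(output_dir),
        "--max_resolution", str(max_resolution),
    ]

def infer_dir(input_dir, output_dir="./outputs/", exts=(".jpg", ".png", ".mp4")):
    """Run inference on every supported file in input_dir.

    Videos get a lower max resolution than images, following the
    example commands above (2048 for images, 960 for videos).
    """
    for path in sorted(Path(input_dir).iterdir()):
        if path.suffix.lower() not in exts:
            continue
        res = 960 if path.suffix.lower() == ".mp4" else 2048
        subprocess.run(build_command(path, output_dir, res), check=True)
```

For example, `infer_dir("./demos/")` would invoke `run_infer.py` once per image and video in the demos folder, writing all results to `./outputs/`.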

Citation

If you find our work useful, please cite:

@article{yang2024depthanyvideo,
  author    = {Honghui Yang and Di Huang and Wei Yin and Chunhua Shen and Haifeng Liu and Xiaofei He and Binbin Lin and Wanli Ouyang and Tong He},
  title     = {Depth Any Video with Scalable Synthetic Data},
  journal   = {arXiv preprint arXiv:2410.10815},
  year      = {2024}
}