This fork of DepthCrafter adds batch processing via run_batch.py.
The utils.py file has been updated to force the output aspect-ratio scale to match the source video.
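The aspect-ratio change can be illustrated with a minimal sketch (a hypothetical helper, not the actual utils.py code): given the source dimensions and a resolution cap, compute output dimensions that preserve the source aspect ratio, rounding each side to a multiple of 64 as diffusion backbones typically require.

```python
def scaled_size(width: int, height: int, max_res: int = 1024, multiple: int = 64):
    """Scale (width, height) so the longer side is at most max_res,
    preserving the source aspect ratio and rounding each side to a
    multiple of `multiple` (never below one multiple)."""
    scale = min(max_res / max(width, height), 1.0)  # never upscale
    w = max(multiple, round(width * scale / multiple) * multiple)
    h = max(multiple, round(height * scale / multiple) * multiple)
    return w, h
```

For example, a 1920x1080 source capped at 1024 maps to 1024x576, matching the high-resolution setting mentioned below.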

Note: running inside the PyCharm IDE is highly recommended.


Required Dependencies (NVIDIA):

    pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
    pip install -U xformers --index-url https://download.pytorch.org/whl/cu118

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Wenbo Hu¹*†, Xiangjun Gao²*, Xiaoyu Li¹*†, Sijie Zhao¹, Xiaodong Cun¹,
Yong Zhang¹, Long Quan², Ying Shan³,¹

¹Tencent AI Lab  ²The Hong Kong University of Science and Technology  ³ARC Lab, Tencent PCG

arXiv preprint, 2024

🔆 Introduction

🔥🔥🔥 DepthCrafter has been released. Have fun!

🤗 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring additional information such as camera poses or optical flow.

🎥 Visualization

We provide some demos of unprojected point cloud sequences, with reference RGB and estimated depth videos. Please refer to our project page for more details.


🚀 Quick Start

🛠️ Installation

  1. Clone this repo:

     git clone https://github.com/Tencent/DepthCrafter.git

  2. Install dependencies (please refer to requirements.txt):

     pip install -r requirements.txt

🤗 Model Zoo

DepthCrafter is available in the Hugging Face Model Hub.

🏃‍♂️ Inference

1. High-resolution inference, requires a GPU with ~26GB memory for 1024x576 resolution:

  • Full inference (~0.6 fps on A100, recommended for high-quality results):

    python run.py  --video-path examples/example_01.mp4
  • Fast inference through 4-step denoising and without classifier-free guidance (~2.3 fps on A100):

    python run.py  --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0

2. Low-resolution inference, requires a GPU with ~9GB memory for 512x256 resolution:

  • Full inference (~2.3 fps on A100):

    python run.py  --video-path examples/example_01.mp4 --max-res 512
  • Fast inference through 4-step denoising and without classifier-free guidance (~9.4 fps on A100):

    python run.py  --video-path examples/example_01.mp4  --max-res 512 --num-inference-steps 4 --guidance-scale 1.0
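The fork's run_batch.py itself is not documented here, but the idea can be sketched as a hypothetical driver that loops run.py over every video in a folder, using the fast-inference flags shown in the commands above (the function and folder names are illustrative assumptions):

```python
import subprocess
from pathlib import Path


def build_cmd(video: Path, max_res: int = 512, steps: int = 4, guidance: float = 1.0):
    """Assemble a run.py invocation using the flags from the examples above."""
    return [
        "python", "run.py",
        "--video-path", str(video),
        "--max-res", str(max_res),
        "--num-inference-steps", str(steps),
        "--guidance-scale", str(guidance),
    ]


def run_batch(folder: str, pattern: str = "*.mp4"):
    """Run depth estimation on every matching video in `folder`, one at a time."""
    for video in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_cmd(video), check=True)
```

Each video is processed sequentially, since a single inference run already saturates GPU memory at these resolutions.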

🤖 Gradio Demo

We provide a local Gradio demo for DepthCrafter, which can be launched by running:

gradio app.py

🤝 Contributing

  • Issues and pull requests are welcome.
  • Contributions that optimize inference speed and memory usage are especially welcome, e.g., model quantization, distillation, or other acceleration techniques.

📜 Citation

If you find this work helpful, please consider citing:

@article{hu2024-DepthCrafter,
    author  = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
    title   = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
    journal = {arXiv preprint arXiv:2409.02095},
    year    = {2024}
}