MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion (NeurIPS 2023, Spotlight)
Project page | Paper | Demo
If you use our work in your research, please cite it as follows:
@article{tang2023MVDiffusion,
  title={MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion},
  author={Tang, Shitao and Zhang, Fuyang and Chen, Jiacheng and Wang, Peng and Furukawa, Yasutaka},
  journal={arXiv preprint arXiv:2307.01097},
  year={2023}
}
Updates: MVDiffusion can now extrapolate a single perspective image into a full 360-degree panorama. The paper has been updated accordingly.
Install the necessary packages by running the following command:
pip install -r requirements.txt
We provide baseline results and pretrained models for download; please put the downloaded files in 'MVDiffusion/weights'.
Test the demo by running:
- Text-conditioned generation
python demo.py --text "This kitchen is a charming blend of rustic and modern, featuring a large reclaimed wood island with marble countertop, a sink surrounded by cabinets. To the left of the island, a stainless-steel refrigerator stands tall. To the right of the sink, built-in wooden cabinets painted in a muted."
- Dual-conditioned generation
python demo.py --text_path assets/prompts.txt --image_path assets/outpaint_example.png
- Panorama generation: please download the Matterport3D skybox data and labels.
├── data
    ├── mp3d_skybox
        ├── train.npy
        ├── test.npy
        ├── 5q7pvUzZiYa
            ├── blip3
            ├── matterport_skybox_images
        ├── 1LXtFkjw3qL
        ├── ....
- Depth-conditioned generation: please download the ScanNet data, training labels, and testing labels.
├── data
    ├── scannet
        ├── train
            ├── scene0435_01
                ├── color
                ├── depth
                ├── intrinsic
                ├── pose
                ├── prompt
                ├── key_frame_0.6.txt
                ├── valid_frames.npy
        ├── test
Execute the following scripts for testing:
- sh test_pano.sh: Generate 8 multi-view panoramic images in the Matterport3D testing dataset.
- sh test_pano_outpaint.sh: Generate 8 multi-view images conditioned on a single view image (outpainting) in the Matterport3D testing dataset.
- sh test_depth_fix_frames.sh: Generate 12 depth-conditioned images in the ScanNet testing dataset.
- sh test_depth_fix_interval.sh: Generate a sequence of depth-conditioned images (every 20 frames) in the ScanNet testing dataset.
- sh test_depth_two_stage.sh: Generate a sequence of depth-conditioned key-frame images, then interpolate the in-between images, in the ScanNet testing dataset.
After running either sh test_depth_fix_interval.sh or sh test_depth_two_stage.sh, you can use TSDF fusion to obtain a textured mesh, for example as sketched below.
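One common option for this fusion step is Open3D's ScalableTSDFVolume. The snippet below is a minimal, illustrative sketch, not the repository's fusion script; the scene path, output path, frame list, and depth-image resolution are all assumptions:

```python
# Minimal TSDF-fusion sketch using Open3D (illustrative; not the repo's actual script).
import numpy as np
import open3d as o3d

scene = "data/scannet/test/scene0435_01"  # assumed scene directory
frame_ids = range(0, 200, 20)             # assumed: every 20th frame, as in test_depth_fix_interval.sh

# Scalable TSDF volume: 4 cm voxels, 12 cm truncation, fused RGB colors.
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.04,
    sdf_trunc=0.12,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)

# ScanNet stores a 4x4 intrinsic matrix as a text file; width/height are assumptions.
K = np.loadtxt(f"{scene}/intrinsic/intrinsic_depth.txt")[:3, :3]
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=480, fx=K[0, 0], fy=K[1, 1], cx=K[0, 2], cy=K[1, 2]
)

for i in frame_ids:
    # Color and depth must share the same resolution; resize beforehand if needed.
    color = o3d.io.read_image(f"outputs/{i}.png")        # generated RGB image (assumed path)
    depth = o3d.io.read_image(f"{scene}/depth/{i}.png")  # ScanNet depth: 16-bit PNG, millimeters
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=4.0,
        convert_rgb_to_intensity=False,
    )
    pose = np.loadtxt(f"{scene}/pose/{i}.txt")              # 4x4 camera-to-world pose
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))  # integrate expects world-to-camera

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("fused_mesh.ply", mesh)
```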
Execute the following scripts for training:
- sh train_pano.sh: Train the panoramic image generation model.
- sh train_pano_outpaint.sh: Train the panoramic image outpainting model.
- sh train_depth.sh: Train the depth-conditioned generation model.
Panorama generation:
- Convert the panorama into 6 skybox images using the provided tool, Equirec2Perspec. You will get left, right, front, back, up, and down images.
- Convert the panorama to 8 perspective images, each capturing a 45-degree horizontal view. Four of these images overlap with the skybox images, specifically the left, right, front, and back views (see the projection sketch after this list).
- Once you have the perspective images, you can use BLIP2 to generate prompts from them.
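For reference, the sketch below shows one way to sample a perspective crop from an equirectangular panorama. It is a minimal illustration of the projection math rather than the Equirec2Perspec tool itself; the 90-degree FOV, output resolution, and file path are assumptions:

```python
# Minimal equirectangular-to-perspective sketch (illustrative; use Equirec2Perspec in practice).
import cv2
import numpy as np

def equirec_to_perspective(pano, fov_deg, yaw_deg, pitch_deg, out_hw):
    h, w = out_hw
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Pixel grid -> unit rays in the camera frame.
    x, y = np.meshgrid(np.arange(w) - w / 2 + 0.5, np.arange(h) - h / 2 + 0.5)
    rays = np.stack([x, y, np.full_like(x, f)], -1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate rays by yaw (around the vertical axis), then pitch.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    rays = rays @ (Ry @ Rx).T
    # Rays -> longitude/latitude -> pixel coordinates in the panorama.
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))
    ph, pw = pano.shape[:2]
    u = ((lon / (2 * np.pi) + 0.5) * pw).astype(np.float32)
    v = ((lat / np.pi + 0.5) * ph).astype(np.float32)
    return cv2.remap(pano, u, v, cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

pano = cv2.imread("pano.png")  # illustrative path
# Eight views rotated 45 degrees apart; the 90-degree FOV here is an assumption.
views = [equirec_to_perspective(pano, 90, yaw, 0, (512, 512)) for yaw in range(0, 360, 45)]
```

cv2.BORDER_WRAP lets the sampling wrap around the panorama's horizontal seam, so views that straddle the left/right image border are handled correctly.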
Multi-view Depth-to-Image Generation:
- Using the ScanNet format: follow the directory structure and file format of the ScanNet dataset (see the layout above).
- Use BLIP2 to generate prompts from each perspective image (see the captioning sketch below).
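For reference, per-image captions can be produced with the Hugging Face BLIP-2 implementation roughly as below; the checkpoint name and image path are illustrative, and a CUDA GPU is assumed:

```python
# Minimal BLIP-2 captioning sketch (checkpoint and path are illustrative assumptions).
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda"  # assumes a CUDA GPU; fp16 weights keep memory usage manageable
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

image = Image.open("perspective_0.png").convert("RGB")  # one perspective crop
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
out = model.generate(**inputs, max_new_tokens=40)
prompt = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
print(prompt)  # use this caption as the text prompt for the corresponding view
```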
This project is licensed under the terms of the MIT license.
For any questions, feel free to contact us at shitaot@sfu.ca.