MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion (NeurIPS 2023, Spotlight)
Project page | Paper | Demo
If you use our work in your research, please cite it as follows:
@article{tang2023MVDiffusion,
  title={MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion},
  author={Tang, Shitao and Zhang, Fuyang and Chen, Jiacheng and Wang, Peng and Furukawa, Yasutaka},
  journal={arXiv preprint arXiv:2307.01097},
  year={2023}
}
Updates: MVDiffusion can now extrapolate a single perspective image into a full 360-degree panorama. The paper has been updated accordingly.
Install the necessary packages by running the following command:
pip install -r requirements.txt
We provide baseline results and pretrained models for download; please put the downloaded files in 'MVDiffusion/weights'.
Test the demo by running:
- Text-conditioned generation
python demo.py --text "This kitchen is a charming blend of rustic and modern, featuring a large reclaimed wood island with marble countertop, a sink surrounded by cabinets. To the left of the island, a stainless-steel refrigerator stands tall. To the right of the sink, built-in wooden cabinets painted in a muted."
- Dual-conditioned generation
python demo.py --text_path assets/prompts.txt --image_path assets/outpaint_example.png
- Panorama generation: please download the Matterport3D skybox data and labels.
├── data
    ├── mp3d_skybox
        ├── train.npy
        ├── test.npy
        ├── 5q7pvUzZiYa
            ├── blip3
            ├── matterport_skybox_images
        ├── 1LXtFkjw3qL
        ├── ....
- Depth-conditioned generation: please download the ScanNet data, training labels, and testing labels.
├── data
    ├── scannet
        ├── train
            ├── scene0435_01
                ├── color
                ├── depth
                ├── intrinsic
                ├── pose
                ├── prompt
                ├── key_frame_0.6.txt
                ├── valid_frames.npy
        ├── test
Execute the following scripts for testing:
- sh test_pano.sh: Generate 8 multi-view panoramic images in the Matterport3D testing dataset.
- sh test_pano_outpaint.sh: Generate 8 multi-view images conditioned on a single view image (outpainting) in the Matterport3D testing dataset.
- sh test_depth_fix_frames.sh: Generate 12 depth-conditioned images in the ScanNet testing dataset.
- sh test_depth_fix_interval.sh: Generate a sequence of depth-conditioned images (every 20 frames) in the ScanNet testing dataset.
- sh test_depth_two_stage.sh: Generate a sequence of depth-conditioned key-frame images, then interpolate the in-between images, in the ScanNet testing dataset.
After running either sh test_depth_fix_interval.sh or sh test_depth_two_stage.sh, you can use TSDF fusion to obtain a textured mesh, for example as sketched below.
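One common option for this fusion step is Open3D's ScalableTSDFVolume. The snippet below is a minimal, illustrative sketch, not the repository's fusion script; the scene path, output path, frame list, and depth-image resolution are all assumptions:

```python
# Minimal TSDF-fusion sketch using Open3D (illustrative; not the repo's actual script).
import numpy as np
import open3d as o3d

scene = "data/scannet/test/scene0435_01"  # assumed scene directory
frame_ids = range(0, 200, 20)             # assumed: every 20th frame, as in test_depth_fix_interval.sh

# Scalable TSDF volume: 4 cm voxels, 12 cm truncation, fused RGB colors.
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.04,
    sdf_trunc=0.12,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)

# ScanNet stores a 4x4 intrinsic matrix as a text file; width/height are assumptions.
K = np.loadtxt(f"{scene}/intrinsic/intrinsic_depth.txt")[:3, :3]
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=640, height=480, fx=K[0, 0], fy=K[1, 1], cx=K[0, 2], cy=K[1, 2]
)

for i in frame_ids:
    # Color and depth must share the same resolution; resize beforehand if needed.
    color = o3d.io.read_image(f"outputs/{i}.png")        # generated RGB image (assumed path)
    depth = o3d.io.read_image(f"{scene}/depth/{i}.png")  # ScanNet depth: 16-bit PNG, millimeters
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=4.0,
        convert_rgb_to_intensity=False,
    )
    pose = np.loadtxt(f"{scene}/pose/{i}.txt")              # 4x4 camera-to-world pose
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))  # integrate expects world-to-camera

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("fused_mesh.ply", mesh)
```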
Execute the following scripts for training:
- sh train_pano.sh: Train the panoramic image generation model.
- sh train_pano_outpaint.sh: Train the panoramic image outpainting model.
- sh train_depth.sh: Train the depth-conditioned generation model.
Panorama generation:
- Convert the panorama into 6 skybox images using the provided tool, Equirec2Perspec. You will get left, right, front, back, up, and down images.
- Convert the panorama to 8 perspective images, each capturing a 45-degree horizontal view. Four of these images overlap with the skybox images, specifically the left, right, front, and back views (see the projection sketch after this list).
- Once you have the perspective images, you can use BLIP2 to generate prompts from them.
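For reference, the sketch below shows one way to sample a perspective crop from an equirectangular panorama. It is a minimal illustration of the projection math rather than the Equirec2Perspec tool itself; the 90-degree FOV, output resolution, and file path are assumptions:

```python
# Minimal equirectangular-to-perspective sketch (illustrative; use Equirec2Perspec in practice).
import cv2
import numpy as np

def equirec_to_perspective(pano, fov_deg, yaw_deg, pitch_deg, out_hw):
    h, w = out_hw
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Pixel grid -> unit rays in the camera frame.
    x, y = np.meshgrid(np.arange(w) - w / 2 + 0.5, np.arange(h) - h / 2 + 0.5)
    rays = np.stack([x, y, np.full_like(x, f)], -1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate rays by yaw (around the vertical axis), then pitch.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    rays = rays @ (Ry @ Rx).T
    # Rays -> longitude/latitude -> pixel coordinates in the panorama.
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))
    ph, pw = pano.shape[:2]
    u = ((lon / (2 * np.pi) + 0.5) * pw).astype(np.float32)
    v = ((lat / np.pi + 0.5) * ph).astype(np.float32)
    return cv2.remap(pano, u, v, cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

pano = cv2.imread("pano.png")  # illustrative path
# Eight views rotated 45 degrees apart; the 90-degree FOV here is an assumption.
views = [equirec_to_perspective(pano, 90, yaw, 0, (512, 512)) for yaw in range(0, 360, 45)]
```

cv2.BORDER_WRAP lets the sampling wrap around the panorama's horizontal seam, so views that straddle the left/right image border are handled correctly.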
Multi-view Depth-to-Image Generation:
- Using the ScanNet format: follow the directory structure and file format of the ScanNet dataset (see the layout above).
- Use BLIP2 to generate prompts from each perspective image (see the captioning sketch below).
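For reference, per-image captions can be produced with the Hugging Face BLIP-2 implementation roughly as below; the checkpoint name and image path are illustrative, and a CUDA GPU is assumed:

```python
# Minimal BLIP-2 captioning sketch (checkpoint and path are illustrative assumptions).
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda"  # assumes a CUDA GPU; fp16 weights keep memory usage manageable
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

image = Image.open("perspective_0.png").convert("RGB")  # one perspective crop
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
out = model.generate(**inputs, max_new_tokens=40)
prompt = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
print(prompt)  # use this caption as the text prompt for the corresponding view
```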
This project is licensed under the terms of the MIT license.
For any questions, feel free to contact us at shitaot@sfu.ca.