arXiv | BibTeX | Project Page
Visii learns an instruction from a before → after image pair, then applies it to new images to perform the same edit.
Visual Instruction Inversion: Image Editing via Image Prompting (NeurIPS 2023)
Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
University of Wisconsin-Madison
TL;DR: A framework for inverting visual prompts into editing instructions for text-to-image diffusion models.
ELI5: You show the machine how to perform a task (with images), and it replicates your actions. For example, it can learn your drawing style and use it to create a new drawing.
Jump to: Requirements | Quickstart | Visii + InstructPix2Pix | Visii + ControlNet | BibTeX | Go Crazy
This script has been tested on an NVIDIA RTX 3090 with Python 3.7, PyTorch 1.13.0, and diffusers.
pip install -r requirements.txt
Visual Instruction Inversion with InstructPix2Pix.
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test.py --hybrid_ins True --prompt "a husky" --guidance_scale 10
Result images will be saved in the ./result folder.
(Figure: before, after, and test images.)
Visii learns the editing instruction from a dog → watercolor dog image pair, then applies it to a new image to perform the same edit. You can also concatenate new information to achieve new effects: dog → watercolor husky.
⚠️ If you're not getting the quality you want, try tuning the guidance_scale.
(Figure: <ins> + "a poodle". From left to right: increasing guidance scale (4, 6, 8, 10, 12, 14).)
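To sweep guidance_scale values systematically, a loop along these lines can help. This is a sketch: the `sweep_commands` helper is ours, not part of the repo; it only reuses the test.py flags shown above.

```python
import subprocess

# Guidance scales to try, matching the figure above.
GUIDANCE_SCALES = [4, 6, 8, 10, 12, 14]

def sweep_commands(prompt, scales=tuple(GUIDANCE_SCALES)):
    """Build one test.py invocation per guidance scale (helper name is ours)."""
    return [
        ["python", "test.py", "--hybrid_ins", "True",
         "--prompt", prompt, "--guidance_scale", str(gs)]
        for gs in scales
    ]

# To actually run the sweep (each run saves its output into ./result):
# for cmd in sweep_commands("a poodle"):
#     subprocess.run(cmd, check=True)
```

You can then compare the six results side by side and keep the scale that best balances edit strength and fidelity.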
Inspired by this reddit post, we tested Visii + InstructPix2Pix with the Starbucks and Gandour logos.
(Figure: before/after logo pair and test results with hybrid instructions:)
- <ins> + "Wonder Woman"
- <ins> + "Scarlet Witch"
- <ins> + "Daenerys Targaryen"
- <ins> + "Neytiri in Avatar"
- <ins> + "She-Hulk"
- <ins> + "Maleficent"
(If you're still not getting the quality you want, you might tune the InstructPix2Pix parameters. See Tips or the optimization progress.)
1. Prepare before-after images: A basic structure for the image folder should look like below. {image_name}_{0}.png denotes the before image and {image_name}_{1}.png denotes the after image. By default, we use 0_0.png as the before image, 0_1.png as the after image, and 1_0.png as the test image.
{image_folder}
└───{subfolder}
    │ 0_0.png   # before image
    │ 0_1.png   # after image
    │ 1_0.png   # test image
Check ./images/painting1 for an example folder structure.
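To catch naming mistakes before training, a small sanity check along these lines can verify the layout above. The `check_image_folder` helper is ours for illustration, not part of the repo.

```python
from pathlib import Path

def check_image_folder(image_folder, subfolder):
    """Return the required files missing from {image_folder}/{subfolder}."""
    root = Path(image_folder) / subfolder
    required = ("0_0.png", "0_1.png", "1_0.png")  # before, after, test
    return [name for name in required if not (root / name).exists()]

# Example: check_image_folder("./images", "painting1") should return []
# when the folder follows the structure above.
```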
2. Instruction Optimization: Check ./configs/ip2p_config.yaml for more details on hyper-parameters and settings.
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py --log_folder ip2p_painting1_0_0.png
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test_concat.py --prompt "a husky"
We plugged Visii into ControlNet 1.1 InstructPix2Pix.
# optimize <ins> (default checkpoint)
python train_controlnet.py --image_folder ./images --subfolder painting1
# test <ins>
python test_controlnet.py --log_folder controlnet_painting1_0_0.png
By default, we use the lowest-MSE checkpoint (./logs/{foldername}/best.pth) as the final instruction. Sometimes, the best.pth checkpoint might not yield the best result. If you want to use a different checkpoint, you can specify it with the --checkpoint_number argument.
A visualization of the optimization progress is saved in ./logs/{foldername}/eval_100.png.
# test <ins> (with specified checkpoint)
python test.py --log_folder ip2p_painting1_0_0.png --checkpoint_number 800
# hybrid instruction: <ins> + "a husky" (with specified checkpoint)
python test_concat.py --prompt "a husky" --checkpoint_number 800
(Figure: from left to right: [Before, After, Iter 0, Iter 100, ..., Iter 900]. You can visually select the best checkpoint for testing.)
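If you'd rather pick a checkpoint programmatically, a sketch like this can list the numbered checkpoints next to best.pth. It assumes checkpoints are saved as {step}.pth in the log folder; adjust the glob if the actual naming differs.

```python
from pathlib import Path

def available_checkpoints(log_folder):
    """List checkpoint step numbers, assuming files are saved as <step>.pth."""
    return sorted(
        int(p.stem) for p in Path(log_folder).glob("*.pth") if p.stem.isdigit()
    )

# Example: with 100.pth, 800.pth, and best.pth in the log folder,
# available_checkpoints(...) returns [100, 800]; pass one of these
# numbers to --checkpoint_number.
```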
- Side note: Before-after images should be aligned for better results.
Our code is based on InstructPix2Pix, Hard Prompts Made Easy, Imagic, and Textual Inversion. You might also check the awesome Visual Prompting via Image Inpainting. Thank you!
Photo credit: Bo the Shiba & Mam the Cat.
@inproceedings{
nguyen2023visual,
title={Visual Instruction Inversion: Image Editing via Image Prompting},
author={Thao Nguyen and Yuheng Li and Utkarsh Ojha and Yong Jae Lee},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=l9BsCh8ikK}
}