Pix2Pixzero - Zero-shot Image-to-Image Translation #7711

toyxyz · 2023-02-10T15:17:27Z

toyxyz
Feb 10, 2023

A few days ago I found an interesting post on reddit about a new method called "Zero-shot Image-to-Image Translation". The result looks similar to the Depth Guided model, but it is said that it is possible to use it without fine tuning. According to the developer, pix2pix-zero can directly use pre-trained text-to-image diffusion models such as Stable Diffusion. The code hasn't been released yet, but could it be added to the Webui?

https://pix2pixzero.github.io/

https://www.reddit.com/r/StableDiffusion/comments/10wfl0r/zeroshot_imagetoimage_translation/

"We propose pix2pix-zero, a diffusion-based image-to-image approach that allows users to specify the edit direction on-the-fly (e.g., cat to dog). Our method can directly use pre-trained text-to-image diffusion models, such as Stable Diffusion, for editing real and synthetic images while preserving the input image's structure. Our method is training-free and prompt-free, as it requires neither manual text prompting for each input image nor costly fine-tuning for each task.
TL;DR: no finetuning required; no text input needed; input structure preserved."

musicurgy · 2023-02-12T17:33:25Z

musicurgy
Feb 12, 2023

Please add this, it looks amazing. Unofficial huggingface demo : https://huggingface.co/spaces/ysharma/pix2pix-zero-01 (Very barebones and only supports "cat to dog" or "dog to cat" right now)

0 replies

ThereforeGames · 2023-02-12T20:32:58Z

ThereforeGames
Feb 12, 2023

The code hasn't been released yet, but could it be added to the Webui?

It looks like the repo contains source code for the method, although I haven't examined it too closely.

I'd be happy to try adding this as an Unprompted shortcode, similar to what I did with Hard Prompts Made Easy. If anyone is already working on a separate solution, please let me know.

0 replies

ThereforeGames · 2023-02-13T02:30:15Z

ThereforeGames
Feb 13, 2023

Well, I managed to get pix2pix_zero working inside of the WebUI, but there are a few issues I need to solve before releasing the code:

Their method works in diffusers format and it may take some time to adapt the code for ckpt/safetensors compatibility.
Their method requires one of the latest versions of the diffusers package, which I need to make sure won't interfere with other popular extensions (for example, sd-dreambooth).
I feel that the claims of "no finetuning" and "on-the-fly editing" are a bit misleading. The reason that the HuggingFace demo is barebones is because you need to generate embedding files for every concept in advance; the repo only provides pre-made embeddings for horse, dog, cat, and zebra. You have to run a separate script to generate your own embedding files. Maybe this was already obvious to some of you, but it wasn't obvious to me until I actually began looking at the code.
Running the main script does involve a quick training session. I guess it's a form of textual inversion. It takes about 20 seconds on my 3090 at the default setting of 50 steps.

Anyhow, "cat2dog" gave me this:

Not bad, although it's reminding me more of a fox.

6 replies

ThereforeGames Feb 13, 2023

@ClashSAN So, the authors call these embeddings "edit directions." Here's how you would make your own:

To generate new edit directions, users can first generate two files containing a large number of sentences (~1000) and then run the command as shown below.

And here is the script that outputs the embedding file: https://github.com/pix2pixzero/pix2pix-zero/blob/main/src/make_edit_direction.py

We could probably automate this to some degree, but I would hardly say that the method in its current form qualifies as "on the fly."

tekakutli Feb 26, 2023

@ThereforeGames hello, I was wondering if there was progress with this

ThereforeGames Feb 27, 2023

Hi @tekakutli, the [pix2pix_zero] shortcode is available in Unprompted, but you'll need to change the hardcoded model_path variable on line 33 of unprompted/shortcodes/stable_diffusion/pix2pix_zero.py. It should point to a copy of SD 1.4 in diffusers format.

Anyway, given the steps required to set up an embedding for every concept, it is not clear to me that pix2pix_zero offers any advantages over other methods (such as ControlNet.) If someone makes a compelling case for it, I'll continue developing the shortcode.

tekakutli Feb 27, 2023

thank you @ThereforeGames

pix2pixzero Mar 1, 2023

Thank you for the interst in pix2pix-zero! Generating new concepts on the fly is pretty doable by calling a language model.
We do this in the huggingface demo. The relevant code for computing the sentences is here.

We also have a large list of edit directions in the HuggingFace library that can be used directly.

pix2pix-zero can do many tasks that other methods cannot do (see here) and can be used with any stable diffusion checkpoint and does not need any task specific training.

musicurgy · 2023-02-13T03:23:01Z

musicurgy
Feb 13, 2023

I feel that the claims of "no finetuning" and "on-the-fly editing" are a bit misleading. The reason that the HuggingFace demo is barebones is because you need to generate embedding files for every concept in advance; the repo only provides pre-made embeddings for horse, dog, cat, and zebra. You have to run a separate script to generate your own embedding files. Maybe this was already obvious to some of you, but it wasn't obvious to me until I actually began looking at the code.

I also didn't realize you needed to generate the embedding files beforehand, unfortunate but thanks for taking the time to look into it. That said, I got some pretty interesting results testing it.

dog2cat

cat2dog

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pix2Pixzero - Zero-shot Image-to-Image Translation #7711

{{title}}

Replies: 4 comments 6 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Pix2Pixzero - Zero-shot Image-to-Image Translation #7711

Replies: 4 comments · 6 replies

Replies: 4 comments 6 replies