-
Please add this, it looks amazing. Unofficial Hugging Face demo: https://huggingface.co/spaces/ysharma/pix2pix-zero-01 (very barebones; it only supports "cat to dog" or "dog to cat" right now)
-
It looks like the repo contains source code for the method, although I haven't examined it too closely. I'd be happy to try adding this as an Unprompted shortcode, similar to what I did with Hard Prompts Made Easy. If anyone is already working on a separate solution, please let me know.
-
Well, I managed to get pix2pix_zero working inside of the WebUI, but there are a few issues I need to solve before releasing the code.
Anyhow, "cat2dog" gave me this: [image] Not bad, although it's reminding me more of a fox.
-
I also didn't realize you need to generate the embedding files beforehand. That's unfortunate, but thanks for taking the time to look into it. That said, I got some pretty interesting results testing it.
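For anyone else trying this: as far as I can tell, those embedding files are just averaged CLIP text embeddings for the source and target concepts, and the edit direction is their difference. Here's a minimal sketch of how they could be precomputed; the three-sentence lists and the output file names are my own placeholders (the paper averages over a much larger bank of generated sentences per concept):

```python
# Sketch: precompute pix2pix-zero-style edit-direction embeddings.
# Assumes the SD 1.x text encoder (CLIP ViT-L/14); the sentence lists here
# are tiny placeholders -- the method averages over many generated sentences.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id).eval()

@torch.no_grad()
def mean_embedding(sentences):
    """Encode sentences with CLIP and average over the batch dimension."""
    tokens = tokenizer(
        sentences,
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    embeds = text_encoder(tokens.input_ids)[0]  # (batch, 77, 768)
    return embeds.mean(dim=0)                   # (77, 768), matches SD conditioning

source = mean_embedding([
    "a photo of a cat",
    "a cat sitting on a sofa",
    "a painting of a cat",
])
target = mean_embedding([
    "a photo of a dog",
    "a dog sitting on a sofa",
    "a painting of a dog",
])

# The edit direction applied during denoising is the difference of the means.
torch.save(source, "cat.pt")
torch.save(target, "dog.pt")
edit_direction = target - source
```

The nice part is that the text encoder is the expensive piece you already have loaded for SD, so generating a new concept pair is cheap compared to any fine-tuning approach.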
-
A few days ago I found an interesting post on Reddit about a new method called "Zero-shot Image-to-Image Translation". The results look similar to the depth-guided model's, but it reportedly works without fine-tuning. According to the developer, pix2pix-zero can directly use pre-trained text-to-image diffusion models such as Stable Diffusion. The code hasn't been released yet, but could it be added to the WebUI?
https://pix2pixzero.github.io/
https://www.reddit.com/r/StableDiffusion/comments/10wfl0r/zeroshot_imagetoimage_translation/
"We propose pix2pix-zero, a diffusion-based image-to-image approach that allows users to specify the edit direction on-the-fly (e.g., cat to dog). Our method can directly use pre-trained text-to-image diffusion models, such as Stable Diffusion, for editing real and synthetic images while preserving the input image's structure. Our method is training-free and prompt-free, as it requires neither manual text prompting for each input image nor costly fine-tuning for each task.
TL;DR: no finetuning required; no text input needed; input structure preserved."
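If you just want to try the method outside the WebUI, a pipeline for it exists in diffusers. A minimal sketch, assuming StableDiffusionPix2PixZeroPipeline is available in your diffusers version and that you have local cat.pt/dog.pt embedding files (e.g., from the official repo); the prompt and parameter values follow the library's doc example rather than anything tuned:

```python
# Sketch: synthetic-image editing with diffusers' pix2pix-zero pipeline.
# Assumes a diffusers version that ships StableDiffusionPix2PixZeroPipeline
# and precomputed source/target embedding files on disk.
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline

pipe = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Averaged text embeddings for the source and target concepts.
source_embeds = torch.load("cat.pt").to("cuda", dtype=torch.float16)
target_embeds = torch.load("dog.pt").to("cuda", dtype=torch.float16)

image = pipe(
    "a high resolution painting of a cat in the style of van gogh",
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,  # strength of the structure-preserving guidance
).images[0]
image.save("cat2dog.png")
```

Editing a real photo additionally requires inverting it to noise first (the repo uses DDIM inversion with a BLIP-generated caption), so the above only covers the synthetic-image case.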