Suggestion: Using Target Prompt for Improved Real Image Editing Results #4

Open
phymhan opened this issue Apr 27, 2023 · 4 comments

@phymhan

phymhan commented Apr 27, 2023

Hi there,

Thank you for the amazing work! I thoroughly enjoyed reading your paper. I have a suggestion for potentially improving real image editing results. I noticed that in some cases, using the target prompt for DDIM inversion seems to yield better editing results compared to using the source prompt (as shown in Figure 3). Here are two examples (input image):
Using source prompt:
[image: all_step4_layer10 (3)]

Using target prompt:
[image: all_step4_layer10 (2)]

Using source prompt:
[image: all_step4_layer10 (1)]

Using target prompt:
[image: all_step4_layer10]

I used the commands from here. The car's pose seems better aligned with the original input image, and I've observed similar behavior in my own experiments. I guess this shares some similarities with the idea behind Imagic. While I'm not certain whether this would be universally beneficial, I think it might be worth exploring further. Once again, thank you, and congratulations on the fantastic work!
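
For reference, here is a minimal sketch of what "DDIM inversion conditioned on the target prompt" means, written against diffusers. This is an illustrative approximation, not MasaCtrl's actual playground.py; the function `ddim_invert` and its signature are hypothetical.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Illustrative sketch only -- not MasaCtrl's actual implementation.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

@torch.no_grad()
def ddim_invert(pipe, latents, prompt, num_steps=50):
    """Deterministically map clean latents x_0 to noise x_T, conditioned
    on `prompt` -- which can be the source OR the target description."""
    ids = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         return_tensors="pt").input_ids
    cond = pipe.text_encoder(ids.to(latents.device))[0]

    pipe.scheduler.set_timesteps(num_steps)
    acp = pipe.scheduler.alphas_cumprod
    # Walk the schedule from small t to large t (x_0 -> x_T).
    timesteps = list(reversed(pipe.scheduler.timesteps))
    for i, t in enumerate(timesteps):
        eps = pipe.unet(latents, t, encoder_hidden_states=cond).sample
        a_prev = acp[timesteps[i - 1]] if i > 0 else torch.tensor(1.0)
        a_t = acp[t]
        # Deterministic DDIM update, run in reverse:
        x0_hat = (latents - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()
        latents = a_t.sqrt() * x0_hat + (1 - a_t).sqrt() * eps
    return latents  # the inverted noise map x_T
```

The only difference between the two settings above is which string is passed as `prompt`; editing then denoises from the returned noise map under the target prompt.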

@ljzycmd
Collaborator

ljzycmd commented Apr 27, 2023

Hi, many thanks for your insightful suggestion! The results are quite promising. I conducted a quick test with another image using the command:

python playground.py --model_path runwayml/stable-diffusion-v1-5  --image_real corgi.jpg --inv_scale 1 --scale 5 --prompt1 "a photo of a corgi" --prompt2 "a photo of a corgi in lego style" --inv_prompt tar

and the results are:
[image: all_step4_layer10]
The reconstructed image differs significantly from the source image. I guess that, in some cases, this idea is very helpful thanks to the spatial information encoded in the inverted noise map. Thanks again for your insightful suggestion; I will explore it further with more real-image tests. 😊
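
For readers following along, the `--inv_prompt tar` flag presumably just selects which of the two prompts conditions the inversion. A rough sketch of that selection logic (illustrative only; the actual playground.py may implement it differently):

```python
import argparse

# Illustrative only: flag names mirror the command above,
# but the real script may parse them differently.
parser = argparse.ArgumentParser()
parser.add_argument("--prompt1", help="source prompt")
parser.add_argument("--prompt2", help="target prompt")
parser.add_argument("--inv_prompt", choices=["src", "tar"], default="src")
args = parser.parse_args()

inv_prompt = args.prompt1 if args.inv_prompt == "src" else args.prompt2
# The image latents are then inverted with the chosen prompt, e.g.:
# x_T = ddim_invert(pipe, image_latents, inv_prompt)  # hypothetical helper
```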

@phymhan
Author

phymhan commented Apr 27, 2023

Hi @ljzycmd, thanks for your feedback and for conducting a quick test! Looking forward to seeing future developments in this awesome project!

@lavenderrz


Do you mind sharing playground.py?

@ljzycmd
Collaborator

ljzycmd commented May 3, 2024

@lavenderrz, you can find playground.py here: https://github.com/phymhan/MasaCtrl
