
[Community Pipeline] Imagic: Text-Based Real Image Editing with Diffusion Models #895

Closed
apolinario opened this issue Oct 18, 2022 · 18 comments


@apolinario (Collaborator)

Intro

Community Pipelines were introduced in diffusers==0.4.0 with the idea of allowing the community to quickly add, integrate, and share their custom pipelines on top of diffusers.

You can find a guide about Community Pipelines here. You can also find all the community examples under examples/community/. If you have questions about the Community Pipelines feature, please head to the parent issue.
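As a rough sketch of how a community pipeline is loaded (the custom_pipeline argument selects an example from examples/community/; the pipeline name below is the one used later in this thread):

    from diffusers import DiffusionPipeline

    # Load Stable Diffusion with a community pipeline from examples/community/.
    pipe = DiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        custom_pipeline="imagic_stable_diffusion",
    )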

Idea: Imagic: Text-based Real Image Editing with Diffusion Models

This pipeline aims to adapt the method from this paper to Stable Diffusion, allowing for text-based editing of real images. Example from the paper:
[example figure from the paper]

@Alx-AI commented Oct 18, 2022

Would love to see this added; there is a notebook implementation here for reference: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb

@asofiaoliveira

I would like to work on this

@patrickvonplaten (Contributor)

Awesome, @asofiaoliveira!

Feel free to open a PR and attach it here.

@MarkRich (Contributor)

I tried my hand at this here: #958. Please let me know if there are any comments!

@asofiaoliveira

I guess I'll leave it to @MarkRich then 😅

@0xdevalias (Contributor)

FYI, it looks like that PR (#958) has been merged now, and the implementation is available under examples/community/.

Can this issue be closed now?

@njucckevin

FYI

Has anyone tried this code? I failed to achieve the results from the paper (for example, making a dog play with a toy) with the Imagic Stable Diffusion pipeline.

@askerlee

@0xdevalias It seems there may be an issue with the implementation. In train(), the text embedding is optimized first, but prior to that, unet and text_encoder have their gradients disabled:

    self.unet.requires_grad_(False)
    self.text_encoder.requires_grad_(False)

Does this mean there won't be valid gradients backpropagated from the loss to the text embedding? I'm not sure. Thanks.
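As a side note, requires_grad_(False) on the modules freezes their parameters but does not by itself block gradients from flowing through them to the embedding being optimized, since the embedding is a leaf tensor with requires_grad=True. A minimal PyTorch sketch with a toy module (not the pipeline's actual code):

    import torch
    import torch.nn as nn

    encoder = nn.Linear(8, 8)
    encoder.requires_grad_(False)  # frozen, as in the pipeline

    emb = torch.randn(1, 8, requires_grad=True)  # stand-in for the optimized text embedding
    loss = encoder(emb).pow(2).mean()
    loss.backward()

    print(emb.grad is not None)         # True: the embedding still receives a gradient
    print(encoder.weight.grad is None)  # True: the frozen weights do not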

@0xdevalias (Contributor)

@askerlee I don't know anything about the implementation, nor have I really used it. I just noticed the PR and figured I'd link it here.

@ShaoTengLiu

@njucckevin I also get wrong results.
For example, here are the results for "A photo of a bird spreading wings":
[result images for alpha = 1, 1.5, and 2]

Can anyone give me some hints?

@BoyuanJiang

> @njucckevin I also get wrong results. For example, here are the results for "A photo of a bird spreading wings": [result images for alpha = 1, 1.5, and 2]
>
> Can anyone give me some hints?

I also cannot reproduce the results from the paper.

@ghost commented Feb 6, 2023

Hey guys,
here is what I have:
[three result images]

@ghost commented Feb 6, 2023

I am using Stable Diffusion to replicate their results (the paper uses Imagen, which is not open source), with 500 text-embedding optimization steps and 1000 fine-tuning steps. The best result comes from the last picture, where the lambda coefficient is 1. I'm not quite sure why the change comes so late.
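For reference, a sketch of how those step counts would be passed to the community pipeline, assuming train() exposes text_embedding_optimization_steps and model_fine_tuning_optimization_steps parameters (check the pipeline source in examples/community/ to confirm the exact names):

    # Hypothetical parameter names; verify against the community pipeline source.
    res = pipe.train(
        prompt,
        image=image,
        generator=generator,
        text_embedding_optimization_steps=500,     # embedding optimization
        model_fine_tuning_optimization_steps=1000, # model fine-tuning
    )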

@tasinislam21

> @njucckevin I also get wrong results. For example, here are the results for "A photo of a bird spreading wings": [result images for alpha = 1, 1.5, and 2]
>
> Can anyone give me some hints?

I am getting the exact same problem. What is the solution to this?

@tasinislam21

> FYI
>
> Has anyone tried this code? I failed to achieve the results from the paper (for example, making a dog play with a toy) with the Imagic Stable Diffusion pipeline.

I am also facing this problem. When I set the lambda coefficient to anything below 1, I get an image that is the same as the input image. If I raise the coefficient above 1, I get an image of a random bird spreading its wings.
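That behavior matches how alpha is used in Imagic: it linearly interpolates between the optimized embedding, which reconstructs the input image, and the target text embedding, and values above 1 extrapolate past the target prompt. Schematically:

    # alpha = 0 reproduces the input image, alpha = 1 uses the target prompt's
    # embedding, and alpha > 1 extrapolates beyond it (hence the "random bird").
    embedding = alpha * target_embedding + (1 - alpha) * optimized_embedding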

@shwetabhardwaj44 commented May 9, 2023

> I am using Stable Diffusion to replicate their results (the paper uses Imagen, which is not open source), with 500 text-embedding optimization steps and 1000 fine-tuning steps. The best result comes from the last picture, where the lambda coefficient is 1. I'm not quite sure why the change comes so late.

Hi @Kathy-Peng, I am also following the same config, and my code is written as below. Could you kindly confirm whether any step is missing? In my results, the edited image also doesn't change much, even at a lambda coefficient of 1.1.

    import torch
    from diffusers import DiffusionPipeline, DDIMScheduler

    model_id = "CompVis/stable-diffusion-v1-4"
    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        cache_dir=CACHE_DIR,  # CACHE_DIR is defined elsewhere
        safety_checker=None,
        use_auth_token=True,
        custom_pipeline="imagic_stable_diffusion",
        scheduler=DDIMScheduler(
            beta_start=0.00085,
            beta_end=0.012,
            beta_schedule="scaled_linear",
            clip_sample=False,
            set_alpha_to_one=False,
        ),
    )
    pipe.to("cuda")
    generator = torch.Generator("cuda").manual_seed(0)

    alphas = [0.8, 0.9, 1, 1.1, 1.2]
    guidance_scales = [6.5, 7.0, 7.5, 8.0, 8.5, 9.5]
    # curr_prompt is given
    # curr_image is loaded

    # Optimize the text embedding and fine-tune the model on the input image.
    res = pipe.train(curr_prompt, image=curr_image, generator=generator)
    reconstructed_image = res.images[0]

    # Once the pipeline is trained, run inference with different alphas and guidance scales.
    for alpha in alphas:
        for text_guide in guidance_scales:
            edited_image = pipe(num_inference_steps=50, alpha=alpha, guidance_scale=text_guide).images[0]
            edited_image.save(f"edited_alpha={alpha}_gs={text_guide}.png")  # keep each result

@HashmatShadab

Was anyone able to reproduce the results?

@witcherofresearch

@HashmatShadab @njucckevin @BoyuanJiang @ShaoTengLiu @1702609 @shwetabhardwaj44 Please check out my new paper Forgedit, which is much faster than Imagic and whose editing results with Stable Diffusion are far better than Imagic's: https://github.com/witcherofresearch/Forgedit/
Here are a few examples of Forgedit's editing results. For the target prompt 'A photo of a bird spreading wings.' and the original image
[original bird image]
using DreamBoothForgedit, we could get:
[result images at guidance_scale=7.5 with (textalpha, alpha) = (0.8, 1.0), (1.1, 1.3), (0.8, 1.2), and (0.8, 0.9)]
[four more example images]
