Google's improved 'null textual inversion' implemented in colab #7314

Luke2642 · 2023-01-28T10:25:47Z

Google's null-text inversion produces a near-perfect textual inversion and allows prompt-to-prompt editing on any model!

I've been trying to get it working on Google Colab, and I'm about halfway there. I've sorted out the xformers requirements and reduced it to fp16 so it fits in the T4's 16 GB of memory. I'm getting images from the VAE, but some sort of float/half precision problem means it only generates black images from the latent. If any PyTorch people have advice, I'd really appreciate it!
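One way to narrow down a black-image problem like this is to check which intermediate tensor first contains NaN or inf, since fp16 overflow is a common (though not the only) cause. The sketch below is a diagnostic helper only; the function name `first_nonfinite` and the toy values are assumptions for illustration, not code from the notebook.

```python
import torch

def first_nonfinite(stages):
    """Return the name of the first (name, tensor) pair holding NaN or inf, else None."""
    for name, tensor in stages:
        if not torch.isfinite(tensor).all():
            return name
    return None

# fp16 tops out around 65504, so a larger value overflows to inf when cast down.
big = torch.tensor([70000.0])
stages = [
    ("fp32 value", big),
    ("same value cast to fp16", big.half()),
]
bad_stage = first_nonfinite(stages)
print(bad_stage)
```

In a real pipeline the list would hold the latents, the UNet output, and the decoded image at each step. If the decode stage turns out to be where values go non-finite, one thing to try is running just that stage in float32 (for example, casting the VAE module and the latents with `.to(torch.float32)` / `.float()`); whether that resolves this particular notebook is untested here.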

https://github.com/Luke2642/prompt-to-prompt-colab/blob/main/null_text_w_ptp_colab_fp16.ipynb

You'll have to paste in your own Hugging Face token to download the SD 1.4 model. Once I've got it working in Colab, I can think about implementing it as an extension or feature.

Could anyone verify that it actually works on Colab Pro with fp32? The original site/paper is https://null-text-inversion.github.io/

Luke2642 · 2023-01-28T10:45:42Z

Author

Someone's already got it working, phew!

https://github.com/ouhenio/null-text-inversion-colab

3 replies

Nacurutu Jan 28, 2023

Thanks for the info, really interesting.

Could this be implemented in the Train tab?

What are the benefits over the current TI training?

Is it an upgrade of the current way we train, or is it a new, different way?

Does it need the amount of memory you mention (16 GB VRAM), or can it be used with lower-VRAM cards?

Thanks in advance...

Luke2642 Jan 28, 2023
Author

Yes, it's interesting.
Yes.
Textual inversion has two design decisions that this changes (or fixes, depending on your aim): 1) this trains one seed to replicate the image, rather than training all seeds; 2) CFG (classifier-free guidance) actually makes the image generation process drift further away from perfect replication.
It's both upgraded and different.
Yes, memory is a problem. I haven't even got it to run on 16 GB yet; I'm still tweaking it to work with fp16. Will report back when I do.
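For readers unfamiliar with the CFG point above, here is a minimal sketch of the standard classifier-free guidance combination, written in plain Python on lists of floats. The function name `cfg_combine` and the toy numbers are illustrative assumptions, not code from the notebook.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 1 returns the conditional prediction unchanged;
    larger values push further along (eps_cond - eps_uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy noise predictions for a two-element "latent".
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

no_guidance = cfg_combine(eps_uncond, eps_cond, 1.0)
strong_guidance = cfg_combine(eps_uncond, eps_cond, 7.5)
print(no_guidance, strong_guidance)
```

With a scale above 1 the combined prediction no longer matches either model output, which is one way to see why guided sampling can drift away from the trajectory that reproduces the original image. As described in the null-text inversion paper, the unconditional ("null") embedding is what gets tuned to compensate.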

Nacurutu Jan 28, 2023

Thanks a lot for all the answers...

Alchete · 2023-01-28T14:54:25Z

Was Google's code the basis of the instruct_to_pix extension in A1111, or is it different/better?

8 replies

Alchete Jan 29, 2023

Thanks. I'm still messing with the plugin. Does this only run with the 1.4 model loaded this way, with a token?

ldm_stable = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=HF_TOKEN, scheduler=scheduler).to(device)

Because when trying to run it with the shared.sd_model I get errors that the current A1111 model is missing various attributes...

Luke2642 Jan 29, 2023
Author

I'm no help, I'm afraid. I've not even started thinking about integration. I'm focused on getting it working in Colab first before cluttering up my machine with a dozen versions of transformers/diffusers/xformers etc.! There are a couple of other people working on it too, looking at the issues in the main Google repo. Ask there!

Alchete Jan 29, 2023

Will do, thank you!

Ehplodor Jan 31, 2023

I'm surprised no one has implemented a simple DDIM image inversion to create an embedding. It's a lot more predictable than textual inversion training... but maybe it's not that useful?

Hi, FYI there are some relevant links on inverse ddim and other samplers here. It's a bit old now but maybe still useful.
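To make the DDIM inversion idea above concrete, here is a scalar toy sketch in plain Python. It assumes the usual deterministic DDIM update and replaces the UNet with a stand-in noise predictor; the schedule `ALPHAS`, the function names, and the predictors are all assumptions for illustration, not code from any of the linked repos.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative-alpha levels."""
    x0_pred = (x - math.sqrt(1.0 - alpha_from) * eps) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * eps

def run(x, alphas, eps_fn):
    """Walk x along the schedule, querying eps_fn at the current point."""
    for alpha_from, alpha_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, eps_fn(x, alpha_from), alpha_from, alpha_to)
    return x

# Hypothetical schedule: cumulative alphas from nearly clean to very noisy.
ALPHAS = [0.999, 0.9, 0.6, 0.3, 0.05]

def invert(x0, eps_fn):
    return run(x0, ALPHAS, eps_fn)        # clean -> noisy

def sample(x_T, eps_fn):
    return run(x_T, ALPHAS[::-1], eps_fn)  # noisy -> clean

x0 = 1.25

# A predictor that ignores x: inversion followed by sampling is exact.
constant_eps = lambda x, alpha: 0.3
x0_rec = sample(invert(x0, constant_eps), constant_eps)

# A predictor that depends on x: the round trip is only approximate,
# because inversion and sampling query the predictor at different points.
state_eps = lambda x, alpha: 0.1 * x
drift = abs(sample(invert(x0, state_eps), state_eps) - x0)
print(x0_rec, drift)
```

The second case is the one that matters in practice: a real noise predictor depends on its input, so plain DDIM inversion reconstructs the image only approximately, and the gap tends to grow once guidance is added.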

Luke2642 Jan 31, 2023
Author

Thanks, that is useful! I really hope it's on a few more devs' radar now. It's potentially such a great tool, working on any model.

There's also CLIP inversion, which allows variations and image mixing, but that requires a finetuned model:

https://huggingface.co/lambdalabs/sd-image-variations-diffusers
https://huggingface.co/spaces/lambdalabs/image-mixer-demo

Luke2642 · 2023-01-31T12:08:52Z

Author

Another great repo implementing this https://github.com/cloneofsimo/inversion_edits

original

ddim inversion is almost perfect

plus blue boots - looks a bit strong / too high cfg here

The repo references this paper which is really good - injecting at different layers has interesting results!

So, new feature idea - a way to add emphasis to different prompts at different layers?

https://arxiv.org/pdf/2211.12572.pdf

0 replies

Alchete · 2023-02-01T02:47:47Z

I'm able to duplicate his results with a quick-hack A1111 extension (I couldn't even get Google's implementation to compile), but something is wrong with his null-text inversion (middle image), and I think that's affecting the output. I tried lower CFG scales, so it's not the CFG scale. Notice how his inversion image is missing part of the legs. I assume this then affects the re-prompting, which puts the "boots" all over the horse. The results are even worse with a different image ...

If anyone knows what he did wrong, please chime in!! :D

But here are his results duplicated in A1111.

8 replies

Alchete Feb 1, 2023

Thanks. Unfortunately, I can't get anything that uses "encoder_hidden_states" to compile, which is what Google uses in their code. I'm guessing it's some kind of conflict with xformers versions? I've tried downgrading xformers to 0.3.0, which is what they claim is required, but I still get the same error with that tag. If anyone knows the issue, please let us know.

I'll try to get this into a form that can be published. All I did was copy the Instruct-pix2pix extension and start pasting the two code bases (Google's plus this one https://github.com/cloneofsimo/inversion_edits) into it. So, at the moment, most of it's hardcoded and it's a complete mess.

Luke2642 Feb 2, 2023
Author

If you haven't got past that error, I'm confused about how you got anything out of your extension. Is it just a UI mock-up? :-D

There are two requirements combos that work on Colab and don't error with encoder_hidden_states. One uses triton:

!pip install -U --pre triton torchinfo xformers==0.0.16rc425 diffusers==0.7.2 transformers==4.22.2 accelerate==0.12.0 ftfy

in

https://github.com/Luke2642/prompt-to-prompt-colab/blob/main/null_text_w_ptp_colab_fp16.ipynb

And this combo works too without triton:

!pip install --quiet diffusers==0.8.0
!pip install --quiet https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl
!pip install --quiet --upgrade transformers scipy mediapy accelerate ftfy spacy einops

in

https://github.com/ouhenio/null-text-inversion-colab
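When juggling pinned combos like the ones above, it can help to confirm what is actually installed before debugging further. This is a small standard-library sketch; the helper name `check_pins` is an assumption, and the pins are copied from the first (triton) combo.

```python
from importlib import metadata

# Pins taken from the first combo above.
PINNED = {
    "xformers": "0.0.16rc425",
    "diffusers": "0.7.2",
    "transformers": "4.22.2",
    "accelerate": "0.12.0",
}

def check_pins(pins):
    """Map each package to (wanted, installed_or_None, matches)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        report[name] = (wanted, found, found == wanted)
    return report

for name, (wanted, found, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

Running this in a fresh Colab cell after the `pip install` lines shows at a glance whether a later `--upgrade` silently replaced one of the pinned versions.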

One tool I recommend is useful forks. Create an access token and then it finds forks that actually have commits.

https://useful-forks.github.io/?repo=google/prompt-to-prompt

gives

Repo	URL	Stars	Ahead	Behind	Last Push
rmokady/prompt-to-prompt	https://github.com/rmokady/prompt-to-prompt	3	1	3	04/12/2022
chenxwh/prompt-to-prompt	https://github.com/chenxwh/prompt-to-prompt	1	1	3	29/10/2022
fyviezhao/prompt-to-prompt	https://github.com/fyviezhao/prompt-to-prompt	0	1	0	06/12/2022
eskalera/prompt-to-prompt	https://github.com/eskalera/prompt-to-prompt	0	2	0	19/12/2022
Pouyaexe/prompt-to-prompt	https://github.com/Pouyaexe/prompt-to-prompt	0	1	3	18/11/2022
ahagai/prompt-to-prompt	https://github.com/ahagai/prompt-to-prompt	0	9	0	25/01/2023
coryjo/prompt-to-prompt	https://github.com/coryjo/prompt-to-prompt	0	3	3	17/11/2022

Alchete Feb 2, 2023

Thanks. I couldn't get the Google code to run. The screenshot I showed was using this code base: https://github.com/cloneofsimo/inversion_edits

But in any event, there's a lot happening in the near future. The diffusers repo is considering implementing prompt-to-prompt natively: huggingface/diffusers#2121

And Stability is supposed to release its Deep Floyd model this month.

Luke2642 Feb 2, 2023
Author

Prompt to prompt isn't that interesting. Perfect inversion of an image to an embedding or clip embedding is much more interesting.

Alchete Feb 2, 2023

Completely agree. Maybe there's a practical use for prompt-to-prompt, but I don't see it. Why would someone generate a "box of apples" if they really wanted a "box of cookies"?

The power seems to be in "inpainting" with text via these other papers that allow a random "real" image to be edited. Their techniques seem fairly straightforward for someone who understands all the innards of SD, but unfortunately, that's not me. For example, this paper talks about accessing various decoder layers (1, 4, 7, 11) ... but how are those accessed, how many layers are there, is there any example code showing how to access these layers? These are such specific questions that YouTube and Google are of little use. If anyone has any information on this, please feel free to share. :)
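On the question of how intermediate layers are accessed: in PyTorch the usual mechanism is a forward hook registered on the submodule of interest. The sketch below uses a small `nn.Sequential` as a stand-in for a decoder, so the layer names "0" and "2" are placeholders; the real names in a Stable Diffusion UNet depend on the codebase and can be listed with `named_modules()`.

```python
import torch
from torch import nn

def capture_outputs(model, layer_names):
    """Register forward hooks that store each named layer's output."""
    captured = {}
    handles = []
    modules = dict(model.named_modules())
    for name in layer_names:
        def hook(module, inputs, output, name=name):
            captured[name] = output.detach()
        handles.append(modules[name].register_forward_hook(hook))
    return captured, handles

# Toy stand-in for a decoder; a real UNet has many more, differently named, blocks.
toy_decoder = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4)
)

print([name for name, _ in toy_decoder.named_modules()])  # discover layer names

captured, handles = capture_outputs(toy_decoder, ["0", "2"])
with torch.no_grad():
    toy_decoder(torch.randn(3, 8))
for handle in handles:
    handle.remove()  # always detach hooks when done

print({name: tuple(t.shape) for name, t in captured.items()})
```

Injecting features, as opposed to just reading them, follows the same pattern: a forward hook that returns a value replaces the layer's output with it. How the paper chooses which layers to inject into is not covered by this sketch.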

vladmandic · 2023-02-10T17:31:52Z

Collaborator

quick one - is anyone working on porting null textual inversion to automatic1111?

4 replies

Ehplodor Feb 22, 2023

up :-)

Luke2642 Feb 22, 2023
Author

Not exactly, but cloneofsimo has been making steady improvements, so it's heading in the right direction:

https://github.com/cloneofsimo/inversion_edits

And there's an issue/request for it here:

#5287

Luke2642 Feb 25, 2023
Author

Pix2Pix Zero has already been implemented in diffusers, so that will probably come first.

#7711

https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/pix2pix_zero
https://pix2pixzero.github.io/

ClashSAN Mar 2, 2023
Collaborator

hi, maybe check this out - https://github.com/hnmr293/stable-diffusion-webui-ezp2p
by @hnmr293, who has made many other interesting experiments as well.

isamu-isozaki · 2023-02-26T22:15:50Z

@Luke2642 Thanks for your implementation! Just one note: for the invert function, you (or anyone) might want to add a grad scaler! I might do a PR if I get the time.
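For context on the grad scaler suggestion, here is a minimal sketch of the usual PyTorch mixed-precision pattern applied to optimizing a single tensor. It is not the notebook's invert function: the parameter stands in for a learnable embedding, the loss is a toy MSE, and the function name is an assumption. On a machine without CUDA the scaler and autocast are disabled, so the loop runs as ordinary fp32.

```python
import torch

def optimize_with_grad_scaler(target, steps=50, lr=0.1):
    """Fit a learnable tensor to `target` using autocast plus GradScaler."""
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    target = target.to(device)

    # Stand-in for a learnable embedding; kept in fp32 so the optimizer
    # updates full-precision weights.
    param = torch.zeros_like(target, requires_grad=True)
    optimizer = torch.optim.Adam([param], lr=lr)

    # Newer PyTorch releases also expose this as torch.amp.GradScaler.
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

    for _ in range(steps):
        optimizer.zero_grad()
        with torch.autocast(device_type=device, enabled=use_cuda):
            loss = torch.nn.functional.mse_loss(param, target)
        scaler.scale(loss).backward()  # scale up to avoid fp16 gradient underflow
        scaler.step(optimizer)         # unscales, and skips the step on inf/NaN
        scaler.update()                # adapt the scale factor for the next step
    return param.detach().cpu(), loss.item()

final_param, final_loss = optimize_with_grad_scaler(torch.tensor([1.0, -2.0, 0.5]))
print(final_param, final_loss)
```

The scaler matters when the forward pass runs in fp16: small gradients can underflow to zero, and scaling the loss before `backward()` keeps them representable.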

0 replies
