[WIP] Implement multi-cond guidance for Composable Diffusion #1695

raefu · 2022-10-05T06:20:35Z

BUGS:

batch_size > 1 generates incorrect results
DDIM and PLMS samplers crash

photo of a cute (dog AND cat PLUS kitten), 4k, HD

As a bonus, add prompt weighting too.

Based on https://arxiv.org/pdf/2206.01714.pdf /
https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/

In vanilla Stable Diffusion, generation is guided by two prompts: towards the positive prompt and away from the negative prompt. This change lets you use an arbitrary number of prompts for guidance, which opens up some interesting composition options. See the website above for more concrete examples.

Multi-cond guidance slows generation because it requires evaluating guidance for additional prompts for each step.
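
To make the per-step cost concrete, here is a minimal sketch of how several conditional noise predictions could be combined, following the conjunction formula from the linked paper: the unconditional prediction plus a weighted sum of each prompt's offset from it. The name `combine_guidance`, the plain-list representation, and the separate global `scale` factor are illustrative assumptions, not code from this PR. Each extra prompt needs one more model evaluation per step to produce its entry in `eps_conds`.

```python
from typing import List, Sequence


def combine_guidance(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    weights: Sequence[float],
    scale: float = 7.5,
) -> List[float]:
    """Return eps_uncond + scale * sum_i w_i * (eps_cond_i - eps_uncond), element-wise.

    Hypothetical helper: real predictions are tensors, toy lists are used here.
    """
    if len(eps_conds) != len(weights):
        raise ValueError("need one weight per conditional prediction")
    out = list(eps_uncond)
    for eps_c, w in zip(eps_conds, weights):
        for k, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            out[k] += scale * w * (c - u)
    return out


# Two prompts, the second at half weight.
print(combine_guidance([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5], scale=2.0))
```

With a single prompt of weight 1 this reduces to ordinary classifier-free guidance.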

New syntax keywords: AND NOT PLUS. CLIP is case-insensitive, so write these words in lowercase when you want them treated as ordinary prompt text.

SYNTAX GUIDE:

Watch the console for debugging output of how each prompt is evaluated.

New case-sensitive keywords: AND NOT PLUS. Weights are :NUMBER.

"red AND white" guides with a "red" prompt and a "white" prompt.

"red:2 AND white" guides with a "red" prompt 2x stronger than a "white" prompt.

"a photo of a (cat AND dog)" is equivalent to "a photo of a cat AND a photo of a dog" and generates an animal hybrid using the two prompts.

"a person NOT human" guides towards "a person" with "human" as a prompt with -0.5x weight.

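The AND, NOT and :NUMBER rules described so far can be sketched as a small parser that turns a prompt into (text, weight) pairs. This is only an illustration under assumptions: `parse_prompt` and `split_weight` are hypothetical names, paren groups and PLUS are ignored, and a NOT term always gets the -0.5 weight mentioned above.

```python
import re
from typing import List, Tuple

# A trailing ":NUMBER" marks the weight of a sub-prompt.
_WEIGHT = re.compile(r"^(.*?):\s*(-?\d+(?:\.\d+)?)\s*$")


def split_weight(text: str, default: float = 1.0) -> Tuple[str, float]:
    """Split 'red:2' into ('red', 2.0); text without a weight gets the default."""
    m = _WEIGHT.match(text.strip())
    if m:
        return m.group(1).strip(), float(m.group(2))
    return text.strip(), default


def parse_prompt(prompt: str) -> List[Tuple[str, float]]:
    """Split on the case-sensitive keywords AND / NOT (hypothetical sketch)."""
    conds: List[Tuple[str, float]] = []
    for branch in prompt.split(" AND "):
        positive, *negatives = branch.split(" NOT ")
        conds.append(split_weight(positive))
        # Assumption: every NOT term is guided against with weight -0.5.
        conds.extend((neg.strip(), -0.5) for neg in negatives)
    return conds


print(parse_prompt("red:2 AND white"))
print(parse_prompt("a person NOT human"))
```

Because the split is case-sensitive, lowercase "and" and "not" stay part of the prompt text.
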
"cat PLUS dog" guides with a prompt made by adding the CLIP-embeddings from "cat" to "dog" and dividing by 2.

You can combine PLUS and AND. "apple PLUS pear AND banana PLUS eggplant" makes an image containing apple/pear hybrids and banana/eggplant hybrids.
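
As a rough illustration of PLUS, the sketch below averages embedding vectors element-wise, which for two inputs is the "add and divide by 2" rule described above. The function name `average_embeddings` is hypothetical, the short lists stand in for real CLIP embedding tensors, and extending the average to more than two inputs is an assumption.

```python
from typing import List, Sequence


def average_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized embedding vectors (toy stand-in for CLIP output)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("embeddings must share one dimension")
    n = len(embeddings)
    return [sum(e[k] for e in embeddings) / n for k in range(dim)]


cat = [1.0, 2.0]  # made-up numbers, not real embeddings
dog = [3.0, 6.0]
print(average_embeddings([cat, dog]))
```

In "apple PLUS pear AND banana PLUS eggplant", each AND branch would get its own averaged embedding, and the two results would then be used as separate guidance prompts.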

Multiple paren groups are supported and combine groups sensibly. "photo of (dog AND cat), cute, 4k, playing with (ball AND yarn)" => "photo of dog, cute, 4k, playing with ball AND photo of cat, cute, 4k, playing with yarn".
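
The position-wise expansion in that example can be sketched as follows. This is a hedged illustration, not the PR's parser: `expand_groups_zip` is a hypothetical name, nesting is not handled, and requiring every group to list the same number of options is an assumption.

```python
import re
from typing import List

# Only paren groups that contain the AND keyword are expanded.
_GROUP = re.compile(r"\(([^()]* AND [^()]*)\)")


def expand_groups_zip(prompt: str) -> str:
    """Pair the i-th option of every group, yielding one AND branch per position."""
    groups = [m.group(1).split(" AND ") for m in _GROUP.finditer(prompt)]
    if not groups:
        return prompt
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must list the same number of options")
    branches: List[str] = []
    for i in range(n):
        options = iter([g[i].strip() for g in groups])
        # Replace each group, left to right, with its i-th option.
        branches.append(_GROUP.sub(lambda m: next(options), prompt))
    return " AND ".join(branches)


print(expand_groups_zip("photo of (dog AND cat), cute, 4k, playing with (ball AND yarn)"))
```

Parens without AND, such as "(cute)", are left untouched by this sketch.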


AUTOMATIC1111 · 2022-10-05T09:26:27Z

The choice of using parens when you don't actually support nesting them seems wrong. It also clashes with attention. The sensible composition does not feel sensible to me. Sensible for "photo of (dog AND cat), cute, 4k, playing with (ball AND yarn)" would be to make four conds there with all combinations.
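
For comparison, the all-combinations reading can be sketched with `itertools.product`. The helper name `expand_groups_product` and the group regex are assumptions made for illustration only; neither comes from the PR or the repo.

```python
import itertools
import re
from typing import List

# Only paren groups that contain the AND keyword are expanded.
_GROUP = re.compile(r"\(([^()]* AND [^()]*)\)")


def expand_groups_product(prompt: str) -> List[str]:
    """Return one prompt per combination of options across all groups."""
    groups = [
        [opt.strip() for opt in m.group(1).split(" AND ")]
        for m in _GROUP.finditer(prompt)
    ]
    branches: List[str] = []
    for combo in itertools.product(*groups):
        options = iter(combo)
        # Replace each group, left to right, with this combination's choice.
        branches.append(_GROUP.sub(lambda m: next(options), prompt))
    return branches


for cond in expand_groups_product(
    "photo of (dog AND cat), cute, 4k, playing with (ball AND yarn)"
):
    print(cond)
```

For the example prompt this yields four conds: dog with ball, dog with yarn, cat with ball, and cat with yarn.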

NOT seems redundant when you have weights.

PLUS is just unrelated and I still don't want it.

More than anything, the amount of added code is very very unappealing.

The page you link has just AND, without any parens, and that would be a good start. I feel that if we just support AND plus weights, the amount of code would become multiple times smaller and it would be a lot simpler.

I don't feel right telling you to throw this away after you spent time working on it, but I don't want this complexity added to the repo. The contributing page does say that you should consult with me before PRing big changes. I have plans to add this kind of compositing myself, so if you don't want to rework the code to conform to those requirements, the feature will make it in anyway at some point.

differentprogramming · 2022-10-05T21:09:50Z

> The page you link has just AND, without any parens, and that would be a good start.

I think some kind of grouping is needed
consider: man with (red shirt AND green hat) painted by van gogh
compared with: man with red shirt AND green hat painted by van gogh

In the second case only the green hat is painted by van gogh not the man or the red shirt. You need the grouping because styles and the like apply to the whole picture.

Your way, people would have to type out every combination completely:
man with (red shirt AND green pants AND tweed vest) 4k photograph
would have to be:
man with red shirt 4k photograph AND man with green pants 4k photograph AND man with tweed vest 4k photograph

I think he is letting parens that don't contain AND or PLUS pass through so that they can still be used for attention. One possible change would be to pick a new grouping pair like <> instead of ().

Though now that I typed it, the idea of making AND top level and requiring the whole prompt to be duplicated does have the advantage of simplicity.

moorehousew · 2022-10-05T21:39:59Z

@AUTOMATIC1111 Some sort of grouping would be sorely wanted, as another user pointed out. Some sort of standard syntax would be nice so that additional features can be freely added without clashing with old ones. S-expressions are trivial to parse, so if you can devise a prompt DSL with S-expressions it'll cost little in terms of complexity and maintainability.
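
To give a sense of how little code an S-expression reader needs, here is a minimal sketch. The prompt DSL in the example, with `and` and `weight` forms, is purely hypothetical, as is the function name `parse_sexpr`; nothing like it exists in the repo.

```python
import re
from typing import List, Union

SExpr = Union[str, List["SExpr"]]

# Tokens: parentheses, double-quoted strings, or bare atoms.
_TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_sexpr(text: str) -> SExpr:
    """Parse one S-expression into nested Python lists of strings."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items: List[SExpr] = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok.strip('"')

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


# Hypothetical DSL: (and (weight 2 "red shirt") "green hat")
print(parse_sexpr('(and (weight 2 "red shirt") "green hat")'))
```

Nesting comes for free from the recursion, which is the property the flat paren syntax lacks.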

Just a thought.

differentprogramming · 2022-10-06T08:32:19Z

There are people on reddit claiming that AND has just been added.
Has part of this pull been implemented already or are they wrong?

ArcticEcho · 2022-10-06T09:02:11Z

Added just a few hours ago in c26732?

differentprogramming · 2022-10-06T09:54:10Z

Does it have any grouping or delimiting or is it top level only?

differentprogramming · 2022-10-06T11:57:01Z

A problem with the current version is that there is no way to limit a negative weighted item to a single AND branch.

astrobleem · 2022-10-26T22:30:05Z

The DDIM sampler crashes when AND NOT is put in the prompt.


AUTOMATIC1111 closed this Oct 18, 2022

ClashSAN mentioned this pull request Oct 26, 2022

[Bug]: Negation Prompts -- AND NOT operator #3747

Closed

1 task

aleksusklim mentioned this pull request Nov 7, 2022

Unlimited Token Works – add explicit mark for end of one token vector (proposing "OR" / "PLUS") – Implemented as "BREAK"! #2305

Closed

Ehplodor mentioned this pull request Dec 6, 2022

[Bug]: DDIM with AND keyword : OK in negative but KO in positive ? #5483

Closed

1 task

catboxanon mentioned this pull request Apr 14, 2023

[Bug]: Composable Diffusion is not aligned with official implementation #9280

Open

1 task
