Implement Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis #1417

rabidcopy · 2022-10-01T00:11:53Z

Is your feature request related to a problem? Please describe.
I don't think this duplicates an existing issue, and it shouldn't be confused with #1325. It concerns the problems showcased in the images from the research paper linked below. Anyone who uses SD frequently will know some of these issues all too well.

Describe the solution you'd like
Implement the changes the authors made to txt2img.py and attention.py to reduce these problems in AI image generation. This shouldn't replace the default behavior. It should be opt-in, with a clear warning that it will produce different results from what is currently generated.

Links to the research page, the paper, and the zip file containing their modified txt2img.py and attention.py:
https://openreview.net/forum?id=PUIqjT4rzq7
https://openreview.net/pdf?id=PUIqjT4rzq7
https://openreview.net/attachment?id=PUIqjT4rzq7&name=supplementary_material

C43H66N12O12S2 · 2022-10-01T01:15:07Z

This seems to rely on parsing the input prompt to separate the nouns and tokenize each one separately. I have no experience with that kind of thing, though I tried anyway and failed.

It also uses an NLP model and depends on stanza.
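For anyone wondering what the parsing step amounts to, here is a minimal sketch. It assumes the noun phrases are read off a bracketed constituency tree of the prompt. It does not call stanza: the tree string is hand-written, and the helper names (`parse_tree`, `leaves`, `noun_phrases`) are made up for illustration, not taken from the paper's code.

```python
def parse_tree(text):
    """Parse a bracketed tree like '(NP (DT a) (NN car))' into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def noun_phrases(node):
    """Return the text of every NP subtree, outermost first."""
    if isinstance(node, str):
        return []
    found = [" ".join(leaves(node))] if node[0] == "NP" else []
    for child in node[1]:
        found.extend(noun_phrases(child))
    return found


# Hand-written tree for the prompt "a red car and a white sheep".
tree = parse_tree(
    "(ROOT (NP (NP (DT a) (JJ red) (NN car)) (CC and) (NP (DT a) (JJ white) (NN sheep))))"
)
print(noun_phrases(tree))
# ['a red car and a white sheep', 'a red car', 'a white sheep']
```

Each extracted phrase would then be tokenized and encoded separately. How those encodings are combined is the part that lives in txt2img.py.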

As far as attention goes, I believe this would be sufficient.

    if isinstance(context, list):
      uc_context = context[0]
      context_k, context_v = context[1]['k'], context[1]['v']
      k_in = self.to_k(torch.cat([uc_context, context_k], dim=0)) * self.scale
      v_in = self.to_v(torch.cat([uc_context, context_v], dim=0))
    else:
      k_in = self.to_k(context) * self.scale
      v_in = self.to_v(context)
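To show where that branch would sit, here is a minimal self-contained sketch of a single-head cross-attention layer built around it. The class name `SketchCrossAttention`, the dimensions, and the `[unconditional, {"k": ..., "v": ...}]` layout of `context` are assumptions for illustration. This is not the actual attention.py from the repo or from the paper's zip.

```python
import torch
from torch import nn


class SketchCrossAttention(nn.Module):
    """Single-head cross-attention that accepts separate key/value contexts."""

    def __init__(self, query_dim, context_dim, inner_dim):
        super().__init__()
        self.scale = inner_dim ** -0.5
        self.to_q = nn.Linear(query_dim, inner_dim, bias=False)
        self.to_k = nn.Linear(context_dim, inner_dim, bias=False)
        self.to_v = nn.Linear(context_dim, inner_dim, bias=False)

    def forward(self, x, context):
        q = self.to_q(x)
        if isinstance(context, list):
            # Assumed layout: [unconditional, {"k": cond_for_keys, "v": cond_for_values}]
            uc_context = context[0]
            context_k, context_v = context[1]["k"], context[1]["v"]
            # Stack unconditional and conditional halves along the batch axis.
            k = self.to_k(torch.cat([uc_context, context_k], dim=0)) * self.scale
            v = self.to_v(torch.cat([uc_context, context_v], dim=0))
        else:
            k = self.to_k(context) * self.scale
            v = self.to_v(context)
        # Attention weights come from the keys; the values can come from a different encoding.
        attn = torch.einsum("bid,bjd->bij", q, k).softmax(dim=-1)
        return torch.einsum("bij,bjd->bid", attn, v)
```

If `k` and `v` are the same tensor, the list branch gives the same output as passing the concatenated context directly, so the only new behavior is letting the values differ from the keys.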

@AUTOMATIC1111 would you be interested in this?

differentprogramming · 2022-10-01T02:30:21Z


Where would that go? I'd like to try it!

C43H66N12O12S2 · 2022-10-01T03:26:01Z

@differentprogramming It won't work. The hard part is in txt2img.py.

differentprogramming · 2022-10-02T06:53:51Z

I tried to run the sample version but it dies:

2022-10-01 23:53:04 INFO: Use device: gpu
2022-10-01 23:53:04 INFO: Loading: tokenize
2022-10-01 23:53:07 INFO: Loading: pos
2022-10-01 23:53:08 INFO: Loading: constituency
2022-10-01 23:53:09 INFO: Done loading processors!
Traceback (most recent call last):
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\utils\hub.py", line 408, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\huggingface_hub\file_download.py", line 1099, in hf_hub_download
    _raise_for_status(r)
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\huggingface_hub\utils\_errors.py", line 148, in _raise_for_status
    raise e
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\huggingface_hub\utils\_errors.py", line 111, in _raise_for_status
    response.raise_for_status()
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co//resolve/main/preprocessor_config.json (Request ID: UUwBY8TcCL7a11nBMUtFz)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/txt2img.py", line 35, in <module>
    safety_feature_extractor = AutoFeatureExtractor.from_pretrained(safety_model_id)
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\models\auto\feature_extraction_auto.py", line 292, in from_pretrained
    config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\feature_extraction_utils.py", line 398, in get_feature_extractor_dict
    resolved_feature_extractor_file = cached_file(
  File "C:\Users\joshu\anaconda3\envs\ldm\lib\site-packages\transformers\utils\hub.py", line 465, in cached_file
    raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
OSError: There was a specific connection error when trying to load :
404 Client Error: Not Found for url: https://huggingface.co//resolve/main/preprocessor_config.json (Request ID: UUwBY8TcCL7a11nBMUtFz)

isaac-bender · 2022-10-07T01:46:25Z

I tried to run the sample version but it dies:
...
OSError: There was a specific connection error when trying to load : 404 Client Error: Not Found for url: https://huggingface.co//resolve/main/preprocessor_config.json (Request ID: UUwBY8TcCL7a11nBMUtFz)

That error has nothing to do with the code. You're just failing to connect to Hugging Face, probably because you didn't supply login info.
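One more thing that may be worth ruling out: the URL in the log has an empty segment (`huggingface.co//resolve/...`), and the message reads "trying to load :" with nothing before the colon. That is what you would see if the model id passed to `from_pretrained` were an empty string. Below is a minimal sketch, assuming the hub builds file URLs from the template shown. `hub_file_url` and `check_model_id` are hypothetical helpers, not library functions.

```python
# Assumed URL layout for files hosted on the hub.
HUB_URL_TEMPLATE = "https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def hub_file_url(repo_id, filename, revision="main"):
    """Build the download URL for one file in a hub repo."""
    return HUB_URL_TEMPLATE.format(repo_id=repo_id, revision=revision, filename=filename)


def check_model_id(model_id):
    """Fail early with a readable message instead of a 404 from the hub."""
    if not model_id or not model_id.strip():
        raise ValueError("model id is empty; set it before calling from_pretrained")
    return model_id.strip()


# An empty id reproduces the URL from the traceback.
print(hub_file_url("", "preprocessor_config.json"))
# https://huggingface.co//resolve/main/preprocessor_config.json
```

If `safety_model_id` in the sample's scripts/txt2img.py turns out to be blank, filling it in (or skipping the safety checker) would be the thing to try before looking at login.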

Ehplodor · 2022-10-24T13:13:47Z

Up. This is a must IMHO

mezotaken added the "enhancement" label (New feature or request) · Jan 12, 2023

catboxanon added the "extension-request" label (Items that should be implemented as an extension rather than part of this repo) · Aug 26, 2023
