※ This script works only once the related PR has been merged.
2023/03/11 This PR has been merged and updated accordingly.
2023/06/22 We have decided to stop running inference with TensorFlow and ONNX and to switch to a PyTorch model. As a result, a new model is downloaded, and the old TensorFlow and ONNX models are no longer needed.
PFG (Prompt Free Generation) is a method of guiding generation with an image by concatenating features derived from the image to the text encoder hidden states. It can generate a variety of images without a text prompt.
- Convert the image into a (768,) vector with wd14tagger (the output of the last pooling layer).
- Convert the vector into a (num_tokens, 768 or 1024) tensor with a pretrained linear layer.
- Concatenate this tensor with the text encoder hidden states.
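The last two steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the extension's actual code: it uses NumPy with random weights in place of the pretrained PyTorch layer, the function name `pfg_condition` is hypothetical, the image feature size in the demo is an assumed placeholder, and applying `pfg_scale` as a plain multiplier on the image tokens is a guess at how the strength setting acts.

```python
import numpy as np


def pfg_condition(image_feature, weight, bias, text_hidden, num_tokens, pfg_scale=1.0):
    """Project an image feature to num_tokens pseudo-tokens and append them
    to the text encoder hidden states (hypothetical helper)."""
    hidden_dim = text_hidden.shape[-1]  # 768 or 1024, depending on the text encoder
    # Linear layer: (feat_dim,) -> (num_tokens * hidden_dim,)
    projected = image_feature @ weight + bias
    # Reshape into one row per pseudo-token: (num_tokens, hidden_dim)
    tokens = projected.reshape(num_tokens, hidden_dim)
    # Assumption: the scale simply multiplies the image-derived tokens
    tokens = tokens * pfg_scale
    # Concatenate along the token axis: (seq_len + num_tokens, hidden_dim)
    return np.concatenate([text_hidden, tokens], axis=0)


rng = np.random.default_rng(0)
feat_dim = 768    # assumed image feature size for this demo
hidden_dim = 768  # text encoder hidden size
num_tokens = 10
seq_len = 77      # assumed text sequence length

# Random stand-ins for the tagger output, the pretrained layer, and the text encoder output
image_feature = rng.standard_normal(feat_dim, dtype=np.float32)
weight = rng.standard_normal((feat_dim, num_tokens * hidden_dim), dtype=np.float32) * 0.01
bias = np.zeros(num_tokens * hidden_dim, dtype=np.float32)
text_hidden = rng.standard_normal((seq_len, hidden_dim), dtype=np.float32)

cond = pfg_condition(image_feature, weight, bias, text_hidden, num_tokens)
print(cond.shape)  # (87, 768)
```

The text hidden states pass through unchanged; only the appended rows carry image information, which is why generation can proceed even when the text prompt is empty.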
- Install this extension in the same way as other extensions.
- Download a pretrained model (pfg-wd14-n10.pt, etc.) and put it in ./models.
- A "PFG" menu then appears on the txt2img tab.
- pfg_scale: The strength of the PFG guidance.
- pfg model: The PFG weight file in ./models.
- pfg num tokens: The number of tokens. This value is fixed by the chosen weight file.
- use onnx: Run inference with onnxruntime.
- The CFG scale is an important parameter. Results may improve with a larger value than you would use for normal image generation.
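Since the linear layer outputs a (num_tokens, 768 or 1024) tensor, the value of pfg num tokens follows from the layer's output size. A minimal sketch of that arithmetic, assuming the layer's flat output length is known; the helper name `infer_num_tokens` is hypothetical and not part of the extension:

```python
def infer_num_tokens(out_features: int, hidden_dim: int) -> int:
    """Return the token count implied by a linear layer's flat output size."""
    if hidden_dim not in (768, 1024):
        raise ValueError("hidden_dim is expected to be 768 or 1024")
    if out_features % hidden_dim != 0:
        raise ValueError("out_features is not a multiple of hidden_dim")
    return out_features // hidden_dim


print(infer_num_tokens(7680, 768))  # 10
```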
This extension is based on stable-diffusion-webui-two-shot and sd-webui-additional-networks. Thanks to opparco and kohya-ss, and to AUTOMATIC1111, the developer of the webui.