Transfer the T2I-Adapter to any base model in diffusers🔥
T2I-Adapter is a simple and small (~70M parameters, ~300MB of storage) network that provides extra guidance to pre-trained text-to-image models while keeping the original large text-to-image model frozen. This repository provides the simplest tutorial code for using T2I-Adapter with diverse base models in the diffusers framework. It is very similar to ControlNet.
We also support Lora-for-Diffusers and ControlNet-for-Diffusers.
As T2I-Adapter only trains the adapter layers and keeps the stable-diffusion model frozen, any stable diffusion model can be used as the base. Here, stable-diffusion-1.5 is used as an example.
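In the full example below, switching the base model only means changing the `model_name` string passed to `from_pretrained`; for instance (the alternative checkpoint name here is only an illustrative assumption, any SD-1.x-derived model on the Hub should work):
model_name = "runwayml/stable-diffusion-v1-5"           # base model used in this tutorial
# model_name = "dreamlike-art/dreamlike-diffusion-1.0"  # hypothetical swap: another SD-1.5-based checkpoint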
mkdir models && cd models
wget https://huggingface.co/TencentARC/T2I-Adapter/resolve/main/models/t2iadapter_keypose_sd14v1.pth
wget https://huggingface.co/TencentARC/T2I-Adapter/resolve/main/models/t2iadapter_seg_sd14v1.pth
wget https://huggingface.co/TencentARC/T2I-Adapter/resolve/main/models/t2iadapter_sketch_sd14v1.pth
cd ..
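Alternatively, the same checkpoints can be fetched from Python with huggingface_hub (a minimal sketch, assuming a recent huggingface_hub version that supports local_dir):
from huggingface_hub import hf_hub_download

# downloads e.g. models/t2iadapter_seg_sd14v1.pth into ./models/
for name in ["t2iadapter_keypose_sd14v1.pth",
             "t2iadapter_seg_sd14v1.pth",
             "t2iadapter_sketch_sd14v1.pth"]:
    hf_hub_download(repo_id="TencentARC/T2I-Adapter",
                    filename=f"models/{name}",
                    local_dir=".")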
# please use this dev version of diffusers, as it supports the new adapter pipeline
git clone https://github.com/HimariO/diffusers-t2i-adapter.git
cd diffusers-t2i-adapter
git checkout general-adapter
# manually edit ./src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_adapter.py
# in StableDiffusionAdapterPipeline, comment out the following two lines
# adapter: Adapter,
# self.register_modules(adapter=adapter)
# then install from source
pip install .
cd ..
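As a quick sanity check that the patched fork is installed correctly, the two classes used below should now import from the top-level package:
python -c "from diffusers import StableDiffusionAdapterPipeline, Adapter; print('ok')"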
import torch
from typing import Dict, Tuple
from diffusers.utils import load_image
from diffusers import StableDiffusionAdapterPipeline, Adapter
model_name = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionAdapterPipeline.from_pretrained(model_name, torch_dtype=torch.float32).to('cuda')
adapter_ckpt = "./models/t2iadapter_seg_sd14v1.pth"
# these hyper-parameters must match the pre-trained adapter checkpoint
pipe.adapter = Adapter(cin=int(3 * 64),
                       channels=[320, 640, 1280, 1280][:4],
                       nums_rb=2,
                       ksize=1,
                       sk=True,
                       use_conv=False)
pipe.adapter.load_state_dict(torch.load(adapter_ckpt))
pipe.adapter = pipe.adapter.to('cuda')
@torch.no_grad()
def get_color_masks(image: torch.Tensor) -> Dict[Tuple[int], torch.Tensor]:
    # image: (H, W, 3) color-coded segmentation map
    h, w, c = image.shape
    assert c == 3
    img_2d = image.view((-1, 3))
    # keep only colors that occur often enough to form a real region
    colors, freqs = torch.unique(img_2d, return_counts=True, dim=0)
    colors = colors[freqs >= h]
    color2mask = {}
    for color in colors:
        # per-pixel match against this color, reduced over the channel dimension
        mask = (image == color).float().max(dim=-1).values
        color = color.cpu().numpy().tolist()
        color2mask[tuple(color)] = mask
    return color2mask
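The helper above is not required for the pipeline call below; it is only handy for inspecting which color-coded classes a segmentation map contains. A usage sketch (converting the PIL image returned by load_image into a tensor ourselves):
import numpy as np

seg = load_image("./diffusers-t2i-adapter/motor.png")   # PIL image, color-coded segmentation
seg_t = torch.from_numpy(np.array(seg)).float()         # (H, W, 3) tensor
color2mask = get_color_masks(seg_t)
print(list(color2mask.keys()))                          # RGB colors of the detected regions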
mask = load_image("./diffusers-t2i-adapter/motor.png")
prompt = ["A black Honda motorcycle parked in front of a garage"]
image = pipe(prompt, [mask, mask]).images[0]
image.save('test.jpg')
You can get results like the ones below: input segmentation map (left), text-guided generation result (right).
If you want to use pose as input:
mask = load_image("./diffusers-t2i-adapter/pose.png")
prompt = ["A gril"]
# note the difference from the segmentation example: a single condition image is passed here
image = pipe(prompt, [mask]).images[0]
image.save('result.jpg')
Please note that the correct pose format is required to obtain satisfactory results. The pre-trained T2I-Adapter (pose) expects COCO-format keypoints; MMPose is recommended for extracting them. The following examples show the difference: OpenPose format (top), COCO format (bottom).
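If you need to produce a COCO-style skeleton image from a photo, the MMPose inferencer can be used along these lines (a rough sketch, assuming MMPose >= 1.0; "person.jpg" is a hypothetical input file, and the rendered skeleton still has to match the COCO-style drawing the adapter was trained on):
from mmpose.apis import MMPoseInferencer

# 'human' selects a default 2D COCO-keypoint pose model (assumed MMPose alias)
inferencer = MMPoseInferencer('human')
result_generator = inferencer("person.jpg", vis_out_dir="./pose_vis")
result = next(result_generator)   # keypoint predictions; the rendered skeleton is saved to ./pose_vis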
Coming soon!
The diffusers pipeline is contributed by HimariO; this repo is built on top of it (with several typos fixed) and serves as a handy tutorial.
The repo is still under active development; if you hit any problem while using it, feel free to open an issue.