As part of the HugGAN community event, I trained a 105M-parameter latent diffusion model using knowledge distillation.
Prompt: "A snowy landscape, oil on canvas"
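The gist of the distillation is that the small student network is trained to reproduce the outputs of the large teacher network, rather than being trained from scratch on the data alone. Below is a minimal sketch of one such training step, assuming both models are noise-prediction networks sharing the same (noisy latents, timestep, conditioning) call signature; the helper, the plain MSE objective, and all names are illustrative assumptions rather than the exact training code (see the WandB report for the actual procedure):

```python
import torch
import torch.nn.functional as F

def ddpm_add_noise(x0, noise, alphas_cumprod, t):
    """DDPM forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def distillation_step(teacher, student, optimizer, latents, cond, alphas_cumprod):
    """One training step: the student learns to match the teacher's noise
    prediction on the same noisy latents and conditioning. Interfaces are
    hypothetical; the real training procedure is in the WandB report."""
    t = torch.randint(0, alphas_cumprod.shape[0], (latents.shape[0],),
                      device=latents.device)
    noise = torch.randn_like(latents)
    noisy = ddpm_add_noise(latents, noise, alphas_cumprod, t)

    with torch.no_grad():
        target = teacher(noisy, t, cond)   # frozen teacher's noise prediction

    pred = student(noisy, t, cond)         # smaller student, same call signature
    loss = F.mse_loss(pred, target)        # distillation objective: match teacher

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Matching the teacher's predictions gives the student a dense training signal, which is how a much smaller model can retain part of the teacher's capability.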
- Model card for the teacher model on HuggingFace, trained by Jonathan Whitaker. He describes the model and training procedure in a blog post.
- Model card for the student model on HuggingFace, trained by me. You can check my WandB report for training details. This version has 105M parameters, versus 1.2B for the teacher. It is lighter and allows for faster inference, while retaining some of the teacher's capability at generating paintings from prompts.
- Gradio demo app on HuggingFace Spaces to try out the model online
- IPython notebook to use the model in Python
- WikiArt dataset on the HuggingFace datasets hub
- GitHub repository
You need some dependencies from multiple repositories linked in this repository. CLOOB latent diffusion builds on the following (a sketch of how these pieces fit together is included at the end of this section):
- CLIP
- CLOOB: the model that encodes images and texts in a unified latent space, used for conditioning the latent diffusion.
- Latent Diffusion: the latent diffusion model definition.
- Taming Transformers: a pretrained convolutional VQGAN, used as an autoencoder to go from image space to the latent space in which the diffusion is done.
- v-diffusion: functions for sampling from a diffusion model with text and/or image prompts.
Example code showing how to sample images from a text prompt with the model can be found in a Colab notebook, or directly in the app source code for the Gradio demo on this Space.
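In outline, sampling ties these pieces together: CLOOB embeds the prompt, the distilled diffusion model iteratively denoises a random latent under that conditioning, and the VQGAN decoder maps the final latent back to pixel space. Here is a minimal, self-contained sketch of that loop; the interfaces (`cloob.encode_text`, the `diffusion` call signature, `vqgan.decode`), the linear noise schedule, and the latent shape are all assumptions for illustration, not the exact code from the notebook:

```python
import torch

def make_alphas_cumprod(steps, beta_start=1e-4, beta_end=2e-2):
    """Simple linear beta schedule; an illustrative assumption."""
    betas = torch.linspace(beta_start, beta_end, steps)
    return torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def generate(prompt, cloob, diffusion, vqgan, steps=50, latent_shape=(1, 4, 32, 32)):
    """Hypothetical end-to-end sampler: all interfaces and shapes here are
    assumptions; the real code lives in the notebook and the Gradio app."""
    a = make_alphas_cumprod(steps)
    cond = cloob.encode_text(prompt)            # CLOOB text embedding (conditioning)
    x = torch.randn(latent_shape)               # start from noise in latent space
    for t in reversed(range(steps)):            # DDIM-style reverse loop (eta = 0)
        tb = torch.full((x.shape[0],), t, dtype=torch.long)
        eps = diffusion(x, tb, cond)            # distilled model predicts the noise
        x0 = (x - (1 - a[t]).sqrt() * eps) / a[t].sqrt()    # clean-latent estimate
        a_prev = a[t - 1] if t > 0 else torch.tensor(1.0)
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # step toward t = 0
    return vqgan.decode(x)                      # VQGAN decoder: latent -> image
```

A deterministic DDIM-style update is used here for brevity; the linked v-diffusion repository provides the actual sampling functions.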
Prompt: "A martian landscape painting, oil on canvas"