Have you guys seen this?
This repo includes the official PyTorch implementation of DiffusionCLIP, Text-Guided Diffusion Models for Robust Image Manipulation. DiffusionCLIP resolves critical issues in zero-shot manipulation with the following contributions.
We revealed that diffusion models are well suited for image manipulation thanks to their nearly perfect inversion capability, an important advantage over GAN-based models that had not been analyzed in depth before our detailed comparison (a sketch of this inversion process follows below).
Our novel sampling strategies for fine-tuning can preserve near-perfect reconstruction at increased speed.
In terms of empirical results, our method enables accurate in- and out-of-domain manipulation, minimizes unintended changes, and significantly outperforms SOTA baselines.
Our method takes another step toward general application by manipulating images from the widely varying ImageNet dataset.
Finally, our zero-shot translation between unseen domains and multi-attribute transfer can effectively reduce manual intervention.
The training process is illustrated in a figure in the repo linked below. Once the diffusion model is fine-tuned, any image from the pretrained domain can be manipulated to match the target text without re-training.
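The near-perfect inversion referenced above comes from running the deterministic DDIM sampler forward in noise level (eta = 0). Here is a minimal sketch of that idea, assuming an `eps_model(x, t)` noise predictor and a precomputed `alphas_cumprod` schedule; these names are illustrative, not the repo's actual API:

```python
import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alphas_cumprod, timesteps):
    """Deterministic DDIM inversion: map a real image x0 to a latent x_T.

    eps_model(x, t) is assumed to predict the noise at step t, and
    alphas_cumprod holds the cumulative products of (1 - beta_t).
    Running the same steps in reverse reconstructs x0 almost exactly,
    which is the near-perfect inversion property claimed above.
    """
    x = x0
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        t_batch = torch.full((x.shape[0],), t_cur, device=x.device)
        eps = eps_model(x, t_batch)
        # Predicted clean image at the current noise level.
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        # Step *forward* to the next noise level, deterministically.
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # approximate latent x_T
```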
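The fine-tuning itself steers the model with a directional CLIP loss: the CLIP-space direction between the original and edited images is pulled toward the direction between the source and target text prompts, while an identity term limits unintended changes. A rough sketch under the same assumptions; `generate_fn`, `lambda_id`, and the embedding arguments are hypothetical placeholders, not the repo's interface:

```python
import torch
import torch.nn.functional as F

def directional_clip_loss(clip_model, x_src, x_gen, src_text_emb, tgt_text_emb):
    """Align the image edit direction with the text direction
    (target text minus source text) in CLIP embedding space.
    (CLIP image preprocessing/resizing omitted for brevity.)"""
    img_dir = clip_model.encode_image(x_gen) - clip_model.encode_image(x_src)
    txt_dir = tgt_text_emb - src_text_emb
    return 1 - F.cosine_similarity(img_dir, txt_dir, dim=-1).mean()

def finetune_step(eps_model, clip_model, x0, latents, src_text_emb,
                  tgt_text_emb, optimizer, generate_fn, lambda_id=0.3):
    """One fine-tuning step: decode the inverted latent with the trainable
    diffusion model, pull the result toward the target text, and keep
    unrelated content close to the original image via an L1 identity term.
    generate_fn is a hypothetical reverse-DDIM decoder run with gradients."""
    x_gen = generate_fn(eps_model, latents)
    loss = directional_clip_loss(clip_model, x0, x_gen,
                                 src_text_emb, tgt_text_emb)
    loss = loss + lambda_id * F.l1_loss(x_gen, x0)  # identity preservation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```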
https://github.com/gwang-kim/DiffusionCLIP