"basic" img2img "using a gaussian-diffusion denoising mechanism as first proposed by SDEdit"
uses a forward pass gaussian noise nice property to "1-jump" directly to full gaussian-noise-encoded latents, then decode gradually into the "new" image (see here for short summary)
@bloc97 made a link to Google's Imagic that combines (if I understood well this comment) latents inversion with the inversion of prompt embeddings and a specific model "fine-tuned on the inverted embeddings" that help reconstruct the image better...
Special section about Justin Pinkney's image variations model "fine-tuned from CompVis/stable-diffusion-v1-3-original to accept CLIP image embedding rather than text embeddings" (model card)[https://huggingface.co/lambdalabs/stable-diffusion-image-conditioned]
linked ?: ref to high cfg scale fix in bloc97/CrossAttentionControl@7482fb2 ("High CFG values do not work well unless using the provided finite difference gradient descent method included in the notebook that corrects for high CFG")
DreamArtist extension (COntrastive prompt tuning - formerly ADVANCED PROMPT TUNING) for Textual Inversion embedding training (down to only 1 image training TI) :
control nets being fine tuned from original model, only "work" for this very specific model (i.e. SD 1.5 for C.Nets available on hugginface) (and possibly close variants) however it is still possible to use it with further variants of SD 1.5 following a procedure described in : [Experiment] Transfer Control to Other SD1.X Models lllyasviel/ControlNet#12
T2I-Adapter : "simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models"
IMAGIC - "complex (e.g., non-rigid) text-guided semantic edits to a single real image" : combines (if I understood well this How to make image inversion more precise? bloc97/CrossAttentionControl#20 (comment)) latents inversion with the inversion of prompt embeddings and a specific model "fine-tuned on the inverted embeddings" that help reconstruct the image better...
Precious newsfeed : https://rentry.org/sdupdates3
img2img
"basic" img2img "using a gaussian-diffusion denoising mechanism as first proposed by SDEdit"
img2img Variations
img2img alternative script - identical to bare img2img but using other noise-encoding samplers that CAN'T 1-jump, so multi-step encoding is necessary.
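To make the "1-jump" point above concrete, here is a minimal sketch of SDEdit-style forward noising (assuming a DDPM-style `alphas_cumprod` schedule and a VAE-encoded latent `x0`; names are illustrative, not the webui's actual internals):

```python
import torch

def noise_to_strength(x0: torch.Tensor, alphas_cumprod: torch.Tensor,
                      strength: float, generator=None) -> torch.Tensor:
    """Jump straight to the noise level implied by `strength` (0 = keep image, 1 = pure noise).

    Uses the closed-form forward process q(x_t | x_0):
        x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    so no iterative noising is required.
    """
    num_steps = alphas_cumprod.shape[0]
    t = min(int(strength * num_steps), num_steps - 1)   # target timestep for this denoising strength
    a_bar = alphas_cumprod[t]
    eps = torch.randn(x0.shape, generator=generator).to(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

# The denoiser is then run from timestep t back down to 0 to produce the "new" image.
```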
Prompt manipulation (i.e. prompt-to-prompt, but with NO Cross Attention Control - see the unimplemented section for further links on C.A.C.)
Prompt editing
Alternating words
Composable diffusion
Google's Prompt-to-prompt with Cross Attention Control
linked ? : ref to the high CFG scale fix in bloc97/CrossAttentionControl@7482fb2 ("High CFG values do not work well unless using the provided finite difference gradient descent method included in the notebook that corrects for high CFG")
InstructPix2Pix
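For InstructPix2Pix, a hedged usage sketch with the diffusers pipeline published for it (pipeline class and model id below are the ones from that release; parameter values are illustrative):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load the published InstructPix2Pix checkpoint.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB")
# The edit is driven by a plain-language instruction rather than a full scene description.
edited = pipe(
    "make it look like a watercolor painting",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
    guidance_scale=7.0,        # how strongly to follow the instruction
).images[0]
edited.save("edited.png")
```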
Attention manipulation
Latents manipulation
Operations on latents, conditionings and sigmas mid-sampling by @dfaker - merged PR Add mid-kdiffusion cfgdenoiser script callback - access latents, conditionings and sigmas mid-sampling AUTOMATIC1111/stable-diffusion-webui#4021
Latent upscaling
Scale Latent for improved sharpness, details and color science : AUTOMATIC1111/stable-diffusion-webui#2668
Image blending / Latent Interpolation / Multi-image prompt :
Special section about Justin Pinkney's image variations model "fine-tuned from CompVis/stable-diffusion-v1-3-original to accept CLIP image embedding rather than text embeddings" [model card](https://huggingface.co/lambdalabs/stable-diffusion-image-conditioned) - a usage sketch is given after this list
A1111 extension for image variation via a finetuned model, similar to Justin Pinkney's Image Variation PoC, incoming ! Add cond and uncond hidden states to CFGDenoiserParams AUTOMATIC1111/stable-diffusion-webui#8064
SD Remix using SD2.1 unCLIP : https://github.com/unishift/stable-diffusion-remix
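For the image variations model mentioned above, a usage sketch based on the diffusers port (lambdalabs/sd-image-variations-diffusers); pipeline and preprocessing follow that port's model card, parameter values are illustrative:

```python
import torch
from diffusers import StableDiffusionImageVariationPipeline
from PIL import Image
from torchvision import transforms

# Image-conditioned SD: the text prompt is replaced by a CLIP *image* embedding of the input picture.
pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers", revision="v2.0"
).to("cuda")

# The CLIP image encoder expects 224x224 inputs normalized with the CLIP statistics.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.48145466, 0.4578275, 0.40821073],
                         [0.26862954, 0.26130258, 0.27577711]),
])
image = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0).to("cuda")

variations = pipe(image, guidance_scale=3.0, num_inference_steps=25).images
variations[0].save("variation.png")
```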
CFG manipulation
Inpainting
Outpainting
Artistic img2img
Partly implemented
WIP
CLIP guidance
MagicMix: Semantic Mixing with Diffusion Models : https://magicmix.github.io/
Paint with words SD (similar to Nvidia's eDiffi functionality)
Dynamic thresholding - better images at high CFG (a minimal sketch is given below)
Update and rescale CFG denoising scale
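A minimal sketch of the dynamic thresholding idea (clamp the predicted x0 to a per-sample percentile instead of a fixed [-1, 1], as described in the Imagen paper); variable names are illustrative:

```python
import torch

def dynamic_threshold(x0_pred: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    """Clamp a predicted x0 to its own `percentile` of absolute values.

    At high CFG scales the predicted x0 drifts far outside [-1, 1], which washes out
    or oversaturates the image; rescaling by a per-sample dynamic threshold keeps it
    in range while preserving detail.
    """
    batch = x0_pred.shape[0]
    flat = x0_pred.reshape(batch, -1).abs()
    # Per-sample threshold s, never smaller than 1 so already-in-range images are untouched.
    s = torch.quantile(flat, percentile, dim=1).clamp(min=1.0)
    s = s.view(batch, *([1] * (x0_pred.ndim - 1)))
    return torch.clamp(x0_pred, -s, s) / s
```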
Noise & Seed
Seed combination : [Feature Request]: Stacking or mixing variation seeds to refine a result AUTOMATIC1111/stable-diffusion-webui#3745 (a small mixing sketch is given after this list)
Noise scaling : variable-scale noise and noise operations AUTOMATIC1111/stable-diffusion-webui#2163
Other noise is possible and can be combined => an example with perlin noise :
Latent perturbation : [Feature Request]: Latent Perturbation AUTOMATIC1111/stable-diffusion-webui#4164
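A small stand-alone sketch of one way to mix two seeds' initial noise (spherical interpolation keeps the mix gaussian-like, which plain averaging would not); this is illustrative, not the webui's exact variation-seed code:

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Spherical interpolation between two noise tensors of the same shape."""
    a_flat, b_flat = a.flatten(), b.flatten()
    dot = torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()).clamp(-1.0, 1.0)
    omega = torch.acos(dot)
    if omega.abs() < 1e-4:            # nearly parallel: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    return (torch.sin((1.0 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

# Mix the initial noise of two seeds before sampling (t = 0 keeps seed 1234, t = 1 gives seed 5678).
shape = (1, 4, 64, 64)  # SD latent shape for a 512x512 image
noise_a = torch.randn(shape, generator=torch.Generator().manual_seed(1234))
noise_b = torch.randn(shape, generator=torch.Generator().manual_seed(5678))
mixed = slerp(0.3, noise_a, noise_b)
```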
Samplers
Schedulers
Fine-tuning methods
Can be combined to enhance results :
Dreambooth * Aesthetic Gradient : Using Aesthetic Images Embeddings to improve Dreambooth or TI results AUTOMATIC1111/stable-diffusion-webui#3350
HyperNetwork * Aesthetic Gradient : Hypernetwork Style Training, a tiny guide AUTOMATIC1111/stable-diffusion-webui#2670
Hypernetwork * TI : interesting question
Textual Inversion (word embedding optimization) - style/object/person integration
DreamArtist extension (Contrastive prompt tuning - formerly Advanced Prompt Tuning) for Textual Inversion embedding training (down to training a TI embedding from only 1 image) :
Faster Textual Inversion via specific "TI-model" : SD Leap Booster
Hard Prompts made easy :
Text prompt inversion from 1-n images
Dreambooth - whole model fine tuning
Aesthetics gradient - style refining
HyperNetworks (NAI-version) - (mostly) style transfer (?)
multi-concept partial fine-tuning (75MB) (Adobe Research)
Low Rank Adaptation (LoRA) : dreambooth-like results but <5MB (a minimal LoRA layer sketch is given after this list)
Control Nets : Conditioning on anything through fine-tuned "helper" models
ControlNets being fine-tuned from an original model, they only "work" for that very specific model (i.e. SD 1.5 for the ControlNets available on Hugging Face) (and possibly close variants); it is however still possible to use them with further variants of SD 1.5 following the procedure described in : [Experiment] Transfer Control to Other SD1.X Models lllyasviel/ControlNet#12
T2I-Adapter : "simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models"
Whole model fine-tuning :
Tuning encoder : single image fine tuning : https://tuning-encoder.github.io/
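Since LoRA comes up in the list above, here is a minimal sketch of the core idea (a frozen weight plus a trainable low-rank update), not any particular repo's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear augmented with a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # original weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: project down to rank r
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: project back up
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)                  # start as an identity (zero) update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

# Only the tiny down/up matrices are trained and shipped, hence the few-MB file sizes.
layer = LoRALinear(nn.Linear(768, 768), rank=4)
```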
Pre/Post-processors
Not implemented - to my knowledge
"Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC "
Cycle diffusion
IMAGIC - "complex (e.g., non-rigid) text-guided semantic edits to a single real image" : combines (if I understood this comment well : How to make image inversion more precise? bloc97/CrossAttentionControl#20 (comment)) latents inversion with the inversion of prompt embeddings and a specific model "fine-tuned on the inverted embeddings" that helps reconstruct the image better...
Diffusion CLIP
Training-Free Structured Diffusion Guidance (TFSDG)
LPIPS guidance (Learned Perceptual Image Patch Similarity) - a rough guidance sketch is given after this list
StyleCLIP - based on GAN but 3 methods can be drawn from it (some may already be present in A1111, IDK) :
Image segmentation for inpainting
faster incremental inpainting : "get 3x to 7.5x faster inpainting with this one weird trick" AUTOMATIC1111/stable-diffusion-webui#4266
PatchMatch init mode for outpainting / inpainting : [Feature Request]: PatchMatch init mode for inpainting / outpainting AUTOMATIC1111/stable-diffusion-webui#4681
fourier-shaped noise INpainting (similar to mk2 outpainting but for inpainting) : [Feature Request]: fourier-shaped noise IN-painting ? (mk2 inpainting) AUTOMATIC1111/stable-diffusion-webui#4739
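For the LPIPS guidance item above, a rough sketch of what it could look like during sampling (steer the latent so the decoded x0 prediction stays perceptually close to a reference image), using the lpips package; `decode_fn` and the guidance scale are assumptions, not an existing implementation:

```python
import torch
import lpips

# Perceptual distance network (VGG backbone); expects image tensors scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net="vgg").to("cuda")

def lpips_guidance_grad(latent: torch.Tensor, decode_fn, reference: torch.Tensor,
                        scale: float = 100.0) -> torch.Tensor:
    """Gradient of the LPIPS distance between the decoded latent and a reference image.

    `decode_fn` is assumed to map a latent to an image tensor in [-1, 1]
    (e.g. a differentiable wrapper around the VAE decoder). Subtracting
    `scale * grad` from the denoised latent at each step pulls the sample
    toward the reference image.
    """
    latent = latent.detach().requires_grad_(True)
    image = decode_fn(latent)                      # differentiable decode
    loss = lpips_fn(image, reference).mean()
    (grad,) = torch.autograd.grad(loss, latent)
    return scale * grad
```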
Who knows ?
Depth-map and transparent background
DiffEdit: Diffusion-based semantic image editing with mask guidance
Pose transfer
Hand-fixer ? Brainstorming: ideas on how to better control subjects and contexts AUTOMATIC1111/stable-diffusion-webui#3615 (comment)
Crazy ideas ?
Best (subjective) Competing models
OpenAI
NVIDIA
Midjourney (MJ)
Google's text-2-image hype models :
BlueWillow (free)
DeepFloyd (highly anticipated / mega hype) - a Stability AI team (Originating from ShonenkovAI, linked to RuDALLE-e)
Meta
- I-JEPA (June 13, 2023) : https://github.com/facebookresearch/ijepa
- CM3leon (July 14, 2023) :
- Emu (September 27, 2023) : https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/
Feel free to add what is missing and / or correct the list if necessary
Kind of related but not really :
text-to-3D :
DreamFields :
DreamFusion (Google AI) :
Point-E (OpenAI) : https://github.com/openai/point-e
Magic3D (NVIDIA) : https://research.nvidia.com/labs/dir/magic3d/
3DFuse "Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation" : https://ku-cvlab.github.io/3DFuse/
text-to-4D (3D coherent video) :
text-to-video (2D animation) :
AI video inpainting :
- Video object removal : https://runwayml.com/inpainting/
Video to Video / Video Editing :
Image-driven video editing : "paint on video"
text/image-driven video editing :