"basic" img2img "using a gaussian-diffusion denoising mechanism as first proposed by SDEdit"
uses a forward pass gaussian noise nice property to "1-jump" directly to full gaussian-noise-encoded latents, then decode gradually into the "new" image (see here for short summary)
@bloc97 made a link to Google's Imagic that combines (if I understood well this comment) latents inversion with the inversion of prompt embeddings and a specific model "fine-tuned on the inverted embeddings" that help reconstruct the image better...
Special section about Justin Pinkney's image variations model "fine-tuned from CompVis/stable-diffusion-v1-3-original to accept CLIP image embedding rather than text embeddings" (model card)[https://huggingface.co/lambdalabs/stable-diffusion-image-conditioned]
linked ?: ref to high cfg scale fix in bloc97/CrossAttentionControl@7482fb2 ("High CFG values do not work well unless using the provided finite difference gradient descent method included in the notebook that corrects for high CFG")
DreamArtist extension (COntrastive prompt tuning - formerly ADVANCED PROMPT TUNING) for Textual Inversion embedding training (down to only 1 image training TI) :
control nets being fine tuned from original model, only "work" for this very specific model (i.e. SD 1.5 for C.Nets available on hugginface) (and possibly close variants) however it is still possible to use it with further variants of SD 1.5 following a procedure described in : [Experiment] Transfer Control to Other SD1.X Models lllyasviel/ControlNet#12
T2I-Adapter : "simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models"
IMAGIC - "complex (e.g., non-rigid) text-guided semantic edits to a single real image" : combines (if I understood well this How to make image inversion more precise? bloc97/CrossAttentionControl#20 (comment)) latents inversion with the inversion of prompt embeddings and a specific model "fine-tuned on the inverted embeddings" that help reconstruct the image better...
Precious newsfeed : https://rentry.org/sdupdates3
img2img
"basic" img2img "using a gaussian-diffusion denoising mechanism as first proposed by SDEdit"
img2img Variations
img2img alternative script - identical to bare img2img but using other noise-encoding samplers that CAN'T 1-jump, so multi-step encoding is necessary.
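To make the "1-jump" point above concrete, here is a minimal sketch of SDEdit-style forward noising (assuming a DDPM-style `alphas_cumprod` schedule and a VAE-encoded latent `x0`; names are illustrative, not the webui's actual internals):

```python
import torch

def noise_to_strength(x0: torch.Tensor, alphas_cumprod: torch.Tensor,
                      strength: float, generator=None) -> torch.Tensor:
    """Jump straight to the noise level implied by `strength` (0 = keep image, 1 = pure noise).

    Uses the closed-form forward process q(x_t | x_0):
        x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    so no iterative noising is required.
    """
    num_steps = alphas_cumprod.shape[0]
    t = min(int(strength * num_steps), num_steps - 1)   # target timestep for this denoising strength
    a_bar = alphas_cumprod[t]
    eps = torch.randn(x0.shape, generator=generator).to(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

# The denoiser is then run from timestep t back down to 0 to produce the "new" image.
```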
Prompt manipulation (i.e. prompt-to-prompt, but with NO Cross Attention Control - see the unimplemented section for further links on C.A.C.)
Prompt editing
Alternating words
Composable diffusion
Google's Prompt-to-prompt with Cross Attention Control
linked ? : ref to the high CFG scale fix in bloc97/CrossAttentionControl@7482fb2 ("High CFG values do not work well unless using the provided finite difference gradient descent method included in the notebook that corrects for high CFG")
InstructPix2Pix
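For InstructPix2Pix, a hedged usage sketch with the diffusers pipeline published for it (pipeline class and model id below are the ones from that release; parameter values are illustrative):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load the published InstructPix2Pix checkpoint.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB")
# The edit is driven by a plain-language instruction rather than a full scene description.
edited = pipe(
    "make it look like a watercolor painting",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
    guidance_scale=7.0,        # how strongly to follow the instruction
).images[0]
edited.save("edited.png")
```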
Attention manipulation
Latents manipulation
Operations on latents, conditionings and sigmas mid-sampling by @dfaker - merged PR Add mid-kdiffusion cfgdenoiser script callback - access latents, conditionings and sigmas mid-sampling AUTOMATIC1111/stable-diffusion-webui#4021
Latent upscaling
Scale Latent for improved sharpness, details and color science : AUTOMATIC1111/stable-diffusion-webui#2668
Image blending / Latent Interpolation / Multi-image prompt :
Special section about Justin Pinkney's image variations model "fine-tuned from CompVis/stable-diffusion-v1-3-original to accept CLIP image embedding rather than text embeddings" [model card](https://huggingface.co/lambdalabs/stable-diffusion-image-conditioned) - a usage sketch is given after this list
A1111 extension for image variation via a finetuned model, similar to Justin Pinkney's Image Variation PoC, incoming ! Add cond and uncond hidden states to CFGDenoiserParams AUTOMATIC1111/stable-diffusion-webui#8064
SD Remix using SD2.1 unCLIP : https://github.com/unishift/stable-diffusion-remix
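For the image variations model mentioned above, a usage sketch based on the diffusers port (lambdalabs/sd-image-variations-diffusers); pipeline and preprocessing follow that port's model card, parameter values are illustrative:

```python
import torch
from diffusers import StableDiffusionImageVariationPipeline
from PIL import Image
from torchvision import transforms

# Image-conditioned SD: the text prompt is replaced by a CLIP *image* embedding of the input picture.
pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers", revision="v2.0"
).to("cuda")

# The CLIP image encoder expects 224x224 inputs normalized with the CLIP statistics.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.48145466, 0.4578275, 0.40821073],
                         [0.26862954, 0.26130258, 0.27577711]),
])
image = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0).to("cuda")

variations = pipe(image, guidance_scale=3.0, num_inference_steps=25).images
variations[0].save("variation.png")
```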
CFG manipulation
Inpainting
Outpainting
Artistic img2img
Partly implemented
WIP
CLIP guidance
MagicMix: Semantic Mixing with Diffusion Models : https://magicmix.github.io/
Paint with words SD (similar to Nvidia's eDiffi functionality)
Dynamic thresholding - better images at high CFG (a minimal sketch is given below)
Update and rescale CFG denoising scale
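A minimal sketch of the dynamic thresholding idea (clamp the predicted x0 to a per-sample percentile instead of a fixed [-1, 1], as described in the Imagen paper); variable names are illustrative:

```python
import torch

def dynamic_threshold(x0_pred: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    """Clamp a predicted x0 to its own `percentile` of absolute values.

    At high CFG scales the predicted x0 drifts far outside [-1, 1], which washes out
    or oversaturates the image; rescaling by a per-sample dynamic threshold keeps it
    in range while preserving detail.
    """
    batch = x0_pred.shape[0]
    flat = x0_pred.reshape(batch, -1).abs()
    # Per-sample threshold s, never smaller than 1 so already-in-range images are untouched.
    s = torch.quantile(flat, percentile, dim=1).clamp(min=1.0)
    s = s.view(batch, *([1] * (x0_pred.ndim - 1)))
    return torch.clamp(x0_pred, -s, s) / s
```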
Noise & Seed
Seed combination : [Feature Request]: Stacking or mixing variation seeds to refine a result AUTOMATIC1111/stable-diffusion-webui#3745 (a small mixing sketch is given after this list)
Noise scaling : variable-scale noise and noise operations AUTOMATIC1111/stable-diffusion-webui#2163
Other noise is possible and can be combined => an example with perlin noise :
Latent perturbation : [Feature Request]: Latent Perturbation AUTOMATIC1111/stable-diffusion-webui#4164
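A small stand-alone sketch of one way to mix two seeds' initial noise (spherical interpolation keeps the mix gaussian-like, which plain averaging would not); this is illustrative, not the webui's exact variation-seed code:

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Spherical interpolation between two noise tensors of the same shape."""
    a_flat, b_flat = a.flatten(), b.flatten()
    dot = torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()).clamp(-1.0, 1.0)
    omega = torch.acos(dot)
    if omega.abs() < 1e-4:            # nearly parallel: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    return (torch.sin((1.0 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

# Mix the initial noise of two seeds before sampling (t = 0 keeps seed 1234, t = 1 gives seed 5678).
shape = (1, 4, 64, 64)  # SD latent shape for a 512x512 image
noise_a = torch.randn(shape, generator=torch.Generator().manual_seed(1234))
noise_b = torch.randn(shape, generator=torch.Generator().manual_seed(5678))
mixed = slerp(0.3, noise_a, noise_b)
```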
Samplers
Schedulers
Fine-tuning methods
Can be combined to enhance results :
Dreambooth * Aesthetic Gradient : Using Aesthetic Images Embeddings to improve Dreambooth or TI results AUTOMATIC1111/stable-diffusion-webui#3350
HyperNetwork * Aesthetic Gradient : Hypernetwork Style Training, a tiny guide AUTOMATIC1111/stable-diffusion-webui#2670
Hypernetwork * TI : interesting question
Textual Inversion (word embedding optimization) - style/object/person integration
DreamArtist extension (Contrastive prompt tuning - formerly Advanced Prompt Tuning) for Textual Inversion embedding training (down to training a TI embedding from only 1 image) :
Faster Textual Inversion via specific "TI-model" : SD Leap Booster
Hard Prompts made easy :
Text prompt inversion from 1-n images
Dreambooth - whole model fine tuning
Aesthetics gradient - style refining
HyperNetworks (NAI-version) - (mostly) style transfer (?)
multi-concept partial fine-tuning (75MB) (Adobe Research)
Low Rank Adaptation (LoRA) : dreambooth-like results but <5MB (a minimal LoRA layer sketch is given after this list)
Control Nets : Conditioning on anything through fine-tuned "helper" models
ControlNets being fine-tuned from an original model, they only "work" for that very specific model (i.e. SD 1.5 for the ControlNets available on Hugging Face) (and possibly close variants); it is however still possible to use them with further variants of SD 1.5 following the procedure described in : [Experiment] Transfer Control to Other SD1.X Models lllyasviel/ControlNet#12
T2I-Adapter : "simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models"
Whole model fine-tuning :
Tuning encoder : single image fine tuning : https://tuning-encoder.github.io/
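Since LoRA comes up in the list above, here is a minimal sketch of the core idea (a frozen weight plus a trainable low-rank update), not any particular repo's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear augmented with a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # original weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: project down to rank r
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: project back up
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)                  # start as an identity (zero) update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

# Only the tiny down/up matrices are trained and shipped, hence the few-MB file sizes.
layer = LoRALinear(nn.Linear(768, 768), rank=4)
```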
Pre/Post-processors
Not implemented - to my knowledge
"Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC "
Cycle diffusion
IMAGIC - "complex (e.g., non-rigid) text-guided semantic edits to a single real image" : combines (if I understood this comment well : How to make image inversion more precise? bloc97/CrossAttentionControl#20 (comment)) latents inversion with the inversion of prompt embeddings and a specific model "fine-tuned on the inverted embeddings" that helps reconstruct the image better...
Diffusion CLIP
Training-Free Structured Diffusion Guidance (TFSDG)
LPIPS guidance (Learned Perceptual Image Patch Similarity) - a rough guidance sketch is given after this list
StyleCLIP - based on GAN but 3 methods can be drawn from it (some may already be present in A1111, IDK) :
Image segmentation for inpainting
faster incremental inpainting : "get 3x to 7.5x faster inpainting with this one weird trick" AUTOMATIC1111/stable-diffusion-webui#4266
PatchMatch init mode for outpainting / inpainting : [Feature Request]: PatchMatch init mode for inpainting / outpainting AUTOMATIC1111/stable-diffusion-webui#4681
fourier-shaped noise INpainting (similar to mk2 outpainting but for inpainting) : [Feature Request]: fourier-shaped noise IN-painting ? (mk2 inpainting) AUTOMATIC1111/stable-diffusion-webui#4739
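For the LPIPS guidance item above, a rough sketch of what it could look like during sampling (steer the latent so the decoded x0 prediction stays perceptually close to a reference image), using the lpips package; `decode_fn` and the guidance scale are assumptions, not an existing implementation:

```python
import torch
import lpips

# Perceptual distance network (VGG backbone); expects image tensors scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net="vgg").to("cuda")

def lpips_guidance_grad(latent: torch.Tensor, decode_fn, reference: torch.Tensor,
                        scale: float = 100.0) -> torch.Tensor:
    """Gradient of the LPIPS distance between the decoded latent and a reference image.

    `decode_fn` is assumed to map a latent to an image tensor in [-1, 1]
    (e.g. a differentiable wrapper around the VAE decoder). Subtracting
    `scale * grad` from the denoised latent at each step pulls the sample
    toward the reference image.
    """
    latent = latent.detach().requires_grad_(True)
    image = decode_fn(latent)                      # differentiable decode
    loss = lpips_fn(image, reference).mean()
    (grad,) = torch.autograd.grad(loss, latent)
    return scale * grad
```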
Who knows ?
Depth-map and transparent background
DiffEdit: Diffusion-based semantic image editing with mask guidance
Pose transfer
Hand-fixer ? Brainstorming: ideas on how to better control subjects and contexts AUTOMATIC1111/stable-diffusion-webui#3615 (comment)
Crazy ideas ?
Best (subjective) Competing models
OpenAI
NVIDIA
Midjourney (MJ)
Google's text-2-image hype models :
BlueWillow (free)
DeepFloyd (highly anticipated / mega hype) - a Stability AI team (Originating from ShonenkovAI, linked to RuDALLE-e)
Meta
- I-JEPA (June 13, 2023) : https://github.com/facebookresearch/ijepa
- CM3leon (July 14, 2023) :
- Emu (September 27, 2023) : https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/
Feel free to add what is missing and / or correct the list if necessary
Kind of related but not really :
text-to-3D :
DreamFields :
DreamFusion (Google AI) :
Point-E (OpenAI) : https://github.com/openai/point-e
Magic3D (NVIDIA) : https://research.nvidia.com/labs/dir/magic3d/
3DFuse "Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation" : https://ku-cvlab.github.io/3DFuse/
text-to-4D (3D coherent video) :
text-to-video (2D animation) :
AI video inpainting :
- Video object removal : https://runwayml.com/inpainting/
Video to Video / Video Editing :
Image-driven video editing : "paint on video"
text/image-driven video editing :