Fine-Tune model extraction #56
-
@brian6091 showed in his discussion post here that LoRA trains a subset of the DreamBooth parameters whose size depends on the LoRA rank, so I'm not sure the same model could be easily transformed from DreamBooth into a LoRA. That said, reproducing a DreamBooth model by training LoRA on the same dataset with similar parameters seems effective so far.
-
Hmm. This might actually work. We might low-rank approximate the "difference" of a model and start from there. Not sure how this would work on "heavily fine-tuned models".
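In case it helps later readers, here is a minimal sketch of that idea: take the difference between a fine-tuned checkpoint and its base model and keep only the top singular directions per layer. This is not the repo's cli_svd.py, just an illustration under assumptions; `extract_lora_from_diff` and the `(up, down)` layout are made-up names, and it only handles 2-D (linear/attention) weights.

```python
# Minimal sketch (hypothetical names, not the repo's cli_svd.py): extract a
# rank-r LoRA from the difference between a fine-tuned checkpoint and its base.
# Only 2-D weights (linear/attention) are handled; convs, biases, norms are skipped.
import torch

def extract_lora_from_diff(base_sd, tuned_sd, rank=4):
    """Return {key: (up, down)} with up @ down approximating W_tuned - W_base."""
    lora = {}
    for key, w_base in base_sd.items():
        w_tuned = tuned_sd[key]
        if w_tuned.ndim != 2:
            continue
        delta = (w_tuned - w_base).float()
        # Truncated SVD of the weight difference keeps the top `rank` directions
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]
        # Split the singular values evenly between the two LoRA factors
        up = U * S.sqrt()                    # (out_features, rank)
        down = S.sqrt().unsqueeze(1) * Vh    # (rank, in_features)
        lora[key] = (up, down)
    return lora
```

Heavily fine-tuned models would simply leave more energy in the discarded singular values, so the rank needed for a faithful reproduction would grow.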
-
OK, the reason I ask for a code snippet for the SVD test is twofold: I'm a career marketer and it's been 20 years since college-level maths, so this is slow learning for something that's peripheral to my objective (performant custom art tools). The other reason is that the SVD method seems to be required beyond approximating existing fine-tunes, because...
EDIT: I had a bunch of confusing questions and a code sample here that went in the wrong direction and wouldn't add value for future readers, so I'm replacing them with the net learnings from my further tinkering and cloneofsimo's replies below:
-
Just merged PR #98 that allows you to do this with CLIs!
-
The CLI file that does the difference decomposition is cli_svd.py on the develop branch. I'm not sure if there's anything for mixing multiple decompositions together.
-
Sharing some SVD-distilled results of prompthero/openjourney. The LoRA safetensors were created with cli_svd.py; I used scale 1.0 for both the UNet and the text encoder when running inference with the distilled model.
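For readers wondering what "scale 1.0" means here: assuming the usual W + alpha * delta_W convention (the repo may fold in an extra alpha/rank factor), merging a distilled (up, down) pair back into a base weight looks roughly like the sketch below; `merge_lora_weight` is a made-up name.

```python
import torch

def merge_lora_weight(w_base, up, down, scale=1.0):
    # up: (out_features, rank), down: (rank, in_features)
    # scale=1.0 adds the full low-rank difference back onto the base weight
    return w_base + scale * (up @ down)
```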
-
I am utilizing an extension of this SVD-distillation approach in a paper I am releasing soon. Do we know who pioneered SVD distillation, and how can I cite it? Do I just point to this thread?
-
Loving this work!
So, wondering, and hoping folks smarter than I am can weigh in: the concept of LoRA is that it wraps/replaces some layers in the model and applies some extra weights of its own on top of the existing params/state dict, right? W + delta-W from the front page... so basically, existing model + LoRA weights = new model of the same effective structure?
So my question is, and I know I'm oversimplifying: can't we "extract" the difference between a fine-tuned model and its starting point and save that difference as a LoRA? The "training" process wouldn't need images or the text/vision processing, just two sets of checkpoints and some loops to get the weights right, such that original model + LoRA = fine-tuned model (see the sketch below for a rough way to check this). I.e., plug in SD1.4 as the original and @nitrosocke's (flippin awesome, btw) nitro-diffusion as the target, run the maths, and you'd have a LoRA of a few MBs representing all the learnings of arcane, mo-di, and archer. Do the same for your other favorite fine-tunes and mix/match from there.
If this works it would also save the facehuggers all those PBs of storage being taken up by everyone and their dog's fine-tune models (so many, just on butterflies alone)...
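A rough way to sanity-check such an extraction, assuming the {key: (up, down)} layout from the sketch earlier in the thread (all names hypothetical): compare each layer's true weight difference against its low-rank approximation.

```python
import torch

def reconstruction_report(base_sd, tuned_sd, lora):
    """Relative Frobenius error per layer of the low-rank approximation."""
    report = {}
    for key, (up, down) in lora.items():
        delta = (tuned_sd[key] - base_sd[key]).float()
        approx = up @ down
        rel_err = torch.linalg.norm(delta - approx) / (torch.linalg.norm(delta) + 1e-12)
        report[key] = rel_err.item()
    return report
```

Layers with a large relative error are the ones where a higher rank (or the full fine-tuned weights) would still be needed.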