Consider $W' = Wa \odot Wb$ (the Hadamard product); then $rank(W') \le rank(Wa) \times rank(Wb)$.
We then apply the conventional low-rank decomposition to $Wa$ and $Wb$ separately, which means that with 2x the parameters we can reach the square of the rank.
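A rough parameter count makes the trade-off concrete (a sketch, assuming $W \in \mathbb{R}^{m \times n}$ and both $Wa$ and $Wb$ decomposed with rank $r$):

$$
\underbrace{k(m+n)}_{\text{single low-rank pair, } rank \le k}
\qquad\text{vs.}\qquad
\underbrace{2r(m+n)}_{\text{Hadamard of two pairs, } rank \le r^2}
$$

At the same parameter budget ($k = 2r$), a single low-rank pair is limited to rank $2r$, while the Hadamard form can reach rank $r^2$.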
Rank is not the same as information capacity, but the two are related.
Based on the experimental results in the paper, although $rank(Wa) \times rank(Wb)$ is only an upper bound, the resulting $dW$ reaches rank $rank(Wa) \times rank(Wb)$ almost every time. A quick sanity check of this is sketched below.
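A minimal sketch of that check (the names and sizes here are arbitrary, not taken from the actual code base):

```python
import torch

torch.manual_seed(0)
dim, r = 64, 4  # any dim >= r * r shows the effect

# Two independent rank-r factorizations of dim x dim matrices.
Wa = torch.randn(dim, r) @ torch.randn(r, dim)
Wb = torch.randn(dim, r) @ torch.randn(r, dim)

dW = Wa * Wb  # Hadamard product

print(torch.linalg.matrix_rank(Wa).item())  # 4
print(torch.linalg.matrix_rank(Wb).item())  # 4
print(torch.linalg.matrix_rank(dW).item())  # typically 16 = 4 * 4
```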
Why custom backward
With $dW = (Wa_1 \cdot Wa_2) \odot (Wb_1 \cdot Wb_2)$, computing the backward pass requires $\Delta{dW}$ and the full $Wa$ to obtain $\Delta{Wb}$, and likewise the full $Wb$ to obtain $\Delta{Wa}$.
With PyTorch's autograd, this kind of operation caches the full $Wa$ and $Wb$ for the backward pass, which means it stores 2x the size of the original weight just for backward.
To avoid this, I implemented a custom backward that reconstructs $Wa$ and $Wb$ from the low-rank factors only when they are actually needed, which saves a lot of memory.
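A minimal sketch of that idea (a hypothetical `HadamardLowRankFn`, not the actual implementation in this repo): only the small factors are saved between forward and backward, and the full-size $Wa$, $Wb$ are rebuilt inside `backward`.

```python
import torch


class HadamardLowRankFn(torch.autograd.Function):
    """dW = (Wa1 @ Wa2) * (Wb1 @ Wb2), saving only the low-rank factors."""

    @staticmethod
    def forward(ctx, Wa1, Wa2, Wb1, Wb2):
        # Save the small factors instead of the two full-size matrices.
        ctx.save_for_backward(Wa1, Wa2, Wb1, Wb2)
        return (Wa1 @ Wa2) * (Wb1 @ Wb2)

    @staticmethod
    def backward(ctx, grad_out):
        Wa1, Wa2, Wb1, Wb2 = ctx.saved_tensors
        # Reconstruct Wa and Wb only now, when they are actually needed.
        Wa = Wa1 @ Wa2
        Wb = Wb1 @ Wb2
        grad_a = grad_out * Wb  # dL/dWa
        grad_b = grad_out * Wa  # dL/dWb
        return (
            grad_a @ Wa2.T,  # dL/dWa1
            Wa1.T @ grad_a,  # dL/dWa2
            grad_b @ Wb2.T,  # dL/dWb1
            Wb1.T @ grad_b,  # dL/dWb2
        )


# Usage: dW = HadamardLowRankFn.apply(Wa1, Wa2, Wb1, Wb2)
```

With plain autograd, the elementwise multiply would keep both full-size operands alive until backward; here they exist only transiently inside `backward`.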