Quantization

Instructions and the repo for 4-, 3-, and 2-bit quantization can be found here.

Since transformers now supports LLaMA, you can skip to the Run section and continue from there. 4-bit performs best, but you can also try setting the candidate bits to "2 3 4" or "3 4" to get a mix of precisions.
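To give a feel for why the bit width matters, here is a minimal sketch of generic round-to-nearest uniform quantization on a toy weight list. It is not the algorithm used by the linked repo; the function names and the toy weights are made up for illustration only.

```python
def quantize_dequantize(values, bits):
    """Snap each value to the nearest of 2**bits evenly spaced levels.

    Generic round-to-nearest scheme, assumed here for illustration only.
    """
    lo, hi = min(values), max(values)
    steps = 2 ** bits - 1            # number of intervals between levels
    scale = (hi - lo) / steps        # width of one interval
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and restored values."""
    restored = quantize_dequantize(values, bits)
    return sum(abs(v - r) for v, r in zip(values, restored)) / len(values)


# Toy "weights": 21 evenly spaced values in [-1, 1] (hypothetical data).
weights = [i / 10 for i in range(-10, 11)]

errors = {bits: mean_abs_error(weights, bits) for bits in (2, 3, 4)}
for bits, err in errors.items():
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

On this toy data the reconstruction error shrinks as the bit width grows, which is the trade-off a mix of precisions tries to balance against model size.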

Make sure you merge your adapter before quantizing!
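As a rough illustration of what merging means, the sketch below assumes the adapter is a LoRA adapter, where the merged weight is W + (alpha / r) * B @ A. The helper names and the tiny matrices are hypothetical and written in plain Python so the arithmetic is easy to follow. If the adapter was trained with PEFT, calling `merge_and_unload()` on the loaded `PeftModel` is the usual way to do this on a real model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into the base weight: W + (alpha / rank) * B @ A.

    Shapes (assumed): weight is out x in, lora_a is rank x in,
    lora_b is out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Tiny hypothetical example: 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # rank x in
B = [[0.5], [-1.0]]       # out x rank
merged = merge_lora(W, A, B, alpha=2, rank=1)

# The merged weight gives the same output as base path plus adapter path.
x = [[3.0], [1.0]]
y_merged = matmul(merged, x)
y_separate = [[base[0] + 2.0 * upd[0]]
              for base, upd in zip(matmul(W, x), matmul(B, matmul(A, x)))]
print(merged, y_merged, y_separate)
```

Once merged, the model holds a single set of weights, so the quantizer sees the fine-tuned values rather than the base weights alone.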