Quantization

Instructions and the repo for 4-, 3-, and 2-bit quantization can be found here.

Since transformers now supports LLaMA, you can skip to the Run section and continue from there. 4-bit performs best, but you can also try setting the candidate bits to `2 3 4` or `3 4` to get a mix of precisions.
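To get a feel for why lower bit widths cost accuracy, here is a rough sketch that quantizes random weights at 2, 3, and 4 bits and compares the reconstruction error. It uses plain uniform round-to-nearest quantization, which is an assumption for illustration and not necessarily what the linked repo does; `quantize_rtn` and `mean_squared_error` are hypothetical helper names.

```python
import random


def quantize_rtn(values, bits):
    """Uniform round-to-nearest quantization of a list of floats to `bits` bits."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to an integer code in [0, levels], then back to a float.
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Stand-in for one layer's weights (illustrative data, not from a real model).
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]

errors = {
    bits: mean_squared_error(weights, quantize_rtn(weights, bits))
    for bits in (2, 3, 4)
}
for bits, err in errors.items():
    print(f"{bits}-bit MSE: {err:.5f}")
```

The error shrinks as the bit width grows, which is the trade-off a mix of candidate bits is balancing: fewer bits where a layer tolerates it, more where it does not.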

Make sure you merge your adapter before quantizing!
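For reference, here is a minimal sketch of what merging means, assuming a LoRA-style adapter: the low-rank update is folded into the base weight as `W + (alpha / rank) * B @ A`, so the quantizer sees the final weights. The matrices, `alpha`, and `rank` below are toy values, and `matmul` / `merge_lora` are hypothetical helpers, not part of any library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A (assumes a LoRA-style adapter)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy shapes: W is 2x3, B is 2x1, A is 1x3 (rank 1).
weight = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.0, -0.5]]
alpha, rank = 2, 1

merged = merge_lora(weight, lora_a, lora_b, alpha, rank)

# Check: the merged weight gives the same output as base path + adapter path.
x = [[1.0], [2.0], [3.0]]
base_out = matmul(weight, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
separate = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]
merged_out = matmul(merged, x)
print(merged_out, separate)
```

If your adapter was trained with the Hugging Face PEFT library, calling `merge_and_unload()` on the loaded `PeftModel` does this fold for LoRA adapters; save the merged model and quantize that.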