Instructions and the repo for 4-, 3-, and 2-bit quantization can be found here.
Since transformers now has LLaMA compatibility, you can skip to the Run section and continue from there. 4-bit performs best, but you can also try setting the candidate bits to "2 3 4" or "3 4" to get a mix of precisions.
Make sure you merge your adapter before quantizing!
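
A minimal sketch of the merge step, assuming the adapter is a LoRA adapter trained with the PEFT library (the note above does not name the tool, so treat that as an assumption). The function name and its three path arguments are hypothetical and only for illustration; the idea is to fold the adapter into the base weights and save a standalone checkpoint that the quantization script can then read.

```python
def merge_lora_adapter(base_model_path, adapter_path, output_dir):
    """Fold a LoRA adapter into its base model and save the merged weights.

    Assumes a PEFT-format LoRA adapter; all three paths are placeholders.
    """
    # Imported inside the function so the heavy dependencies are only
    # needed when the merge is actually run.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model in half precision (not quantized).
    base = AutoModelForCausalLM.from_pretrained(
        base_model_path, torch_dtype=torch.float16
    )

    # Attach the adapter, then merge its low-rank update into the base
    # weights and drop the adapter wrappers.
    model = PeftModel.from_pretrained(base, adapter_path)
    merged = model.merge_and_unload()

    # Save the merged model together with the base tokenizer so the output
    # directory is a self-contained checkpoint.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_model_path).save_pretrained(output_dir)
    return output_dir
```

Calling `merge_lora_adapter("path/to/llama-base", "path/to/lora-adapter", "path/to/merged")` with your own paths should leave a merged checkpoint in the output directory, which you would then point the quantization script at instead of the original base model.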