v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more
Highlights
New methods
BOFT
Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.
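BOFT builds its orthogonal matrix as a product of sparse butterfly factors instead of learning one dense orthogonal matrix. Here is a minimal NumPy sketch of that idea. It is not PEFT's implementation: it assumes 2x2 rotation blocks and a power-of-two dimension, while BOFT itself supports larger blocks and parametrizes them differently. The helpers butterfly_factor and butterfly_orthogonal are hypothetical. Chaining log2(n) factors gives a dense orthogonal matrix that rotates the frozen weight, so column norms and the angles between columns are preserved.

```python
import numpy as np

# Illustrative sketch only (assumption: 2x2 rotation blocks, n a power of two); not PEFT's BOFT code.


def butterfly_factor(n, level, angles):
    """One butterfly factor: rotate each index pair (i, i ^ 2**level) by its own angle."""
    stride = 2 ** level
    factor = np.eye(n)
    k = 0
    for i in range(n):
        j = i ^ stride  # partner index differs from i in exactly one bit
        if i < j:
            c, s = np.cos(angles[k]), np.sin(angles[k])
            factor[i, i], factor[i, j] = c, -s
            factor[j, i], factor[j, j] = s, c
            k += 1
    return factor


def butterfly_orthogonal(n, seed=0):
    """Chain log2(n) butterfly factors into a dense orthogonal matrix."""
    rng = np.random.default_rng(seed)
    levels = int(np.log2(n))
    angles = rng.normal(size=(levels, n // 2))  # the only trainable parameters
    r = np.eye(n)
    for level in range(levels):
        r = butterfly_factor(n, level, angles[level]) @ r
    return r, angles.size


n = 16
R, n_params = butterfly_orthogonal(n)
W = np.random.default_rng(1).normal(size=(n, 8))  # frozen pretrained weight
W_adapted = R @ W  # orthogonal finetuning: rotate the weight instead of adding to it

print(n_params, n * (n - 1) // 2)  # 32 angles vs. 120 for a general rotation
```

Each factor only touches n/2 disjoint index pairs, yet after log2(n) factors every input index reaches every output index, which is why so few parameters suffice for a dense rotation.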
VeRA
If the parameter reduction of LoRA is not enough for your use case, you should take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA, but it adds two learnable scaling vectors to the two LoRA weight matrices. The LoRA matrices themselves are randomly initialized, frozen, and shared across all layers, so only the scaling vectors are trained, which considerably reduces the number of trainable parameters.
The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.
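To see why sharing the frozen matrices matters, here is a rough sketch that compares trainable-parameter counts and computes the VeRA weight update. It assumes every adapted layer has the same shape; the layer sizes, ranks, and initial values (a zero-initialized lambda_b, as in the VeRA paper, and a small constant lambda_d) are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch only (assumption: all adapted layers share one shape); not PEFT's VeRA code.


def lora_trainable(d_in, d_out, r, n_layers):
    # LoRA trains a separate A (r x d_in) and B (d_out x r) in every layer
    return n_layers * r * (d_in + d_out)


def vera_trainable(d_out, r, n_layers):
    # VeRA trains only two vectors per layer: lambda_d (size r) and lambda_b (size d_out)
    return n_layers * (r + d_out)


def vera_delta(shared_a, shared_b, lambda_d, lambda_b):
    # delta W = diag(lambda_b) @ B @ diag(lambda_d) @ A, with A and B frozen and shared
    return (lambda_b[:, None] * shared_b) @ (lambda_d[:, None] * shared_a)


d, r = 64, 16
rng = np.random.default_rng(0)
shared_a = rng.normal(size=(r, d))  # frozen random projection, reused by every layer
shared_b = rng.normal(size=(d, r))
lambda_d = np.full(r, 0.1)          # trainable, small non-zero init (illustrative value)
lambda_b = np.zeros(d)              # trainable, zero init so delta W starts at 0
delta = vera_delta(shared_a, shared_b, lambda_d, lambda_b)

print(lora_trainable(4096, 4096, 8, 32))  # 2097152
print(vera_trainable(4096, 256, 32))      # 139264
```

Because VeRA's per-layer cost scales with r + d_out instead of r * (d_in + d_out), it can afford a much higher rank than LoRA for the same parameter budget.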
PiSSA
PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.
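PiSSA replaces LoRA's usual initialization (random A, zero B) with the principal components of the pretrained weight. A minimal NumPy sketch of that split follows. It ignores the LoRA scaling factor lora_alpha / r, which a real implementation has to account for, and pissa_init is a hypothetical helper. The adapter B @ A holds the top-r singular triplets, while the residual keeps the rest and becomes the frozen base weight, so the model output is unchanged at initialization.

```python
import numpy as np

# Illustrative sketch only (assumption: LoRA scaling factor of 1); not PEFT's PiSSA code.


def pissa_init(weight, r):
    """Split weight into a rank-r principal adapter (B @ A) and a frozen residual."""
    u, s, vh = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    lora_b = u[:, :r] * sqrt_s           # d_out x r, equals U_r @ diag(sqrt(S_r))
    lora_a = sqrt_s[:, None] * vh[:r]    # r x d_in, equals diag(sqrt(S_r)) @ Vh_r
    residual = weight - lora_b @ lora_a  # becomes the new frozen base weight
    return lora_a, lora_b, residual, s


rng = np.random.default_rng(0)
W = rng.normal(size=(32, 24))
A, B, W_res, s = pissa_init(W, r=4)
```

The residual's largest singular value is only the (r+1)-th singular value of W, so it typically spans a narrower range than W itself; this is the intuition behind the reduced quantization error when the residual, rather than W, is quantized.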
Quantization
HQQ
Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.
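HQQ avoids calibration data by optimizing the quantization zero point directly against the weights, using a sparsity-promoting l_p loss (p < 1) solved with half-quadratic splitting. The following NumPy sketch is a simplified reading of that algorithm, not the hqq library's code. It treats each row as one quantization group and keeps the min-max scale fixed. The hyperparameters (lp_norm, beta, kappa, iters) are illustrative, and all function names are hypothetical.

```python
import numpy as np

# Simplified sketch of HQQ-style zero-point optimization (assumptions: one group per row,
# fixed min-max scale, illustrative hyperparameters); not the hqq library's implementation.


def shrink_lp(x, beta, lp_norm):
    """Generalized soft-thresholding for an l_p (p < 1) penalty."""
    mag = np.abs(x)
    with np.errstate(divide="ignore"):
        thresh = np.power(mag, lp_norm - 1) / beta  # inf at 0, which shrinks to 0
    return np.sign(x) * np.maximum(mag - thresh, 0.0)


def quantize(w, scale, zero, qmax):
    w_q = np.clip(np.round(w * scale + zero), 0, qmax)
    w_r = (w_q - zero) / scale  # dequantized weights
    return w_q, w_r


def hqq_like_zero(w, nbits=4, lp_norm=0.7, beta=10.0, kappa=1.01, iters=20):
    """Optimize per-group zero points using only the weights themselves."""
    qmax = 2 ** nbits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = qmax / (w_max - w_min)  # inverse scale: w * scale lands in [0, qmax]
    zero = -w_min * scale           # plain min-max zero point as the starting point
    best_zero, best_err = zero, np.inf
    for _ in range(iters):
        w_q, w_r = quantize(w, scale, zero, qmax)
        err = np.abs(w - w_r).mean()
        if err < best_err:
            best_zero, best_err = zero, err
        w_e = shrink_lp(w - w_r, beta, lp_norm)                         # sparse error term
        zero = np.mean(w_q - (w - w_e) * scale, axis=1, keepdims=True)  # closed-form zero update
        beta *= kappa
    return scale, best_zero, best_err


rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64))  # 128 groups of 64 weights each
scale, zero, err = hqq_like_zero(w)
qmax = 2 ** 4 - 1
_, w_r0 = quantize(w, scale, -w.min(axis=1, keepdims=True) * scale, qmax)
baseline = np.abs(w - w_r0).mean()  # error of plain min-max rounding
```

Each iteration alternates between a shrinkage step that isolates large outlier errors and a closed-form update of the zero point. Keeping the best zero point seen means the result is never worse than plain min-max rounding.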
EETQ
Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8-bit quantization method works with LoRA linear layers and should be faster than bitsandbytes.
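EETQ ships its own int8 kernels, which are not reproduced here. This sketch only illustrates where a LoRA update sits relative to an 8-bit frozen base weight. It uses generic symmetric per-output-channel int8 quantization as a stand-in for EETQ's scheme (an assumption), and Int8LoraLinear is a hypothetical class. The LoRA matrices stay in full precision and are the trainable part, while the int8 weight is only dequantized for the forward pass.

```python
import numpy as np

# Illustrative sketch only: generic int8 quantization as a stand-in for EETQ (assumption).


def quantize_int8_per_channel(w):
    """Symmetric absmax int8 quantization with one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_int8, scale


class Int8LoraLinear:
    """Frozen int8 base weight plus a trainable full-precision LoRA update."""

    def __init__(self, w, r=8, lora_alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w_int8, self.scale = quantize_int8_per_channel(w)
        d_out, d_in = w.shape
        self.lora_a = rng.normal(scale=0.01, size=(r, d_in))
        self.lora_b = np.zeros((d_out, r))  # zero init: the adapter starts as a no-op
        self.scaling = lora_alpha / r

    def __call__(self, x):
        base = x @ (self.w_int8.astype(np.float32) * self.scale).T  # dequantize, then matmul
        return base + self.scaling * (x @ self.lora_a.T) @ self.lora_b.T


rng = np.random.default_rng(1)
w = rng.normal(size=(32, 64))
layer = Int8LoraLinear(w)
x = rng.normal(size=(4, 64))
y = layer(x)
```

Because lora_b starts at zero, the layer initially reproduces the quantized base layer exactly; training would then only update the small full-precision matrices.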
Show adapter layer and model status
We added a feature to show adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check which adapters exist on your model, whether their gradients are active, whether they are enabled, and which ones are active or merged. You will also be informed if irregularities are detected.
To use this new feature, call model.get_layer_status() for layer-level information and model.get_model_status() for model-level information. For more details, check out our docs on layer and model status.
Changes
Edge case of how we deal with modules_to_save
We had the issue that when using classes such as PeftModelForSequenceClassification, we implicitly added the classifier layers to model.modules_to_save. However, this would only add a new ModulesToSaveWrapper instance for the first adapter being initialized. When initializing a second adapter via model.add_adapter, this information was ignored. Now, peft_config.modules_to_save is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it better reflects the intended behavior.
Furthermore, when merging together multiple LoRA adapters using model.add_weighted_adapter, if these adapters had modules_to_save, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).
What's Changed
- Bump version to 0.10.1.dev0 by @BenjaminBossan in #1578
- FIX Minor issues in docs, re-raising exception by @BenjaminBossan in #1581
- FIX / Docs: Fix doc link for layer replication by @younesbelkada in #1582
- DOC: Short section on using transformers pipeline by @BenjaminBossan in #1587
- Extend PeftModel.from_pretrained() to models with disk-offloaded modules by @blbadger in #1431
- [feat] Add lru_cache to import_utils calls that did not previously have it by @tisles in #1584
- fix deepspeed zero3+prompt tuning bug. word_embeddings.weight shape i… by @sywangyi in #1591
- MNT: Update GH bug report template by @BenjaminBossan in #1600
- fix the torch_dtype and quant_storage_dtype by @pacman100 in #1614
- FIX In the image classification example, Change the model to the LoRA… by @changhwa in #1624
- Remove duplicated import by @nzw0301 in #1622
- FIX: bnb config wrong argument names by @BenjaminBossan in #1603
- FIX Make DoRA work with Conv1D layers by @BenjaminBossan in #1588
- FIX: Send results to correct channel by @younesbelkada in #1628
- FEAT: Allow ignoring mismatched sizes when loading by @BenjaminBossan in #1620
- itemsize is torch>=2.1, use element_size() by @winglian in #1630
- FIX Multiple adapters and modules_to_save by @BenjaminBossan in #1615
- FIX Correctly call element_size by @BenjaminBossan in #1635
- fix: allow load_adapter to use different device by @yhZhai in #1631
- Adalora deepspeed by @sywangyi in #1625
- Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization by @yfeng95 in #1326
- Don't use deprecated Repository anymore by @Wauplin in #1641
- FIX Errors in the transformers integration docs by @BenjaminBossan in #1629
- update figure assets of BOFT by @YuliangXiu in #1642
- print_trainable_parameters - format % to be sensible by @stas00 in #1648
- FIX: Bug with handling of active adapters by @BenjaminBossan in #1659
- Remove dreambooth Git link by @charliermarsh in #1660
- add safetensor load in multitask_prompt_tuning by @sywangyi in #1662
- Adds Vera (Vector Based Random Matrix Adaption) #2 by @BenjaminBossan in #1564
- Update deepspeed.md by @sanghyuk-choi in #1679
- ENH: Add multi-backend tests for bnb by @younesbelkada in #1667
- FIX / Workflow: Fix Mac-OS CI issues by @younesbelkada in #1680
- FIX Use trl version of tiny random llama by @BenjaminBossan in #1681
- FIX: Don't eagerly import bnb for LoftQ by @BenjaminBossan in #1683
- FEAT: Add EETQ support in PEFT by @younesbelkada in #1675
- FIX / Workflow: Always notify on slack for docker image workflows by @younesbelkada in #1682
- FIX: upgrade autoawq to latest version by @younesbelkada in #1684
- FIX: Initialize DoRA weights in float32 if float16 is being used by @BenjaminBossan in #1653
- fix bf16 model type issue for ia3 by @sywangyi in #1634
- FIX Issues with AdaLora initialization by @BenjaminBossan in #1652
- FEAT Show adapter layer and model status by @BenjaminBossan in #1663
- Fixing the example by providing correct tokenized seq length by @jpodivin in #1686
- TST: Skiping AWQ tests for now .. by @younesbelkada in #1690
- Add LayerNorm tuning model by @DTennant in #1301
- FIX Use different doc builder docker image by @BenjaminBossan in #1697
- Set experimental dynamo config for compile tests by @BenjaminBossan in #1698
- fix the fsdp peft autowrap policy by @pacman100 in #1694
- Add LoRA support to HQQ Quantization by @fahadh4ilyas in #1618
- FEAT Helper to check if a model is a PEFT model by @BenjaminBossan in #1713
- support Cambricon MLUs device by @huismiling in #1687
- Some small cleanups in docstrings, copyright note by @BenjaminBossan in #1714
- Fix docs typo by @NielsRogge in #1719
- revise run_peft_multigpu.sh by @abzb1 in #1722
- Workflow: Add slack messages workflow by @younesbelkada in #1723
- DOC Document the PEFT checkpoint format by @BenjaminBossan in #1717
- FIX Allow DoRA init on CPU when using BNB by @BenjaminBossan in #1724
- Adding PiSSA as an optional initialization method of LoRA by @fxmeng in #1626
New Contributors
- @tisles made their first contribution in #1584
- @changhwa made their first contribution in #1624
- @yhZhai made their first contribution in #1631
- @yfeng95 made their first contribution in #1326
- @YuliangXiu made their first contribution in #1642
- @charliermarsh made their first contribution in #1660
- @sanghyuk-choi made their first contribution in #1679
- @jpodivin made their first contribution in #1686
- @DTennant made their first contribution in #1301
- @fahadh4ilyas made their first contribution in #1618
- @huismiling made their first contribution in #1687
- @NielsRogge made their first contribution in #1719
- @abzb1 made their first contribution in #1722
- @fxmeng made their first contribution in #1626
Full Changelog: v0.10.0...v0.11.0