Releases: huggingface/peft
Version 0.14.0: EVA, Context-aware Prompt Tuning, Bone, and more
Highlights
New Methods
Context-aware Prompt Tuning
@tsachiblau added a new soft prompt method called Context-aware Prompt Tuning (CPT), which combines In-Context Learning and Prompt Tuning: for each training sample, it builds a learnable context from training examples in addition to the single training sample. This allows for sample- and parameter-efficient few-shot classification and addresses recency bias.
Explained Variance Adaptation
@sirluk contributed a new LoRA initialization method called Explained Variance Adaptation (EVA). Instead of randomly initializing the LoRA weights, this method uses SVD on minibatches of activations from the fine-tuning data to initialize them. It can also re-allocate the ranks of the adapter based on the explained variance ratio (derived from SVD). As a result, this initialization method can yield better initial values and a better rank distribution.
Bone
@JL-er added an implementation for Block Affine (Bone) Adaptation which utilizes presumed sparsity in the base layer weights to divide them into multiple sub-spaces that share a single low-rank matrix for updates. Compared to LoRA, Bone has the potential to significantly reduce memory usage and achieve faster computation.
Enhancements
PEFT now supports LoRAs for int8 torchao quantized models (check this and this notebook). In addition, VeRA can now be used with 4 and 8 bit bitsandbytes quantization thanks to @ZiadHelal.
Hot-swapping of LoRA adapters is now possible using the hotswap_adapter function. You can now load one LoRA adapter and replace its weights in place with the LoRA weights of another adapter. In general, this should be faster than deleting one adapter and loading the other in its place. The feature is built so that no re-compilation of the model is necessary if torch.compile was called on the model (right now, this requires ranks and alphas to be the same for the adapters).
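The reason no re-compilation is needed can be sketched in plain Python. The new adapter's values are copied into the existing weight objects instead of replacing those objects, so anything that holds a reference to them (for example, a compiled model) keeps working. This toy illustration rests on that assumption. `hotswap_weights` is a hypothetical helper, not PEFT's `hotswap_adapter`, and lists of floats stand in for tensors. The shape check reflects the current requirement that ranks match.

```python
def hotswap_weights(active, incoming):
    """Copy the incoming adapter's values into the active adapter's buffers in place.

    Both arguments map parameter names to lists of rows, which stand in for tensors.
    """
    if active.keys() != incoming.keys():
        raise ValueError("adapters must target the same modules")
    # validate everything first so a failure never leaves a half-swapped adapter
    for name, rows in active.items():
        new_rows = incoming[name]
        same_shape = len(rows) == len(new_rows) and all(
            len(old) == len(new) for old, new in zip(rows, new_rows)
        )
        if not same_shape:
            # e.g. a different rank: in-place copying cannot change a shape
            raise ValueError(f"shape mismatch for {name}")
    for name, rows in active.items():
        for row, new_row in zip(rows, incoming[name]):
            row[:] = new_row  # overwrite values, keep the same row object


adapter_1 = {"q.lora_A": [[1.0, 2.0]], "q.lora_B": [[0.5], [0.5]]}
adapter_2 = {"q.lora_A": [[3.0, 4.0]], "q.lora_B": [[1.0], [2.0]]}
held_reference = adapter_1["q.lora_A"][0]  # what a compiled graph might still point to
hotswap_weights(adapter_1, adapter_2)
print(held_reference)  # → [3.0, 4.0]
```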
LoRA and IA³ now support Conv3d layers thanks to @jsilter, and @JINO-ROHIT added a notebook showcasing PEFT model evaluation using the lm-eval-harness toolkit.
With the target_modules argument, you can specify which layers to target with the adapter (e.g. LoRA). Now you can also specify which modules not to target by using the exclude_modules parameter (thanks @JINO-ROHIT).
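The interplay of the two arguments can be sketched as a filter over module names. This is a simplified, hypothetical version (`should_target` is not a PEFT function). It assumes that list entries match a module name exactly or as a dotted suffix, that a plain string is treated as a regular expression that must match the full name, and that exclusion takes precedence over inclusion.

```python
import re


def _matches(name, spec):
    """Return True if the module name matches a list of suffixes or a regex string."""
    if spec is None:
        return False
    if isinstance(spec, str):
        # a single string is treated as a regex over the full module name
        return re.fullmatch(spec, name) is not None
    # a list matches exact names or dotted suffixes such as "q_proj"
    return any(name == key or name.endswith("." + key) for key in spec)


def should_target(name, target_modules, exclude_modules=None):
    # exclusion wins over inclusion
    return _matches(name, target_modules) and not _matches(name, exclude_modules)


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.self_attn.v_proj",
]
# target q_proj and v_proj everywhere except in layer 1
selected = [n for n in names if should_target(n, ["q_proj", "v_proj"], r".*layers\.1\..*")]
print(selected)
```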
Changes
- Several fixes were made to the OFT implementation, among other things to fix merging. As a result, adapter weights trained with PEFT versions prior to this release are incompatible (see #1996 for details).
- Adapter configs are now forward-compatible by accepting unknown keys.
- Prefix tuning was adapted to the DynamicCache caching infrastructure of transformers (see #2096). If you are using this PEFT version and a recent version of transformers with an old prefix tuning checkpoint, you should double-check that it still works correctly and retrain it if it doesn't.
- Added the lora_bias parameter to LoRA layers to enable a bias on the LoRA B matrix. This is useful when extracting LoRA weights from fully fine-tuned parameters with bias vectors, so that the biases can be taken into account.
- #2180 provided a couple of bug fixes to LoKr (thanks @yaswanth19). If you're using LoKr, your old checkpoints should still work, but it's recommended to retrain your adapter.
- from_pretrained now warns the user if PEFT keys are missing.
- Attribute access to modules in modules_to_save is now properly and transparently handled.
- PEFT supports the changes to bitsandbytes 8bit quantization from the recent v0.45.0 release. To benefit from these improvements, we recommend upgrading bitsandbytes if you're using QLoRA. Expect slight numerical differences in model outputs if you're using QLoRA with 8bit bitsandbytes quantization.
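Accepting unknown keys means that an older PEFT version can still load a config written by a newer one. The following is a minimal stdlib sketch of that idea. It uses a hypothetical `ToyLoraConfig` dataclass rather than PEFT's config classes: known fields are kept, and unknown ones are dropped with a warning instead of raising an error.

```python
import dataclasses
import warnings


@dataclasses.dataclass
class ToyLoraConfig:
    r: int = 8
    lora_alpha: int = 8
    target_modules: list = dataclasses.field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        known = {f.name for f in dataclasses.fields(cls)}
        unknown = sorted(set(data) - known)
        if unknown:
            # a newer version may have written keys that this version does not know
            warnings.warn(f"Ignoring unknown config keys: {unknown}")
        return cls(**{key: value for key, value in data.items() if key in known})


saved = {"r": 16, "lora_alpha": 32, "target_modules": ["q_proj"], "some_future_option": True}
config = ToyLoraConfig.from_dict(saved)
print(config)
```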
What's Changed
- Bump version to 0.13.1.dev0 by @BenjaminBossan in #2094
- Support Conv3d layer in LoRA and IA3 by @jsilter in #2082
- Fix Inconsistent Missing Keys Warning for Adapter Weights in PEFT by @yaswanth19 in #2084
- FIX: Change check if past_key_values is empty by @BenjaminBossan in #2106
- Update install.md by @Salehbigdeli in #2110
- Update OFT to fix merge bugs by @Zeju1997 in #1996
- ENH: Improved attribute access for modules_to_save by @BenjaminBossan in #2117
- FIX low_cpu_mem_usage consolidates devices by @BenjaminBossan in #2113
- TST Mark flaky X-LoRA test as xfail by @BenjaminBossan in #2114
- ENH: Warn when from_pretrained misses PEFT keys by @BenjaminBossan in #2118
- FEAT: Adding exclude modules param(#2044) by @JINO-ROHIT in #2102
- fix merging bug / update boft conv2d scaling variable by @Zeju1997 in #2127
- FEAT: Support quantization for VeRA using bitsandbytes (#2070) by @ZiadHelal in #2076
- Bump version to 0.13.2.dev0 by @BenjaminBossan in #2137
- FEAT: Support torchao by @BenjaminBossan in #2062
- FIX: Transpose weight matrix based on fan_in_fan_out condition in PiSSA initialization (#2103) by @suyang160 in #2104
- FIX Type annotations in vera/bnb.py by @BenjaminBossan in #2139
- ENH Make PEFT configs forward compatible by @BenjaminBossan in #2038
- FIX Raise an error when performing mixed adapter inference and passing non-existing adapter names by @BenjaminBossan in #2090
- FIX Prompt learning with latest transformers error by @BenjaminBossan in #2140
- adding peft lora example notebook for ner by @JINO-ROHIT in #2126
- FIX TST: NaN issue with HQQ GPU test by @BenjaminBossan in #2143
- FIX: Bug in target module optimization if child module name is suffix of parent module name by @BenjaminBossan in #2144
- Bump version to 0.13.2.dev0 by @BenjaminBossan in #2145
- FIX Don't assume past_key_values for encoder models by @BenjaminBossan in #2149
- Use SFTConfig instead of SFTTrainer keyword args by @qgallouedec in #2150
- FIX: Sft train script FSDP QLoRA embedding mean resizing error by @BenjaminBossan in #2151
- Optimize DoRA in eval and no dropout by @ariG23498 in #2122
- FIX Missing low_cpu_mem_usage argument by @BenjaminBossan in #2156
- MNT: Remove version pin of diffusers by @BenjaminBossan in #2162
- DOC: Improve docs for layers_pattern argument by @BenjaminBossan in #2157
- Update HRA by @DaShenZi721 in #2160
- fix fsdp_auto_wrap_policy by @eljandoubi in #2167
- MNT Remove Python 3.8 since it's end of life by @BenjaminBossan in #2135
- Improving error message when users pass layers_to_transform and layers_pattern by @JINO-ROHIT in #2169
- FEAT Add hotswapping functionality by @BenjaminBossan in #2120
- Fix to prefix tuning to fit transformers by @BenjaminBossan in #2096
- MNT: Enable Python 3.12 on CI by @BenjaminBossan in #2173
- MNT: Update docker nvidia base image to 12.4.1 by @BenjaminBossan in #2176
- DOC: Extend modules_to_save doc with pooler example by @BenjaminBossan in #2175
- FIX VeRA failure on multiple GPUs by @BenjaminBossan in #2163
- FIX: Import location of HF hub errors by @BenjaminBossan in #2178
- DOC: fix broken link in the README of loftq by @dennis2030 in #2183
- added checks for layers to transforms and layer pattern in lora by @JINO-ROHIT in #2159
- ENH: Warn when loading PiSSA/OLoRA together with other adapters by @BenjaminBossan in #2186
- TST: Skip AQLM test that is incompatible with torch 2.5 by @BenjaminBossan in #2187
- FIX: Prefix tuning ...
v0.13.2: Small patch release
This patch release contains a small bug fix for an issue that prevented some LoRA checkpoints from being loaded correctly (mostly concerning stable diffusion checkpoints not trained with PEFT when loaded in diffusers, #2144).
Full Changelog: v0.13.1...v0.13.2
v0.13.1: Small patch release
This patch release contains a small bug fix for the low_cpu_mem_usage=True option (#2113).
Full Changelog: v0.13.0...v0.13.1
v0.13.0: LoRA+, VB-LoRA, and more
Highlights
New methods
LoRA+
@kallewoof added LoRA+ to PEFT (#1915). This is a function that allows you to initialize an optimizer with settings that are better suited for training a LoRA adapter.
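The central idea of LoRA+ is to train the LoRA B matrices with a higher learning rate than the A matrices, with the ratio between them set by a hyperparameter. The stdlib-only sketch below builds the optimizer parameter groups that this leads to. The helper `loraplus_param_groups`, the name-based matching on `lora_A`/`lora_B`, and the ratio of 16 are illustrative assumptions, not PEFT's implementation. With real tensors in place of names, groups of this form (dicts with `params` and `lr`) are what PyTorch optimizers accept.

```python
def loraplus_param_groups(param_names, lr, loraplus_lr_ratio):
    """Split parameter names into optimizer groups with different learning rates."""
    groups = {"lora_A": [], "lora_B": [], "other": []}
    for name in param_names:
        if "lora_B" in name:
            groups["lora_B"].append(name)
        elif "lora_A" in name:
            groups["lora_A"].append(name)
        else:
            groups["other"].append(name)
    return [
        {"params": groups["lora_A"], "lr": lr},
        # B matrices get the scaled-up learning rate
        {"params": groups["lora_B"], "lr": lr * loraplus_lr_ratio},
        {"params": groups["other"], "lr": lr},
    ]


names = [
    "layers.0.q_proj.lora_A.default.weight",
    "layers.0.q_proj.lora_B.default.weight",
    "classifier.weight",
]
groups = loraplus_param_groups(names, lr=1e-4, loraplus_lr_ratio=16)
for group in groups:
    print(group["lr"], group["params"])
```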
VB-LoRA
@leo-yangli added a new method to PEFT called VB-LoRA (#2039). The idea is to have LoRA layers be composed from a single vector bank (hence "VB") that is shared among all layers. This makes VB-LoRA extremely parameter efficient and the checkpoints especially small (comparable to the VeRA method), while still promising good fine-tuning performance. Check the VB-LoRA docs and example.
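The composition step can be sketched in a few lines. As we understand the VB-LoRA paper, each sub-vector of the LoRA matrices is a top-k admixture of bank vectors. Only the k highest logits are kept, they are softmax-normalized, and the results weight the corresponding bank vectors. This means only the shared bank and the per-sub-vector logits need to be stored. The function name and toy sizes below are illustrative, not PEFT's code.

```python
import math


def compose_subvector(bank, logits, k):
    """Mix the top-k bank vectors, weighted by a softmax over their logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    max_logit = max(logits[i] for i in top)  # subtract for numerical stability
    exps = {i: math.exp(logits[i] - max_logit) for i in top}
    norm = sum(exps.values())
    length = len(bank[0])
    return [sum(exps[i] / norm * bank[i][j] for i in top) for j in range(length)]


# a shared bank of 4 vectors of length 3, reused by every LoRA layer
bank = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
logits = [2.0, 2.0, -1.0, 0.0]  # trainable logits for one sub-vector
mixed = compose_subvector(bank, logits, k=2)
print(mixed)  # → [0.5, 0.5, 0.0]
```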
Enhancements
New Hugging Face team member @ariG23498 added the helper function rescale_adapter_scale to PEFT (#1951). Use this context manager to temporarily increase or decrease the scaling of the LoRA adapter of a model. It also works for PEFT adapters loaded directly into a transformers or diffusers model.
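The behavior of such a context manager can be sketched with the standard library. It multiplies each adapter's scaling factor on entry and restores the original values on exit, even if an exception is raised. `ToyLoraLayer` and `rescale_scaling` are hypothetical stand-ins, not PEFT's `rescale_adapter_scale`, which operates on real model modules.

```python
from contextlib import contextmanager


class ToyLoraLayer:
    """Stand-in for a LoRA layer that stores one scaling factor per adapter."""

    def __init__(self, scaling):
        self.scaling = dict(scaling)


@contextmanager
def rescale_scaling(layers, multiplier):
    # remember the original values so they can be restored exactly
    originals = [dict(layer.scaling) for layer in layers]
    try:
        for layer in layers:
            for name in layer.scaling:
                layer.scaling[name] *= multiplier
        yield
    finally:
        # restore even if the body of the with-block raised an exception
        for layer, original in zip(layers, originals):
            layer.scaling = original


layers = [ToyLoraLayer({"default": 2.0}), ToyLoraLayer({"default": 0.5})]
with rescale_scaling(layers, multiplier=0.5):
    inside = [layer.scaling["default"] for layer in layers]
after = [layer.scaling["default"] for layer in layers]
print(inside, after)  # → [1.0, 0.25] [2.0, 0.5]
```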
@ariG23498 also added DoRA support for embedding layers (#2006). So if you're using the use_dora=True option in the LoraConfig, you can now also target embedding layers.
For some time now, we have supported inference with batches that use different adapters for different samples, e.g. samples 1-5 use "adapter1" and samples 6-10 use "adapter2". However, until now this only worked for LoRA layers. @saeid93 extended this to also work with layers targeted by modules_to_save (#1990).
When loading a PEFT adapter, you now have the option to pass low_cpu_mem_usage=True (#1961). This initializes the adapter with empty weights ("meta" device) before loading the weights, instead of initializing them on CPU or GPU, which can speed up loading PEFT adapters. Use this option especially if you have many adapters to load at the same time or if these adapters are very big. Please let us know if you encounter issues with this option, as we may make it the default in the future.
Changes
Safe loading of PyTorch weights
Unless indicated otherwise, PEFT adapters are saved and loaded using the secure safetensors format. However, we also support the PyTorch format for checkpoints, which relies on Python's inherently insecure pickle protocol. To improve security, PyTorch will be stricter about loading these files in the future by making the option weights_only=True the default. This is generally recommended and should not cause any trouble with PEFT checkpoints, which is why PEFT enables this option by default starting with this release. Please open an issue if this causes trouble.
What's Changed
- Bump version to 0.12.1.dev0 by @BenjaminBossan in #1950
- CI Fix Windows permission error on merge test by @BenjaminBossan in #1952
- Check if past_key_values is provided when using prefix_tuning in peft_model by @Nidhogg-lyz in #1942
- Add lora+ implementation by @kallewoof in #1915
- FIX: New bloom changes breaking prompt learning by @BenjaminBossan in #1969
- ENH Update VeRA preconfigured models by @BenjaminBossan in #1941
- fix: lora+: include lr in optimizer kwargs by @kallewoof in #1973
- FIX active_adapters for transformers models by @BenjaminBossan in #1975
- FIX Loading adapter honors offline mode by @BenjaminBossan in #1976
- chore: Update CI configuration for workflows by @XciD in #1985
- Cast to fp32 if using bf16 weights on cpu during merge_and_unload by @snarayan21 in #1978
- AdaLora: Trigger warning when user uses 'r' inplace of 'init_r' by @bhargavyagnik in #1981
- [Add] scaling LoRA adapter weights with a context manager by @ariG23498 in #1951
- DOC Small fixes for HQQ and section title by @BenjaminBossan in #1986
- Add docs and examples for X-LoRA by @EricLBuehler in #1970
- fix: fix docker build gpus by @XciD in #1987
- FIX: Adjust transformers version check for bloom by @BenjaminBossan in #1992
- [Hotfix] Fix BOFT mixed precision by @Edenzzzz in #1925
- [Suggestions] Updates suggested for helper.rescale_adapter_scale by @ariG23498 in #1989
- MAINT: Default to loading weights only for torch.load by @BenjaminBossan in #1993
- BOFT bug fix when saving by @Zeju1997 in #1994
- FIX Import error in BOFT half precision test by @BenjaminBossan in #1995
- Update lora.md (typos) by @nir-sh-automat-it in #2003
- TST Add LNTuningConfig and LoKrConfig to tests by @BenjaminBossan in #2005
- ENH: Warn when a user provided model name in the config renamed by @BenjaminBossan in #2004
- FIX CI Correctly report outcome of bnb import test by @BenjaminBossan in #2007
- Update docs for X-LoRA and some bugfixes by @EricLBuehler in #2002
- TST: Potentially Skip 8bit bnb regression test if compute capability is too low by @BenjaminBossan in #1998
- CI Activate single core multi backend bnb tests by @BenjaminBossan in #2008
- Fix usage of deprecated parameters/functions in X-LoRA by @EricLBuehler in #2010
- [tests] enable test_vera_dtypes on XPU by @faaany in #2017
- CI Remove regression tests from BNB CI by @BenjaminBossan in #2024
- [tests] enable regression tests on XPU by @faaany in #2019
- ENH: Better error msg for replace_lora_weights_loftq when using a local model. by @BenjaminBossan in #2022
- [tests] make cuda-only cases in TestModelAndLayerStatus device-agnostic by @faaany in #2026
- [tests] enable test_mixed_adapter_batches_lora_opt_timing on XPU by @faaany in #2021
- MAINT: Update ruff version to ~0.6.1 by @BenjaminBossan in #1965
- ENH Raise error when applying modules_to_save on tuner layer by @BenjaminBossan in #2028
- FIX: Don't target the classification head when using target_modules="all-linear" by @BenjaminBossan in #2033
- [tests] enable cuda-only tests in test_common_gpu.py to work on XPU by @faaany in #2031
- [Add] DoRA Embedding by @ariG23498 in #2006
- [tests] enable test_gpu_examples.py on XPU by @faaany in #2036
- Bug: set correct pre-commit-hooks version by @ltoniazzi in #2034
- Warn if using tied target module with tie_word_embeddings by @ltoniazzi in #2025
- ENH: Faster adapter loading if there are a lot of target modules by @BenjaminBossan in #2045
- FIX: Error with OLoRA init when using bnb by @BenjaminBossan in #2011
- FIX: Small numerical discrepancy for p-tuning after loading the model by @BenjaminBossan in #2047
- Add VB-LoRA by @leo-yangli in #2039
- Fixing scalings logging test by @EricLBuehler in #2042
- TST: Fewer inference steps for stable diffusion tests by @BenjaminBossan in #2051
- TST Speed up vision model tests by @BenjaminBossan in #2058
- TST: Make X-LoRA tests faster by @BenjaminBossan in #2059
- Update permissions for githubtoken stale.yml by @glegendre01 in #2061
- MAINT: Give stale bot permissions for PRs too by @BenjaminBossan in #2064
- avoid saving boft_P in adapter model by @sywangyi in #2050
- fix arguments for PiSSA preprocess by @keakon in #2053
- Apply deprecated evaluation_strategy by @muellerzr in #1664
- fixing multiple LoRA in the same batch or vit by @saeid93 in https://gi...
v0.12.0: New methods OLoRA, X-LoRA, FourierFT, HRA, and much more
Highlights
New methods
OLoRA
@tokenizer-decode added support for a new LoRA initialization strategy called OLoRA (#1828). With this initialization option, the LoRA weights are initialized to be orthonormal, which promises to improve training convergence. Similar to PiSSA, this can also be applied to models quantized with bitsandbytes. Check out the accompanying OLoRA examples.
X-LoRA
@EricLBuehler added the X-LoRA method to PEFT (#1491). This is a mixture of experts approach that combines the strengths of multiple pre-trained LoRA adapters. Documentation has yet to be added, but check out the X-LoRA tests for how to use it.
FourierFT
@Phoveran, @zqgao22, @Chaos96, and @DSAILatHKUST added discrete Fourier transform fine-tuning to PEFT (#1838). This method promises to match LoRA in terms of performance while reducing the number of parameters even further. Check out the included FourierFT notebook.
HRA
@DaShenZi721 added support for Householder Reflection Adaptation (#1864). This method bridges the gap between low rank adapters like LoRA on the one hand and orthogonal fine-tuning techniques such as OFT and BOFT on the other. As such, it is interesting for both LLMs and image generation models. Check out the HRA example on how to perform DreamBooth fine-tuning.
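The following plain-Python sketch shows the core operation. It assumes the formulation W' = W H_1 H_2 ... H_r, where H_i = I - 2 u_i u_i^T / (u_i^T u_i) and the u_i are learnable vectors. Each reflection is orthogonal, so norms are preserved (the link to OFT/BOFT). The product also leaves every vector orthogonal to all u_i unchanged, so W' - W has rank at most r (the link to LoRA). The function names are illustrative, not PEFT's.

```python
import math


def reflect(x, u):
    """Apply H = I - 2 u u^T / (u^T u) to the vector x."""
    uu = sum(ui * ui for ui in u)
    coeff = 2.0 * sum(xi * ui for xi, ui in zip(x, u)) / uu
    return [xi - coeff * ui for xi, ui in zip(x, u)]


def hra_row(w_row, reflections):
    """Multiply one row of W by H_1 H_2 ... H_r (each H_i is symmetric)."""
    out = list(w_row)
    for u in reflections:
        # x H_i equals (H_i x^T)^T because H_i is symmetric
        out = reflect(out, u)
    return out


def norm(v):
    return math.sqrt(sum(a * a for a in v))


w_row = [3.0, 4.0, 0.0]
reflections = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # r = 2 learnable vectors
adapted = hra_row(w_row, reflections)
print(adapted)
print(norm(w_row), norm(adapted))  # both 5.0, up to rounding
```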
Enhancements
- IA³ now supports merging of multiple adapters via the add_weighted_adapter method thanks to @alexrs (#1701).
- Call peft_model.get_layer_status() and peft_model.get_model_status() to get an overview of the layer/model status of the PEFT model. This can be especially helpful when dealing with multiple adapters or for debugging purposes. More information can be found in the docs (#1743).
- DoRA now supports FSDP training, including with bitsandbytes quantization, aka QDoRA (#1806).
- VeRA has been extended by @dkopi to support targeting layers with different weight shapes (#1817).
- @kallewoof added the possibility for ephemeral GPU offloading. For now, this is only implemented for loading DoRA models, which can be sped up considerably for big models at the cost of a bit of extra VRAM (#1857).
- Experimental: It is now possible to tell PEFT to use your custom LoRA layers through dynamic dispatching. Use this, for instance, to add LoRA layers for thus far unsupported layer types without the need to first create a PR on PEFT (but contributions are still welcome!) (#1875).
Examples
- @shirinyamani added a script and a notebook to demonstrate DoRA fine-tuning.
- @rahulbshrestha contributed a notebook that shows how to fine-tune a DNA language model with LoRA.
Changes
Casting of the adapter dtype
Important: If the base model is loaded in float16 (fp16) or bfloat16 (bf16), PEFT now autocasts adapter weights to float32 (fp32) instead of using the dtype of the base model (#1706). This requires more memory than previously but stabilizes training, so it's the more sensible default. To prevent this, pass autocast_adapter_dtype=False when calling get_peft_model, PeftModel.from_pretrained, or PeftModel.load_adapter.
Adapter device placement
The logic of device placement when loading multiple adapters on the same model has been changed (#1742). Previously, PEFT would move all adapters to the device of the base model. Now, only the newly loaded/created adapter is moved to the base model's device. This allows users to have more fine-grained control over the adapter devices, e.g. allowing them to offload unused adapters to CPU more easily.
PiSSA
- Calling save_pretrained with the convert_pissa_to_lora argument is deprecated; the argument was renamed to path_initial_model_for_weight_conversion (#1828). Also, calling this no longer deletes the original adapter (#1933).
- Using weight conversion (path_initial_model_for_weight_conversion) while also using use_rslora=True and rank_pattern or alpha_pattern now raises an error (#1930). Previously, this did not raise an error, but inference would return incorrect outputs. We also warn about this setting during initialization.
Call for contributions
We are now making sure to tag appropriate issues with the contributions welcome label. If you are looking for a way to contribute to PEFT, check out these issues.
What's Changed
- Bump version to 0.11.1.dev0 by @BenjaminBossan in #1736
- save and load base model with revision by @mnoukhov in #1658
- Autocast adapter weights if fp16/bf16 by @BenjaminBossan in #1706
- FIX BOFT setting env vars breaks C++ compilation by @BenjaminBossan in #1739
- Bump version to 0.11.2.dev0 by @BenjaminBossan in #1741
- TST: torch compile tests by @BenjaminBossan in #1725
- Add add_weighted_adapter to IA3 adapters by @alexrs in #1701
- ENH Layer/model status shows devices now by @BenjaminBossan in #1743
- Fix warning messages about config.json when the base model_id is local. by @elementary-particle in #1668
- DOC TST Document and test reproducibility with models using batch norm by @BenjaminBossan in #1734
- FIX Use correct attribute name for HQQ in merge by @BenjaminBossan in #1791
- fix docs by @pacman100 in #1793
- FIX Allow same layer adapters on different devices by @BenjaminBossan in #1742
- TST Install bitsandbytes for compile tests by @BenjaminBossan in #1796
- FIX BOFT device error after PR 1742 by @BenjaminBossan in #1799
- TST Add regression test for DoRA, VeRA, BOFT, LN Tuning by @BenjaminBossan in #1792
- Docs / LoRA: Add more information on merge_and_unload docs by @younesbelkada in #1805
- TST: Add simple BNB regression tests by @BenjaminBossan in #1602
- CI Make torch compile tests run on GPU by @BenjaminBossan in #1808
- MNT Remove deprecated use of load_in_8bit by @BenjaminBossan in #1811
- Refactor to make DoRA and QDoRA work with FSDP by @BenjaminBossan in #1806
- FIX CI: Remove potentially problematic git command by @BenjaminBossan in #1820
- ENH / Workflow: Notify on slack about peft + transformers main test results by @younesbelkada in #1821
- FIX CI: Install pytest-reportlog package by @BenjaminBossan in #1822
- ENH / Workflow: Use repository variable by @younesbelkada in #1823
- Patch for Cambricon MLUs test by @huismiling in #1747
- Fix a documentation typo by @sparsh2 in #1833
- FIX Failing Llama tests due to new kv cache by @BenjaminBossan in #1832
- Workflow / Bnb: Add a mechanism to inform us if the import fails by @younesbelkada in #1830
- Workflow: Fix broken messages by @younesbelkada in #1842
- feat(ci): add trufflehog secrets detection by @McPatate in #1841
- DOC Describe torch_device argument in from_pretrained docstring by @BenjaminBossan in #1843
- Support for different layer shapes for VeRA by @dkopi in #1817
- CI Activate env to prevent bnb import error by @BenjaminBossan in #1845
- Fixed PeftMixedModel docstring example #1824 by @namanvats in #1850
- MNT Upgrade ruff version to ~0.4.8 by @BenjaminBossan in #1851
- Adding support for an optional initialization strategy OLoRA by @tokenizer-decode in #1828
- FIX: Adalora ranknum loaded on wrong device by @BenjaminBossan in #1852
- Workflow / FIX: Fix red status on our CI by @younesbelkada in #1854
- DOC FIX Comment about init of LoRA Embedding by @BenjaminBossan in https://gi...
v0.11.1
Patch release v0.11.1
Fix a bug that could lead to C++ compilation errors after importing PEFT (#1738 #1739).
Full Changelog: v0.11.0...v0.11.1
v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more
Highlights
New methods
BOFT
Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.
VeRA
If the parameter reduction of LoRA is not enough for your use case, you should take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA but adds two learnable scaling vectors to the two LoRA weight matrices. However, the low-rank matrices themselves are randomly initialized, frozen, and shared across all layers, which considerably reduces the number of trainable parameters.
The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.
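Below is a stdlib-only sketch of the VeRA update for one layer. It assumes the formulation from the VeRA paper: frozen random matrices A and B shared by all layers, plus per-layer trainable vectors d and b, giving delta = b ⊙ (B (d ⊙ (A x))). The initialization (b at zero, d at a small constant) is also an assumption taken from the paper. With b at zero, an adapted layer starts out identical to the base layer. The function and variable names are illustrative, not PEFT's.

```python
import random


def vera_delta(x, A, B, d, b):
    """Return the VeRA update for input x: b * (B @ (d * (A @ x))).

    A (r x in_features) and B (out_features x r) are frozen random matrices
    shared by all layers; d (length r) and b (length out_features) are the
    only trainable parameters of a layer.
    """
    h = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    h = [di * hi for di, hi in zip(d, h)]
    out = [sum(bij * hj for bij, hj in zip(row, h)) for row in B]
    return [bi * oi for bi, oi in zip(b, out)]


rng = random.Random(0)
in_features, out_features, r = 4, 3, 2
# frozen random projections, created once and reused by every adapted layer
shared_A = [[rng.gauss(0.0, 1.0) for _ in range(in_features)] for _ in range(r)]
shared_B = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(out_features)]

# per-layer trainable vectors; assumed init: d at a small constant, b at zero
layers = {name: {"d": [0.1] * r, "b": [0.0] * out_features} for name in ("layer0", "layer1")}

x = [1.0, -2.0, 0.5, 3.0]
delta = vera_delta(x, shared_A, shared_B, **layers["layer0"])
print(delta)  # all zeros while b is at its initial value

vera_trainable = r + out_features
lora_trainable = r * (in_features + out_features)
print(vera_trainable, lora_trainable)  # → 5 14
```

The final counts compare per-layer trainable parameters for the same r. VeRA trains r + out_features values, whereas LoRA trains r * (in_features + out_features).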
PiSSA
PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.
Quantization
HQQ
Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.
EETQ
Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8 bit quantization method works for LoRA linear layers and should be faster than bitsandbytes.
Show adapter layer and model status
We added a feature to show adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check what adapters exist on your model, whether gradients are active, whether they are enabled, which ones are active or merged. You will also be informed if irregularities have been detected.
To use this new feature, call model.get_layer_status() for layer-level information, and model.get_model_status() for model-level information. For more details, check out our docs on layer and model status.
Changes
Edge case of how we deal with modules_to_save
Previously, when using classes such as PeftModelForSequenceClassification, we implicitly added the classifier layers to model.modules_to_save. However, this would only add a new ModulesToSaveWrapper instance for the first adapter being initialized. When a second adapter was initialized via model.add_adapter, this information was ignored. Now, peft_config.modules_to_save is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it better reflects the intended behavior.
Furthermore, when merging multiple LoRA adapters using model.add_weighted_adapter, if these adapters had modules_to_save, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).
What's Changed
- Bump version to 0.10.1.dev0 by @BenjaminBossan in #1578
- FIX Minor issues in docs, re-raising exception by @BenjaminBossan in #1581
- FIX / Docs: Fix doc link for layer replication by @younesbelkada in #1582
- DOC: Short section on using transformers pipeline by @BenjaminBossan in #1587
- Extend PeftModel.from_pretrained() to models with disk-offloaded modules by @blbadger in #1431
- [feat] Add lru_cache to import_utils calls that did not previously have it by @tisles in #1584
- fix deepspeed zero3+prompt tuning bug. word_embeddings.weight shape i… by @sywangyi in #1591
- MNT: Update GH bug report template by @BenjaminBossan in #1600
- fix the torch_dtype and quant_storage_dtype by @pacman100 in #1614
- FIX In the image classification example, Change the model to the LoRA… by @changhwa in #1624
- Remove duplicated import by @nzw0301 in #1622
- FIX: bnb config wrong argument names by @BenjaminBossan in #1603
- FIX Make DoRA work with Conv1D layers by @BenjaminBossan in #1588
- FIX: Send results to correct channel by @younesbelkada in #1628
- FEAT: Allow ignoring mismatched sizes when loading by @BenjaminBossan in #1620
- itemsize is torch>=2.1, use element_size() by @winglian in #1630
- FIX Multiple adapters and modules_to_save by @BenjaminBossan in #1615
- FIX Correctly call element_size by @BenjaminBossan in #1635
- fix: allow load_adapter to use different device by @yhZhai in #1631
- Adalora deepspeed by @sywangyi in #1625
- Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization by @yfeng95 in #1326
- Don't use deprecated Repository anymore by @Wauplin in #1641
- FIX Errors in the transformers integration docs by @BenjaminBossan in #1629
- update figure assets of BOFT by @YuliangXiu in #1642
- print_trainable_parameters - format % to be sensible by @stas00 in #1648
- FIX: Bug with handling of active adapters by @BenjaminBossan in #1659
- Remove dreambooth Git link by @charliermarsh in #1660
- add safetensor load in multitask_prompt_tuning by @sywangyi in #1662
- Adds Vera (Vector Based Random Matrix Adaption) #2 by @BenjaminBossan in #1564
- Update deepspeed.md by @sanghyuk-choi in #1679
- ENH: Add multi-backend tests for bnb by @younesbelkada in #1667
- FIX / Workflow: Fix Mac-OS CI issues by @younesbelkada in #1680
- FIX Use trl version of tiny random llama by @BenjaminBossan in #1681
- FIX: Don't eagerly import bnb for LoftQ by @BenjaminBossan in #1683
- FEAT: Add EETQ support in PEFT by @younesbelkada in #1675
- FIX / Workflow: Always notify on slack for docker image workflows by @younesbelkada in #1682
- FIX: upgrade autoawq to latest version by @younesbelkada in #1684
- FIX: Initialize DoRA weights in float32 if float16 is being used by @BenjaminBossan in #1653
- fix bf16 model type issue for ia3 by @sywangyi in #1634
- FIX Issues with AdaLora initialization by @BenjaminBossan in #1652
- FEAT Show adapter layer and model status by @BenjaminBossan in #1663
- Fixing the example by providing correct tokenized seq length by @jpodivin in #1686
- TST: Skipping AWQ tests for now .. by @younesbelkada in #1690
- Add LayerNorm tuning model by @DTennant in #1301
- FIX Use different doc builder docker image by @BenjaminBossan in #1697
- Set experimental dynamo config for compile tests by @BenjaminBossan in #1698
- fix the fsdp peft autowrap policy by @pacman100 in #1694
- Add LoRA support to HQQ Quantization by @fahadh4ilyas in #1618
- FEAT Helper to check if a model is a PEFT model by @BenjaminBossan in #1713
- support Cambricon MLUs device by @huismiling in #1687
- Some small cleanups in docstrings, copyright note by @BenjaminBossan in #1714
- Fix docs typo by @NielsRogge in #1719
- revise run_peft_multigpu.sh by @abzb1 in #1722
- Workflow: Add slack messages workflow by @younesbelkada in #1723
- DOC Document the PEFT checkpoint for...
v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA
Highlights
Support for QLoRA with DeepSpeed ZeRO3 and FSDP
We added a couple of changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB memory each. Besides the latest version of PEFT, this requires bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2, and trl>0.7.11. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.
Layer replication
First-time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs very little extra memory but can lead to a nice improvement in model performance. Find out more in our docs.
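The bookkeeping behind layer replication can be sketched in plain Python. The sketch assumes that replication is described as a list of half-open [start, end) ranges over the original layer indices, as in the `layer_replication` option of `LoraConfig`. Duplicated positions only reference the original layers, which is why the base weights cost no extra memory, and each position can still carry its own adapter. The helper name and the adapter labels are hypothetical.

```python
def replicated_layer_order(layer_replication):
    """Expand half-open [start, end) ranges into a sequence of original layer indices."""
    order = []
    for start, end in layer_replication:
        order.extend(range(start, end))
    return order


order = replicated_layer_order([[0, 4], [2, 5]])
print(order)  # → [0, 1, 2, 3, 2, 3, 4]

base_layers = [object() for _ in range(5)]  # stand-ins for the 5 original layers
stack = [base_layers[i] for i in order]  # references only, no weight copies
# each of the 7 positions would get its own, separately trained LoRA adapter
adapters = {position: f"lora_for_position_{position}" for position in range(len(stack))}
print(len(stack), len(adapters), stack[2] is stack[4])  # → 7 7 True
```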
Improving DoRA
Last release, we added the option to enable DoRA in PEFT by simply adding use_dora=True to your LoraConfig. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.
Mixed LoRA adapter batches
If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) on different samples in the same batch. To do this, pass a list of adapter names as an additional argument. For example, if you have a batch of three samples:
output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])
Here, "adapter1" and "adapter2" should match the names of your corresponding LoRA adapters, and "__base__" is a special name that refers to the base model without any adapter. Find more details in our docs.
Without this feature, if you wanted to run inference with different LoRA adapters, you'd have to use single samples or try to group batches with the same adapter, then switch between adapters using set_adapter; this is inefficient and inconvenient. Therefore, we recommend using this new, faster method from now on in this scenario.
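A LoRA layer adds its adapter's contribution on top of the base layer output. A mixed batch can therefore be handled by grouping the samples by adapter name and computing each adapter's contribution only for its own sub-batch. The plain-Python sketch below shows that grouping. Numbers stand in for activations, and `mixed_batch_forward` is a hypothetical helper rather than PEFT's internals.

```python
from collections import defaultdict


def mixed_batch_forward(base_fn, adapter_fns, inputs, adapter_names):
    """Apply a different adapter to each sample; "__base__" means no adapter."""
    if len(inputs) != len(adapter_names):
        raise ValueError("need exactly one adapter name per sample")
    outputs = [base_fn(x) for x in inputs]
    groups = defaultdict(list)
    for index, name in enumerate(adapter_names):
        groups[name].append(index)
    for name, indices in groups.items():
        if name == "__base__":
            continue  # keep the plain base model output
        # one call per adapter on its sub-batch instead of one call per sample
        deltas = adapter_fns[name]([inputs[i] for i in indices])
        for i, delta in zip(indices, deltas):
            outputs[i] += delta
    return outputs


base_fn = lambda x: 2 * x
adapter_fns = {
    "adapter1": lambda xs: [10 * x for x in xs],
    "adapter2": lambda xs: [-x for x in xs],
}
result = mixed_batch_forward(base_fn, adapter_fns, [1, 2, 3], ["adapter1", "adapter2", "__base__"])
print(result)  # → [12, 2, 6]
```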
New LoftQ initialization function
We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.
Using the new replace_lora_weights_loftq function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
Deprecations
The function prepare_model_for_int8_training was deprecated for quite some time and is now removed completely. Use prepare_model_for_kbit_training instead.
What's Changed
Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.
- Bump version to 0.9.1.dev0 by @BenjaminBossan in #1517
- Fix for "leaf Variable that requires grad" Error in In-Place Operation by @DopeorNope-Lee in #1372
- FIX [CI/Docker] Follow up from #1481 by @younesbelkada in #1487
- CI: temporary disable workflow by @younesbelkada in #1534
- FIX [Docs/bnb/DeepSpeed] Add clarification on bnb + PEFT + DS compatibilities by @younesbelkada in #1529
- Expose bias attribute on tuner layers by @BenjaminBossan in #1530
- docs: highlight difference between num_parameters() and get_nb_trainable_parameters() in PEFT by @kmehant in #1531
- fix: fail when required args not passed when prompt_tuning_init==TEXT by @kmehant in #1519
- Fixed minor grammatical and code bugs by @gremlin97 in #1542
- Optimize levenshtein_distance algorithm in peft_lora_seq2seq_accelera… by @SUNGOD3 in #1527
- Update prompt_based_methods.md by @insist93 in #1548
- FIX Allow AdaLoRA rank to be 0 by @BenjaminBossan in #1540
- FIX: Make adaptation prompt CI happy for transformers 4.39.0 by @younesbelkada in #1551
- MNT: Use BitsAndBytesConfig as load_in_* is deprecated by @BenjaminBossan in #1552
- Add Support for Mistral Model in Llama-Adapter Method by @PrakharSaxena24 in #1433
- Add support for layer replication in LoRA by @siddartha-RE in #1368
- QDoRA: Support DoRA with BnB quantization by @BenjaminBossan in #1518
- Feat: add support for Conv2D DoRA by @sayakpaul in #1516
- TST Report slowest tests by @BenjaminBossan in #1556
- Changes to support fsdp+qlora and dsz3+qlora by @pacman100 in #1550
- Update style with ruff 0.2.2 by @BenjaminBossan in #1565
- FEAT Mixing different LoRA adapters in same batch by @BenjaminBossan in #1558
- FIX [CI] Fix test docker CI by @younesbelkada in #1535
- Fix LoftQ docs and tests by @BenjaminBossan in #1532
- More convenient way to initialize LoftQ by @BenjaminBossan in #1543
New Contributors
- @DopeorNope-Lee made their first contribution in #1372
- @kmehant made their first contribution in #1531
- @gremlin97 made their first contribution in #1542
- @SUNGOD3 made their first contribution in #1527
- @insist93 made their first contribution in #1548
- @PrakharSaxena24 made their first contribution in #1433
- @siddartha-RE made their first contribution in #1368
Full Changelog: v0.9.0...v0.10.0
v0.9.0: Merging LoRA weights, new quantization options, DoRA support, and more
Highlights
New methods for merging LoRA weights together
With PR #1364, we added new methods for merging LoRA weights together. This is not about merging LoRA weights into the base model. Instead, this is about merging the weights from different LoRA adapters into a single adapter by calling add_weighted_adapter. This allows you to combine the strengths of multiple LoRA adapters into a single adapter, while being faster than activating each of these adapters individually.
Although this feature has already existed in PEFT for some time, we have added new merging methods that promise much better results. The first is based on TIES, the second on DARE and a new one inspired by both called Magnitude Prune. If you haven't tried these new methods, or haven't touched the LoRA weight merging feature at all, you can find more information here:
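As an illustration of the TIES idea, here is a minimal sketch on flat "task vectors" (one flattened LoRA delta per adapter): trim each vector to its largest-magnitude entries, elect a sign per parameter from the summed trimmed values, then average only the weighted values that agree with the elected sign. It is a sketch of the technique under our own assumptions (plain Python lists, ties in the sign vote resolved as positive), not PEFT's code. In PEFT, as far as we know, this corresponds to calling add_weighted_adapter with combination_type="ties" and a density argument.

```python
from typing import List


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def trim(task_vector: List[float], density: float) -> List[float]:
    # Keep the top `density` fraction of entries by magnitude, zero the rest.
    k = int(density * len(task_vector))
    order = sorted(range(len(task_vector)), key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(task_vector)]


def ties_merge(task_vectors: List[List[float]], weights: List[float], density: float) -> List[float]:
    trimmed = [trim(tv, density) for tv in task_vectors]
    merged = []
    for values in zip(*trimmed):
        # Elect the sign with the larger total mass (assumption: ties -> positive).
        elected = 1 if sum(values) >= 0 else -1
        # Disjoint merge: average only nonzero values agreeing with the elected sign.
        kept = [w * v for v, w in zip(values, weights) if _sign(v) == elected]
        merged.append(sum(kept) / len(kept) if kept else 0.0)
    return merged
```

For example, merging [1.0, -2.0, 0.5, 3.0] and [2.0, 1.5, -0.1, -1.0] with density 0.5 and unit weights keeps the two largest entries of each vector, and the conflicting second parameter resolves to the negative value because the negative entry has more mass.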
AWQ and AQLM support for LoRA
Via #1399, we now support AutoAWQ in PEFT. This is a new method for 4-bit quantization of model weights.
Similarly, we now support AQLM via #1476. This method allows quantizing weights to as low as 2 bits. Both methods support quantizing nn.Linear layers. To find out more about all the quantization options that work with PEFT, check out our docs here.
Note that these integrations do not support merge_and_unload() yet, meaning that for inference you always need to keep the adapter weights attached to the base model.
DoRA support
We now support Weight-Decomposed Low-Rank Adaptation aka DoRA via #1474. This new method builds on top of LoRA and has shown very promising results. Especially at lower ranks (e.g. r=8), it should perform much better than LoRA. Right now, only non-quantized nn.Linear layers are supported. If you'd like to give it a try, just pass use_dora=True to your LoraConfig and you're good to go.
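For intuition, DoRA splits a pretrained weight W0 into a magnitude vector m and a direction, and applies the LoRA update only to the direction: W' = m * (W0 + s*BA) / ||W0 + s*BA||. The toy sketch below computes that weight in pure Python. We assume the norm is taken per output feature, i.e. per row of an (out x in) weight matrix, which matches our reading of PEFT's implementation; the function names are hypothetical and this is not PEFT's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def row_norms(m: Matrix) -> List[float]:
    # L2 norm of each output row.
    return [math.sqrt(sum(x * x for x in row)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dora_weight(w0: Matrix, a: Matrix, b: Matrix, magnitude: List[float], scaling: float = 1.0) -> Matrix:
    """W' = m * (W0 + s*BA) / ||W0 + s*BA||, norm taken per output row (assumption)."""
    ba = matmul(b, a)  # B: (out, r), A: (r, in) -> (out, in)
    v = [[w + scaling * d for w, d in zip(rw, rd)] for rw, rd in zip(w0, ba)]
    norms = row_norms(v)
    return [[m_i * x / n_i for x in row] for m_i, n_i, row in zip(magnitude, norms, v)]
```

If m is initialized to the row norms of W0 and B starts at zero, as in LoRA, the DoRA weight initially equals W0; afterwards, each row of W' keeps the norm given by m no matter what BA learns, so magnitude and direction are trained separately.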
Documentation
Thanks to @stevhliu and many other contributors, there have been big improvements to the documentation. You should find it more organized and more up-to-date. Our DeepSpeed and FSDP guides have also been much improved.
Check out our improved docs if you haven't already!
Development
If you're implementing custom adapter layers, for instance a custom LoraLayer, note that all subclasses should now implement update_layer, unless they want to use the default method provided by the parent class. In particular, this means you should no longer use different method names for the subclass, like update_layer_embedding. Also, we generally don't permit ranks (r) of 0 anymore. For more, see this PR.
Developers should have an easier time now since we fully embrace ruff. If you're the type of person who forgets to call make style before pushing to a PR, consider adding a pre-commit hook. Tests are now a bit less verbose by using plain asserts and generally embracing pytest features more fully. All of this comes thanks to @akx.
What's Changed
On top of these changes, we have added a lot of small changes since the last release; check out the full list below. As always, we had a lot of support from many contributors, you're awesome!
- Release patch version 0.8.2 by @pacman100 in #1428
- [docs] Polytropon API by @stevhliu in #1422
- Fix MatMul8bitLtBackward view issue by @younesbelkada in #1425
- Fix typos by @szepeviktor in #1435
- Fixed saving for models that don't have _name_or_path in config by @kovalexal in #1440
- [docs] README update by @stevhliu in #1411
- [docs] Doc maintenance by @stevhliu in #1394
- [core/TPLinear] Fix breaking change by @younesbelkada in #1439
- Renovate quality tools by @akx in #1421
- [Docs] call set_adapters() after add_weighted_adapter by @sayakpaul in #1444
- MNT: Check only selected directories with ruff by @BenjaminBossan in #1446
- TST: Improve test coverage by skipping fewer tests by @BenjaminBossan in #1445
- Update Dockerfile to reflect how to compile bnb from source by @younesbelkada in #1437
- [docs] Lora-like guides by @stevhliu in #1371
- [docs] IA3 by @stevhliu in #1373
- Add docstrings for set_adapter and keep frozen by @EricLBuehler in #1447
- Add new merging methods by @pacman100 in #1364
- FIX Loading with AutoPeftModel.from_pretrained by @BenjaminBossan in #1449
- Support modules_to_save config option when using DeepSpeed ZeRO-3 with ZeRO init enabled. by @pacman100 in #1450
- FIX Honor HF_HUB_OFFLINE mode if set by user by @BenjaminBossan in #1454
- [docs] Remove iframe by @stevhliu in #1456
- [docs] Docstring typo by @stevhliu in #1455
- [core/get_peft_state_dict] Ignore all exceptions to avoid unexpected errors by @younesbelkada in #1458
- [Adaptation Prompt] Fix llama rotary embedding issue with transformers main by @younesbelkada in #1459
- [CI] Add CI tests on transformers main to catch early bugs by @younesbelkada in #1461
- Use plain asserts in tests by @akx in #1448
- Add default IA3 target modules for Mixtral by @arnavgarg1 in #1376
- add magnitude_prune merging method by @pacman100 in #1466
- [docs] Model merging by @stevhliu in #1423
- Adds an example notebook for showing multi-adapter weighted inference by @sayakpaul in #1471
- Make tests succeed more on MPS by @akx in #1463
- [CI] Fix adaptation prompt CI on transformers main by @younesbelkada in #1465
- Update docstring at peft_types.py by @eduardozamudio in #1475
- FEAT: add awq suppot in PEFT by @younesbelkada in #1399
- Add pre-commit configuration by @akx in #1467
- ENH [CI] Run tests only when relevant files are modified by @younesbelkada in #1482
- FIX [CI/bnb] Fix failing bnb workflow by @younesbelkada in #1480
- FIX [PromptTuning] Simple fix for transformers >= 4.38 by @younesbelkada in #1484
- FIX: Multitask prompt tuning with other tuning init by @BenjaminBossan in #1144
- previous_dtype is now inferred from F.linear's result output type. by @MFajcik in #1010
- ENH: [CI/Docker]: Create a workflow to temporarly build docker images in case dockerfiles are modified by @younesbelkada in #1481
- Fix issue with unloading double wrapped modules by @BenjaminBossan in #1490
- FIX: [CI/Adaptation Prompt] Fix CI on transformers main by @younesbelkada in #1493
- Update peft_bnb_whisper_large_v2_training.ipynb: Fix a typo by @martin0258 in #1494
- covert SVDLinear dtype by @PHOSPHENES8 in #1495
- Raise error on wrong type for to modules_to_save by @BenjaminBossan in #1496
- AQLM support for LoRA by @BlackSamorez in #1476
- Allow trust_remote_code for tokenizers when loading AutoPeftModels by @OfficialDelta in https://...
Release v0.8.2
What's Changed
- Release v0.8.2.dev0 by @pacman100 in #1416
- Add IA3 Modules for Phi by @arnavgarg1 in #1407
- Update custom_models.md by @boyufan in #1409
- Add positional args to PeftModelForCausalLM.generate by @SumanthRH in #1393
- [Hub] fix: subfolder existence check by @sayakpaul in #1417
- FIX: Make merging of adapter weights idempotent by @BenjaminBossan in #1355
- [core] fix critical bug in diffusers by @younesbelkada in #1427
New Contributors
Full Changelog: v0.8.1...v0.8.2