Feature/fsdp lora #435

Closed
danbider wants to merge 39 commits into mosaicml:main from danbider:feature/fsdp-lora
Changes from 11 commits

Commits (39)
93cdae8
attempt to fsdp wrap lora modules
danbider Jul 6, 2023
0ec0de1
Merge branch 'mosaicml:main' into feature/fsdp-lora
danbider Jul 6, 2023
20ab8b6
fsdp works by iterating over modules
danbider Jul 6, 2023
57659c5
merged remote
danbider Jul 6, 2023
d6cf053
cleaned up fsdp loop for peft
danbider Jul 7, 2023
a44b641
robust peft import
danbider Jul 7, 2023
d957d55
fsdp known issue deleted
danbider Jul 7, 2023
e5e012d
more info in tutorial about fsdp
danbider Jul 7, 2023
f7b5e70
conditioning on peft installation for cpu tests
danbider Jul 7, 2023
1cf348c
Merge branch 'main' into feature/fsdp-lora
codestar12 Jul 7, 2023
6a1c172
Merge branch 'mosaicml:main' into feature/fsdp-lora
danbider Jul 9, 2023
a3f370c
moved lora model building to ComposerHFCausalLM
danbider Jul 11, 2023
082f71e
formatting
danbider Jul 11, 2023
058951d
updated tutorial to move lora config under model config
danbider Jul 12, 2023
f57c84f
Merge branch 'mosaicml:main' into feature/fsdp-lora
danbider Jul 15, 2023
cc7a8f9
Merge branch 'mosaicml:main' into feature/fsdp-lora
danbider Jul 24, 2023
433ae51
Merge branch 'main' into feature/fsdp-lora
dakinggg Aug 1, 2023
7c68c19
merged upstream main, fixed conflicts
danbider Aug 15, 2023
4118367
added typecheck for peft model
danbider Aug 15, 2023
5db8c74
more pyright fixes
danbider Aug 15, 2023
3a3342f
more typechecking in training script
danbider Aug 15, 2023
622e51d
Merge branch 'main' into feature/fsdp-lora
danbider Aug 16, 2023
9ec0f69
pyright following main merge
danbider Aug 16, 2023
a4439c5
model_config instead of cfg.model
danbider Aug 16, 2023
0a9e542
Update TUTORIAL.md
danbider Aug 17, 2023
9bc0b50
Update llmfoundry/models/hf/hf_fsdp.py
danbider Aug 17, 2023
f2fd418
DDP tutorial edit
danbider Aug 17, 2023
050267f
edit fsdp stuff
danbider Aug 21, 2023
c0f5148
fixed popping
danbider Aug 30, 2023
1c47c23
eliminated bnb dep
danbider Aug 30, 2023
5b905b0
Merge branch 'feature/fsdp-lora' of https://github.com/danbider/llm-f…
danbider Aug 30, 2023
27d186d
Merge branch 'main' into feature/fsdp-lora
josejg Oct 21, 2023
2f59377
Update accelerate for peft
josejg Oct 23, 2023
5bc5240
Simplify LoRA validation logic
josejg Oct 23, 2023
79cf8d6
Proper import checking
josejg Oct 24, 2023
7f72c25
Fix indent
josejg Oct 30, 2023
02d949c
Prevent FSDP wrapping empty embedding LoRA attributes
josejg Oct 30, 2023
b955696
Merge branch 'main' into feature/fsdp-lora
josejg Oct 31, 2023
4a430bd
Fix bad indent
josejg Oct 31, 2023
22 changes: 20 additions & 2 deletions TUTORIAL.md
@@ -338,11 +338,29 @@ lora:
r: 16
lora_alpha: 32
lora_dropout: 0.05
target_modules: ['Wqkv']
target_modules: ['Wqkv', 'out_proj', 'down_proj', 'up_proj']
```
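For reference, a LoRA config like the one above corresponds roughly to the following peft calls. This is a minimal illustrative sketch, not llm-foundry's own model-building path; the model id and `task_type` are assumptions made for the example.

```python
# Illustrative sketch only: roughly what the LoRA config above amounts to when
# expressed directly against peft. The model id and task_type are assumptions
# for this example, not taken from this PR.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b', trust_remote_code=True)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=['Wqkv', 'out_proj', 'down_proj', 'up_proj'],
    task_type='CAUSAL_LM',
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```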
You can train LoRA models either using FSDP for further memory savings; in your `.yaml`, specify:
<!--pytest.mark.skip-->
```yaml
fsdp_config:
  use_orig_params: true
  sharding_strategy: FULL_SHARD
  mixed_precision: PURE
  activation_checkpointing: true
  activation_checkpointing_reentrant: false
  activation_cpu_offload: false
  limit_all_gathers: true
```

> **Collaborator** (review comment on `use_orig_params: true`): Can we confirm if this is necessary?
>
> **danbider** (author): will verify this tomorrow AM, good point
or default to DDP, as follows:
<!--pytest.mark.skip-->
```yaml
fsdp:
  {}
```

> **Collaborator** (review comment): I think to default DDP just leaving out the FSDP section entirely is a bit cleaner?
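On the earlier review question about whether `use_orig_params: true` is needed: the flag is relevant because a LoRA model mixes frozen base weights with trainable adapter weights, and FSDP's flat-parameter path generally expects uniform `requires_grad` within a wrapped unit. The snippet below is a sketch against raw PyTorch FSDP, not llm-foundry code, and assumes it is launched with `torchrun` on a machine with a CUDA GPU.

```python
# Sketch, not llm-foundry code: FSDP over a module that mixes frozen base
# weights with trainable LoRA-style adapters. Assumes a launch such as
# `torchrun --nproc_per_node=1 this_script.py` on a machine with a CUDA GPU.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group('nccl')
torch.cuda.set_device(dist.get_rank())

class FrozenPlusAdapter(nn.Module):
    """A frozen base weight plus a small trainable adapter, LoRA-style."""

    def __init__(self, d: int = 64, r: int = 8):
        super().__init__()
        self.base = nn.Linear(d, d)
        self.base.requires_grad_(False)            # frozen pretrained weight
        self.lora_A = nn.Linear(d, r, bias=False)  # trainable adapter
        self.lora_B = nn.Linear(r, d, bias=False)  # trainable adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_B(self.lora_A(x))

# FSDP's default flat-parameter path generally expects every parameter in a
# wrapped unit to share the same requires_grad; keeping the original
# parameters (use_orig_params=True) accommodates the frozen/trainable mix.
model = FSDP(FrozenPlusAdapter().cuda(), use_orig_params=True)
out = model(torch.randn(2, 64, device='cuda'))
print(out.shape)
dist.destroy_process_group()
```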

- In the current release, these features have Beta support.
- For efficiency, the MPT model concatenates the `Q`, `K`, and `V` matrices in each attention block into a single `Wqkv` matrix that is three times wider. Currently, LoRA supports a low-rank approximation to this `Wqkv` matrix.
- Known issue: PEFT / LoRA do not directly work with FSDP.

### Can I quantize these models and/or run on CPU?
- The LLM Foundry codebase does not directly have examples of quantization or limited-resource inference. But you can check out [GGML](https://github.com/ggerganov/ggml) (same library that powers llama.cpp), which has built support for efficiently running MPT models on CPU! You _can_ load your model in 8-bit precision for inference using the [bitsandbytes library](https://github.com/TimDettmers/bitsandbytes) and Hugging Face's [accelerate](https://huggingface.co/docs/accelerate/index) via `model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto", trust_remote_code=True)`, although we have not extensively benchmarked the performance (see the Hugging Face [quantization documentation](https://huggingface.co/docs/transformers/main/main_classes/quantization) for more detail).
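Written out, the 8-bit load described above looks like the following sketch. It assumes `bitsandbytes` and `accelerate` are installed and uses `mosaicml/mpt-7b` purely as an example checkpoint.

```python
# The 8-bit inference load described in the paragraph above; requires the
# bitsandbytes and accelerate packages. 'mosaicml/mpt-7b' is just an example id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'mosaicml/mpt-7b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # bitsandbytes int8 weights
    device_map='auto',        # accelerate places layers across available devices
    trust_remote_code=True,   # needed for MPT's custom modeling code
)

inputs = tokenizer('MosaicML is', return_tensors='pt').to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```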
18 changes: 18 additions & 0 deletions llmfoundry/models/hf/hf_fsdp.py
@@ -5,12 +5,20 @@
# which is MIT licensed

import functools
import warnings
from typing import Any, Iterable, List

import torch
from transformers import PreTrainedModel
from transformers.models.opt.modeling_opt import OPTDecoder

try:
    from peft import LoraModel
    lora_model_type = LoraModel
except ImportError:
    lora_model_type = None
    warnings.warn('peft is not installed, LoraModel will not be available')


# helper functions
def rhasattr(obj: Any, attr: str):
@@ -182,6 +190,16 @@ def prepare_hf_causal_lm_model_for_fsdp(model: PreTrainedModel,
        tied_embeddings._fsdp_wrap = False  # type: ignore
        lm_head._fsdp_wrap = False  # type: ignore

    # applying ._fsdp_wrap = True for the LoRA modules
    # this is needed because added LoRA modules have requires_grad=True,
    # while the rest of the modules have requires_grad=False
    if lora_model_type is not None:  # peft is installed
        if isinstance(model.base_model,
                      lora_model_type):  # we have built a LoraModel
            for name, module in model_block.named_modules():
                if 'lora' in name:  # peft adds modules named with lora
                    module._fsdp_wrap = True

    # FSDP Wrap and Activation Checkpoint every model block
    model.fsdp_wrap_fn = lambda module: isinstance(module, block_type)
    model.activation_checkpointing_fn = lambda module: isinstance(
        module, block_type)
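To make the intent of the `_fsdp_wrap = True` flags above concrete, here is a rough sketch of how a wrap decision could take that flag into account alongside the block-type rule from `fsdp_wrap_fn`. The real decision is made inside Composer's FSDP integration; the helper below is purely illustrative.

```python
# Illustrative only: how a wrap decision could combine the per-module
# `_fsdp_wrap` flag set above with the block-type rule from fsdp_wrap_fn.
# The real logic lives in Composer's FSDP integration, not in this function.
import torch.nn as nn

def should_fsdp_wrap(module: nn.Module, block_type: type) -> bool:
    if getattr(module, '_fsdp_wrap', False):
        # Explicitly flagged modules (e.g. LoRA adapters, which are the only
        # ones with requires_grad=True) get their own FSDP unit.
        return True
    # Otherwise fall back to wrapping every transformer block.
    return isinstance(module, block_type)
```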