Feature/fsdp lora #435
Conversation
This looks good. I approve; merge if tests pass.
Hey @danbider, with this PR can we do FSDP LoRA on LLaMA models? Basically any model other than MPT?
It should support any model, including MPT.
I'm a fan of the re-design -- in particular moving things into the … .

One area that gives me pause is integration with HF for models that have been LoRA-fied. Are there any gotchas when it comes to converting to HF (and uploading to HF) from a Composer checkpoint? Similarly, are there any gotchas when working with an HF model that is already LoRA-fied? My sense is that, with the latter, everything should be OK as long as the right things are installed, but I'd like a sanity check on that. The former seems like it will still be missing support.

@dakinggg can probably add some insight here, because I'm worried about the code that actually modifies the model (in …).
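For concreteness, the round-trip in question can be sketched with the standard `peft` API, outside this repo's code (model name and paths here are placeholders, and the PR's own conversion path may differ):

```python
# Hedged sketch of the HF <-> LoRA round-trip being asked about, using the
# standard `peft` API directly (not this repo's code); paths are placeholders.
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b',
                                            trust_remote_code=True)

# LoRA-fy an HF model in-process.
lora_model = get_peft_model(
    base,
    LoraConfig(r=1, lora_alpha=32, lora_dropout=0.05,
               target_modules=['up_proj', 'down_proj'],
               task_type='CAUSAL_LM'))

# Saving a LoRA-fied model writes only the small adapter weights...
lora_model.save_pretrained('./lora-adapter')

# ...and an "already LoRA-fied" model is reloaded by pairing the adapter with
# a fresh copy of its base model, so the main requirement is having peft
# installed.
fresh_base = AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b',
                                                  trust_remote_code=True)
reloaded = PeftModel.from_pretrained(fresh_base, './lora-adapter')

# Converting back to a plain HF checkpoint (e.g. for upload) requires merging
# the adapters into the base weights first.
merged = reloaded.merge_and_unload()
merged.save_pretrained('./merged-hf-model')
```

The Composer-checkpoint-to-HF direction is the part this sketch does not cover, which matches the concern above.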
Could you please also add some basic tests for the LoRA addition?
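Something along these lines might serve as a starting point; this is only a sketch against `peft` itself with a small stand-in model, since the exact builder API in this PR may differ:

```python
# Hypothetical minimal test for the LoRA addition. It exercises peft directly
# with a tiny stand-in model (gpt2) rather than this repo's builders, whose
# exact API this sketch does not assume.
import pytest


def test_lora_leaves_only_adapter_params_trainable():
    peft = pytest.importorskip('peft')
    transformers = pytest.importorskip('transformers')

    base = transformers.AutoModelForCausalLM.from_pretrained('gpt2')
    model = peft.get_peft_model(
        base, peft.LoraConfig(r=1, lora_alpha=32, target_modules=['c_attn']))

    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    assert trainable, 'expected some trainable (LoRA) parameters'
    assert all('lora' in n for n in trainable), \
        'only LoRA parameters should require grad'
```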
TUTORIAL.md (Outdated)

<!--pytest.mark.skip-->
```yaml
fsdp_config:
  use_orig_params: true
```
Can we confirm if this is necessary?
will verify this tomorrow AM, good point
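For context on why the flag tends to come up with LoRA (my assumption, not something verified in this PR): the base weights stay frozen while only the adapters train, so FSDP's flattened parameter groups end up mixing `requires_grad` values, and `use_orig_params: true` is the FSDP option that tolerates that mix. A toy illustration of the pattern:

```python
# Toy module mimicking a LoRA-fied linear layer: frozen base weights plus
# small trainable adapters. The mixed requires_grad pattern printed below is
# what (I believe) makes use_orig_params: true necessary under FSDP, which
# otherwise expects uniform requires_grad within each flattened parameter.
import torch.nn as nn


class ToyLoRALinear(nn.Module):

    def __init__(self, d: int, r: int = 1):
        super().__init__()
        self.base = nn.Linear(d, d)                # "pretrained", frozen
        self.lora_a = nn.Linear(d, r, bias=False)  # trainable adapter
        self.lora_b = nn.Linear(r, d, bias=False)  # trainable adapter
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))


layer = ToyLoRALinear(8)
print({name: p.requires_grad for name, p in layer.named_parameters()})
# {'base.weight': False, 'base.bias': False,
#  'lora_a.weight': True, 'lora_b.weight': True}
```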
TUTORIAL.md (Outdated)

or default to DDP, as follows:
I think that to default to DDP, just leaving out the FSDP section entirely is a bit cleaner?
scripts/train/train.py (Outdated)

```python
    'lora',
    must_exist=False,
    default_value=None)
if lora_config is not None:
    if lora_config.get('rank', None) is not None:
```
What is supposed to happen if the lora config is provided but rank is None? Should that be an error?
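One possible answer, sketched as a hypothetical helper rather than the PR's actual behavior: fail fast when a `lora` section is present without a usable rank.

```python
# Hypothetical validation for the question above: error out if a lora section
# exists but no rank is given, instead of silently skipping LoRA.
from typing import Optional

from omegaconf import DictConfig


def validate_lora_config(lora_config: Optional[DictConfig]) -> None:
    if lora_config is None:
        return  # no LoRA requested; nothing to check
    if lora_config.get('rank', None) is None:
        raise ValueError(
            'A lora config was provided but `rank` is missing or None; '
            'set a positive lora rank or remove the lora section entirely.')
```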
edit from daniel
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
will add those now.
Any chance we could include an example YAML config like this?

```yaml
max_seq_len: 2048
global_seed: 17
dist_timeout: 5400

# Run Name
run_name:  # If left blank, will be read from env var $RUN_NAME

model:
  name: hf_causal_lm
  pretrained: true
  pretrained_model_name_or_path: mosaicml/mpt-7b
  config_overrides:
    attn_config:
      attn_impl: triton
      attn_uses_sequence_id: false

lora:
  # UPDATE these as needed
  args:
    r: 1
    lora_alpha: 32
    # target_modules: ["Wqkv", "out_proj", "up_proj", "down_proj"]
    target_modules: ["up_proj", "down_proj"]
    lora_dropout: 0.05
    bias: none
    task_type: "CAUSAL_LM"

# Tokenizer
tokenizer:
  name: mosaicml/mpt-7b
  kwargs:
    model_max_length: ${max_seq_len}

# Dataloaders
train_loader:
  name: finetuning
  dataset:
    hf_name: danbider/codegen
    split: train
    max_seq_len: ${max_seq_len}
    allow_pad_trimming: false
    decoder_only_format: true
    packing_ratio: 19.6  # concat examples
    shuffle: true
  drop_last: true
  num_workers: 8
  pin_memory: false
  prefetch_factor: 2
  persistent_workers: true
  timeout: 0

eval_loader:
  name: finetuning
  dataset:
    hf_name: danbider/codegen
    split: test
    max_seq_len: ${max_seq_len}
    allow_pad_trimming: false
    decoder_only_format: true
    packing_ratio: 19.6
    shuffle: true
  drop_last: true
  num_workers: 8
  pin_memory: false
  prefetch_factor: 2
  persistent_workers: true
  timeout: 0

# Optimization
# Based on MPT pretraining
scheduler:
  name: cosine_with_warmup
  t_warmup: 50ba
  alpha_f: 0.0

optimizer:
  name: decoupled_lionw
  lr: 1.0e-4  # lora needs higher LR
  betas:
  - 0.9
  - 0.95
  eps: 1.0e-8
  weight_decay: 1.0e-4

algorithms:
  gradient_clipping:
    clipping_type: norm
    clipping_threshold: 1.0

max_duration: 2ep
eval_interval: 1ep
eval_first: true
global_train_batch_size: 48

# System
seed: ${global_seed}
device_eval_batch_size: 4
device_train_microbatch_size: 1
precision: amp_bf16

# FSDP
# fsdp_config:
#   sharding_strategy: FULL_SHARD
#   mixed_precision: PURE
#   activation_checkpointing: true
#   activation_checkpointing_reentrant: false
#   activation_cpu_offload: false
#   limit_all_gathers: true
#   verbose: false
# leave out fsdp_config for DDP or single GPU

# Logging
progress_bar: false
log_to_console: true
console_log_interval: 1ba

callbacks:
  speed_monitor:
    window_size: 10
  lr_monitor: {}
  memory_monitor: {}
  runtime_estimator: {}

# loggers:
#   wandb: {}

# Checkpoint to local filesystem or remote object store
save_interval: 5000ba
save_num_checkpoints_to_keep: 1  # Important, this cleans up checkpoints saved to DISK
save_folder: ./{run_name}/checkpoints
# save_folder: s3://my-bucket/my-folder/{run_name}/checkpoint
```

Any example we could point users to would be great, even if we plan on refining it later.
Hi, I tested your branch and found some bugs.
I'm using your suggestion from … and I'm using LLaMA-2 LoRA.
So, let's start with bug №1: you define … here, and here you build the model, but the lora section is already gone by that point. It happens because you pop the lora section here and don't use it anywhere else, only for …

You should either not pop it, or use the function from ….

About bug №2: we build …, and later you re-initialize your LoRA model, so we get an error here because of FSDP, here. For …, I think we need to rename this for LoRA LLaMA, or refactor it in some other way.

Thanks! I hope we will soon be able to train ….
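For bug №1, the kind of fix one would expect (a sketch only; the actual helper names in train.py may differ) is to read the `lora` section without popping it, so that later consumers of the config still see it:

```python
# Sketch of a non-destructive read of the lora section for bug №1;
# the config contents here are made up for illustration.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    'model': {'name': 'hf_causal_lm'},
    'lora': {'args': {'r': 1, 'lora_alpha': 32}},
})

# Select instead of pop: the lora section stays in cfg for anything built later.
lora_cfg = OmegaConf.select(cfg, 'lora', default=None)
if lora_cfg is not None:
    print('building LoRA wrapper with', OmegaConf.to_container(lora_cfg.args))

assert 'lora' in cfg  # unlike cfg.pop('lora'), nothing was removed
```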
Thanks for this. Fixed the first one, we think. Will take care of the second as well.
For Jose: two main issues with this PR currently that I know of:
Force-pushed from 1351637 to 5b905b0.
Closing in favor of #886.
Add a fix to wrap trainable LoRA modules with FSDP.
Verified successful training on 8 GPUs.
Related, but not affecting this PR: @bcui19 and I discussed a small enhancement to Composer to accompany this PR, which will spare big untrained modules from being fetched by FSDP.
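A sketch of what "wrap trainable LoRA modules with FSDP" could look like, assuming Composer's convention of honoring a per-module `_fsdp_wrap` attribute in its auto-wrap policy (the function name is illustrative, and the PR's actual implementation may differ):

```python
# Illustrative helper: flag every submodule that owns a trainable (LoRA)
# parameter so Composer's FSDP auto-wrap policy treats it as its own unit,
# keeping frozen base weights and trainable adapters in separate wraps.
import torch.nn as nn


def mark_trainable_modules_for_fsdp_wrap(model: nn.Module) -> None:
    for module in model.modules():
        owns_trainable = any(
            p.requires_grad for p in module.parameters(recurse=False))
        if owns_trainable:
            # Composer checks for this attribute when deciding what to wrap.
            module._fsdp_wrap = True  # type: ignore[attr-defined]
```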