Problem finetuning llama and baichuan with a new version of transformers #26816
I finetuned all the models above with the code in the FastChat repository on A100-80G:

```
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_xformers.py \
--model_name_or_path llama-7b \
--data_path fschat.json \
--bf16 True \
--output_dir output \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--save_strategy "epoch" \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.04 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--model_max_length 4096 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
```
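Since the reported behaviour differs between transformers releases, it can help to log the exact library versions a run will use before launching the command above. A minimal sketch (not part of the original report; it only reads version strings and checks bf16 support):

```python
# Minimal sketch (not from the original report): print the versions the run
# will actually use, since behaviour differs across transformers releases.
import torch
import transformers
import accelerate

print(f"torch        {torch.__version__}")
print(f"transformers {transformers.__version__}")
print(f"accelerate   {accelerate.__version__}")
print(f"bf16 support {torch.cuda.is_available() and torch.cuda.is_bf16_supported()}")
```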
I was using transformers 4.33.2 (along with FSDP implemented in PyTorch and the accelerate package from HF) and also observed the issue when pretraining llama from scratch: the loss quickly diverges when using fsdp+bf16. There is no issue with fsdp+fp32 or ddp+bf16. I upgraded to 4.35.2 and the issue seems to be resolved, although I don't know the exact reason behind it. Before upgrading transformers, I incorporated many tips from #26498, but they didn't help much in my case.
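For runs that might hit this divergence, a small Trainer callback can stop training as soon as the logged loss blows up, so a bad fsdp+bf16 configuration fails fast instead of burning GPU hours. A hedged sketch (the class name and threshold are illustrative, not from this thread):

```python
# Illustrative sketch: stop training when the logged loss is NaN/inf or
# exceeds a threshold, to catch the fsdp+bf16 divergence early.
import math
from transformers import TrainerCallback

class StopOnDivergence(TrainerCallback):
    def __init__(self, max_loss: float = 20.0):
        self.max_loss = max_loss  # illustrative threshold, tune per task

    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and (not math.isfinite(loss) or loss > self.max_loss):
            print(f"Loss diverged ({loss}) at step {state.global_step}; stopping.")
            control.should_training_stop = True
        return control

# Usage, assuming an existing Trainer instance named `trainer`:
# trainer.add_callback(StopOnDivergence())
```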
When I tried to finetune a llama model on the sharegpt dataset, I got two very different loss curves: the green curve was trained with transformers 4.33.2 and the orange curve with transformers 4.28.1.
Obviously, the green one is abnormal and the orange one is correct. I wonder why this happens. The only thing I changed was the transformers version. Is this a bug in transformers, or did I do something wrong?
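To make this kind of comparison reproducible, the loss histories of the two runs can be exported (for example from wandb) and overlaid. A minimal sketch, assuming hypothetical CSV exports with `step` and `loss` columns:

```python
# Sketch for overlaying loss curves from two runs; file names and column
# names ("step", "loss") are assumptions about the wandb CSV export.
import pandas as pd
import matplotlib.pyplot as plt

runs = {
    "transformers 4.28.1": "loss_4.28.1.csv",  # hypothetical export path
    "transformers 4.33.2": "loss_4.33.2.csv",  # hypothetical export path
}

for label, path in runs.items():
    df = pd.read_csv(path)
    plt.plot(df["step"], df["loss"], label=label)

plt.xlabel("step")
plt.ylabel("training loss")
plt.legend()
plt.savefig("loss_comparison.png")
```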