Llama-2 loss and learning rate is always 0 after first step #2072

Open
jerryjalapeno opened this issue Jul 25, 2023 · 4 comments

jerryjalapeno commented Jul 25, 2023

The log appears like this:

{'loss': 1.8709, 'learning_rate': 0.0, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.05}

Script:

deepspeed --include="localhost:0,1,2,3" --master_port=20001 fastchat/train/train_mem.py \
    --deepspeed playground/deepspeed_config_s6.json \
    --model_name_or_path NousResearch/Redmond-Puffin-13B \
    --data_path data/dummy_conversation.json \
    --output_dir PUFFIN_ON_ZOOTIEZ \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy epoch \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fp16 \
    --cache_dir "/tmp" \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --lazy_preprocess True
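
With --fp16 in the command above and a Llama-2 base model, a commonly reported cause of this exact trace is fp16 overflow: every optimizer step gets skipped, so the warmup/cosine scheduler never moves the learning rate off 0.0 and the logged loss collapses to 0.0. If the GPUs support bfloat16 and the script forwards the standard HF TrainingArguments, swapping --fp16 for --bf16 True is the usual first thing to try. The snippet below is only a sketch (ZeroLossGuard is a hypothetical helper, not part of FastChat) that stops the run as soon as the logged loss collapses, instead of letting it run for hours on zeros:

# ZeroLossGuard is a hypothetical guard callback, not part of FastChat.
# It asks the HF Trainer to stop as soon as a logged loss is exactly 0.0 or NaN.
import math

from transformers import TrainerCallback


class ZeroLossGuard(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and (loss == 0.0 or math.isnan(loss)):
            print(f"step {state.global_step}: logged loss is {loss}, stopping")
            control.should_training_stop = True
        return control


# Usage, assuming a standard transformers Trainer instance named `trainer`:
# trainer.add_callback(ZeroLossGuard())

Either way, the first thing worth checking is whether the learning rate ever becomes non-zero after the warmup steps; if it never does, the optimizer steps are being skipped rather than the loss genuinely being zero.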

@CCCarloooo

I also have a similar issue.

CohenQU commented Sep 25, 2023

Hi @jerryjalapeno @CCCarloooo, I also encountered the same problem when fine-tuning a 7B model. Have you fixed it yet?

@jerryjalapeno (Author)

> Hi @jerryjalapeno @CCCarloooo, I also encountered the same problem when fine-tuning a 7B model. Have you fixed it yet?

No. I ended up switching to axolotl repo, which works fine for me.

@karthik19967829

PR #2423 addresses this issue.
