
Bugs when fine-tuning tiny-llama with instructions using tiny-llama's conversation template #2992

Closed
hychaochao opened this issue Jan 31, 2024 · 7 comments


@hychaochao

Thanks for your great work! I ran into some problems when using train_with_template.py to fine-tune TinyLlama with TinyLlama's conversation template.
This is my script:

torchrun --nproc_per_node=1 --master_port=20001 train_with_template.py \
    --model_name_or_path .../tinyllama  \
    --data_path /data/dummy_conversation.json \
    --bf16 True \
    --output_dir /tinyllama-test \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True 

However, the tokenization mismatch warning was reported, and the training loss was always 0.

WARNING: tokenization mismatch: 132 vs. 124. (ignored)
Some of the output:
{'loss': 0.0, 'learning_rate': 1.8477344278896708e-05, 'epoch': 0.21}
 21%|██        | 26/123 [00:25<01:27,  1.11it/s]
 22%|██▏       | 27/123 [00:26<01:26,  1.12it/s]
{'loss': 0.0, 'learning_rate': 1.833313919082515e-05, 'epoch': 0.22}
 22%|██▏       | 27/123 [00:26<01:26,  1.12it/s]
 23%|██▎       | 28/123 [00:27<01:24,  1.12it/s]
{'loss': 0.0, 'learning_rate': 1.818302775908169e-05, 'epoch': 0.23}

Just like #2871.
I've confirmed that I'm using tinyllama's template when training, but it still doesn't work.
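For context: this pairing of the mismatch warning with a loss of exactly 0 matches FastChat's supervised preprocessing, where a sample whose per-turn token count disagrees with its whole-conversation token count has every label masked out. A rough Python sketch of that path (paraphrased, not the literal source):

import torch

IGNORE_TOKEN_ID = -100  # label value that transformers' CrossEntropyLoss ignores

def mask_on_mismatch(target: torch.Tensor, cur_len: int, total_len: int) -> torch.Tensor:
    # Sketch: if the length rebuilt turn by turn (cur_len) disagrees with the length
    # of the fully tokenized conversation (total_len), mask every label so the sample
    # is skipped by the loss. When all samples hit this branch, the reported training
    # loss stays at 0.0.
    if cur_len != total_len:
        target[:] = IGNORE_TOKEN_ID
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")
    return target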

@congchan
Contributor

congchan commented Feb 1, 2024

Hi @hychaochao, could you help confirm whether this PR #2996 fixes the issue?

@hychaochao
Author

Hi @hychaochao, could you help confirm whether this PR #2996 fixes the issue?

Yes, it works! Thanks for your great work again!
These are my training arguments:

torchrun --nproc_per_node=1 --master_port=20001 train.py \
    --model_name_or_path .../tinyllama  \
    --data_path FastChat-main/data/dummy_conversation.json \
    --bf16 True \
    --output_dir .../tinyllama-test \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True

And this is part of the output:

0%|          | 1/500 [00:01<11:28,  1.38s/it]
                                               
{'loss': 3.9269, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}

  0%|          | 1/500 [00:01<11:28,  1.38s/it]
  0%|          | 2/500 [00:01<07:11,  1.15it/s]
                                               
{'loss': 3.0636, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}

  0%|          | 2/500 [00:01<07:11,  1.15it/s]
  1%|          | 3/500 [00:02<05:55,  1.40it/s]

hychaochao reopened this Feb 2, 2024
@hychaochao
Author

The script I used for testing was wrong: I was testing train.py instead of train_with_template.py. I retested and found that it still doesn't work. These are my training arguments:

torchrun --nproc_per_node=4 --master_port=20001 train_with_template.py \
    --model_name_or_path /home/bingxing2/home/scx6203/luckychao/tinyllama  \
    --data_path /data/dummy_conversation.json \
    --bf16 True \
    --output_dir /weight/tinyllama-test \
    --num_train_epochs 4 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --resume_from_checkpoint /tinyllama-chat

And this is part of the output:

WARNING: tokenization mismatch: 80 vs. 74. (ignored)
WARNING: tokenization mismatch: 136 vs. 128. (ignored)
WARNING: tokenization mismatch: 61 vs. 57. (ignored)
.........
1%|          | 1/124 [00:03<07:00,  3.42s/it]
                                               
{'loss': 0.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.03}

  1%|          | 1/124 [00:03<07:00,  3.42s/it]
  2%|▏         | 2/124 [00:05<05:35,  2.75s/it]
                                               
{'loss': 0.0, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06}

  2%|▏         | 2/124 [00:05<05:35,  2.75s/it]
  2%|▏         | 3/124 [00:07<05:07,  2.54s/it]

@hychaochao
Author

@congchan I also hit the same error when using train_with_template.py to fine-tune Llama-2 with Llama-2's conversation template.
This is my script:

torchrun --nproc_per_node=4 --master_port=20001 train_with_template.py \
    --model_name_or_path .../llama-2  \
    --data_path /data/dummy_conversation.json \
    --bf16 True \
    --output_dir /weight/llama2-test \
    --num_train_epochs 4 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --resume_from_checkpoint /llama2-chat \
    --deepspeed '/home/bingxing2/home/scx6203/luckychao/stanford_alpaca/configs/default_offload_opt_param.json'

This is part of the output:

WARNING: tokenization mismatch: 55 vs. 52. (ignored)
WARNING: tokenization mismatch: 103 vs. 99. (ignored)
WARNING: tokenization mismatch: 44 vs. 42. (ignored)
WARNING: tokenization mismatch: 42 vs. 40. (ignored)
WARNING: tokenization mismatch: 98 vs. 94. (ignored)
......
8%|▊         | 1/12 [01:02<11:23, 62.12s/it]
                                              
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.26}

  8%|▊         | 1/12 [01:02<11:23, 62.12s/it]
 17%|█▋        | 2/12 [02:02<10:11, 61.16s/it]
                                              
{'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.52}

@congchan
Contributor

congchan commented Feb 3, 2024

Hi @hychaochao, I just tested with Llama 2 and TinyLlama, and they both work with this fix:
#3006

Feel free to confirm the results on your data, and let me know if it works. Thank you.

@hychaochao
Author

@congchan Very happy to see that you have fixed the bug! I tried it on my data and it works. Thank you again for such great work and such efficiency!
By the way, I noticed that the conversation template is selected from the name of the model path, which is a bit inconvenient for local models. Maybe you could add a "model_id" parameter to determine the template, just like when training Vicuna; just a small suggestion.
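A rough sketch of that suggestion: the model_id field and resolve_conv_template helper below are hypothetical, not existing FastChat code; get_conv_template and get_conversation_template are the library's existing lookup helpers.

from dataclasses import dataclass, field
from typing import Optional

from fastchat.conversation import get_conv_template
from fastchat.model.model_adapter import get_conversation_template

@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = field(default=None)
    # Hypothetical extra field: name the conversation template explicitly
    # (e.g. "llama-2") instead of inferring it from the local model path.
    model_id: Optional[str] = field(default=None)

def resolve_conv_template(args: ModelArguments):
    # Prefer the explicitly requested template; otherwise fall back to the
    # existing path-based lookup.
    if args.model_id:
        return get_conv_template(args.model_id)
    return get_conversation_template(args.model_name_or_path)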

@congchan
Contributor

congchan commented Feb 3, 2024

@congchan Very happy to see that you have fixed the bug! I tried it on my data and it works. Thank you again for such great work and such efficiency! By the way, I noticed that the conversation template is selected from the name of the model path, which is a bit inconvenient for local models. Maybe you could add a "model_id" parameter to determine the template, just like when training Vicuna; just a small suggestion.

Thanks for your suggestions!
