fullfinetune leaves model unusable #648

windprak · 2023-10-17T07:08:56Z

I prepared an Alpaca-style dataset and trained LLaMA 7B on it with the original Alpaca code. This worked fine and I got usable results. I then switched to lit-gpt, followed the tutorials, and started training Llama 2 on the same dataset. It already fails at the first validation step, generating only gibberish. This does not happen with the original Alpaca dataset, and generate/chat also work fine. I don't understand how my dataset breaks the model.

`Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
[rank: 3] Seed set to 1337
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4

distributed_backend=nccl
All distributed processes registered. Starting with 4 processes

[rank: 0] Seed set to 1337
[rank: 2] Seed set to 1337
[rank: 1] Seed set to 1337
[rank: 2] Seed set to 1339
[rank: 3] Seed set to 1340
[rank: 0] Seed set to 1337
[rank: 1] Seed set to 1338
/root/miniconda3/envs/lit/lib/python3.10/site-packages/lightning/fabric/wrappers.py:176: You are calling the method GPT.set_kv_cache() from outside the model. This will bypass the wrapper from the strategy and result in incorrect behavior in .backward(). You should pass your inputs through GPT.forward().
{'eval_interval': 1000, 'save_interval': 2000, 'eval_iters': 100, 'eval_max_new_tokens': 100, 'log_interval': 1, 'devices': 4, 'learning_rate': 5e-05, 'batch_size': 1.0, 'micro_batch_size': 1, 'gradient_accumulation_iters': 1.0, 'epoch_size': 14771, 'num_epochs': 3, 'max_iters': 11078, 'weight_decay': 0.0, 'warmup_steps': 22156.0}
Loading model '/home/exstorage/meta-llama/Llama-2-13b-hf/lit_model.pth' with {'org': 'meta-llama', 'name': 'Llama-2-13b-hf', 'block_size': 4096, 'vocab_size': 32000, 'padding_multiple': 64, 'padded_vocab_size': 32000, 'n_layer': 40, 'n_head': 40, 'n_embd': 5120, 'rotary_percentage': 1.0, 'parallel_residual': False, 'bias': False, 'lm_head_bias': False, 'n_query_groups': 40, 'shared_attention_norm': False, '_norm_class': 'RMSNorm', 'norm_eps': 1e-05, '_mlp_class': 'LLaMAMLP', 'gelu_approximate': 'none', 'intermediate_size': 13824, 'rope_condense_ratio': 1, 'rope_base': 10000, 'head_size': 128, 'rope_n_elem': 128}
Number of trainable parameters: 13,015,864,320
The longest sequence length in the train data is 4096, the model's maximum sequence length is 4096 and context length is 4096
Validating ...
Recommend a movie for me to watch during the weekend and explain the reason.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

Recommend a movie for me to watch during the weekend and explain the reason.

Response:OOOOOOOOOOOOOOOOOOOOtOO!!OOOO!O speakOO andOO -OOOOO

crack.!!OOO andOOO[OOFOwOurentOurOruO,thO.short ch leastOeeOOOokOOlimO sttO noted ochO .O. proxySO -O
Estimated TFLOPs: 1554.39
Measured TFLOPs: 1428.29
iter 0 step 1: loss 9.8921, iter time: 3720.26ms (optimizer.step)
iter 1 step 2: loss 9.8023, iter time: 2078.05ms (optimizer.step)`

I downloaded the model again and reinstalled everything, but the results are the same. The final fine-tuned model also produces only garbage.
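
As a side check, the hyperparameter dictionary printed in the log above can be reproduced by hand. The sketch below copies the logged values and assumes a plain linear warmup schedule; that schedule is an assumption for illustration and is not confirmed by the log. It does not explain the gibberish at the first validation step, which is generated before any optimizer step, but it shows that the logged `warmup_steps` is twice `max_iters`, so under a linear warmup the learning rate would never reach its configured value.

```python
# Values copied from the hyperparameter dict printed in the log.
epoch_size = 14771
num_epochs = 3
devices = 4
micro_batch_size = 1
learning_rate = 5e-05
warmup_steps = 22156.0

# Iterations per device; this reproduces the logged 'max_iters': 11078.
max_iters = num_epochs * (epoch_size // micro_batch_size) // devices


def warmup_lr(step: int) -> float:
    """Linear warmup. ASSUMPTION: the schedule shape is not shown in the log."""
    if step <= warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate


print("max_iters:", max_iters)
print("warmup_steps / max_iters:", warmup_steps / max_iters)
print("lr at the last iteration:", warmup_lr(max_iters))
```

With `gradient_accumulation_iters` at 1, one iteration is one optimizer step, so under this assumed schedule the run would end at roughly half of the configured learning rate.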

Here is what I observed:

My dataset works (pretty well) with the original stanford_alpaca code.
Using my dataset in lit-gpt on Llama 2 produces gibberish from the very first validation step.
Using the original Alpaca dataset in lit-gpt does not show this problem; it validates and trains well.
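
Since the only difference between the working and the failing run is the dataset, one way to narrow this down is to compare the two files structurally. The following is a minimal sketch, assuming Alpaca-style JSON records with the usual `instruction`, `input`, and `output` fields. The function name `check_alpaca_records` and the word-count length proxy are hypothetical helpers for illustration, not part of lit-gpt, and word counts only roughly approximate token counts.

```python
from typing import Iterable

REQUIRED_KEYS = ("instruction", "input", "output")  # standard Alpaca fields


def check_alpaca_records(records: Iterable[dict], max_words: int = 4096) -> dict:
    """Count structural problems in Alpaca-style records.

    Hypothetical helper: whitespace-separated words stand in for tokens,
    so 'too_long' is only a hint that a sample may hit the context limit.
    """
    report = {"total": 0, "missing_keys": 0, "empty_output": 0,
              "too_long": 0, "longest_words": 0}
    for rec in records:
        report["total"] += 1
        if any(key not in rec for key in REQUIRED_KEYS):
            report["missing_keys"] += 1
            continue  # cannot measure a record with missing fields
        if not str(rec["output"]).strip():
            report["empty_output"] += 1
        n_words = sum(len(str(rec[key]).split()) for key in REQUIRED_KEYS)
        report["longest_words"] = max(report["longest_words"], n_words)
        if n_words >= max_words:
            report["too_long"] += 1
    return report


# Tiny in-memory example; a real check would load the dataset with json.load.
sample = [
    {"instruction": "Recommend a movie.", "input": "", "output": "Try Alien."},
    {"instruction": "This record has no output field"},
]
print(check_alpaca_records(sample))
```

Running the same check on the original Alpaca file and on the custom file makes differences such as missing fields or very long samples visible. The log reports a longest training sequence of 4096, equal to the model's maximum, so unusually long samples are one thing worth looking at.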

I really don't know where to look for a solution anymore. Has anyone ever experienced this?

windprak · 2023-10-18T07:14:57Z

I think this is related to FSDP; see huggingface/transformers#26498.
Switching to DeepSpeed made a night-and-day difference.
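
The thread does not say which DeepSpeed settings were used, so the following is only a sketch of what a minimal ZeRO configuration could look like. The helper name `make_zero_config`, the choice of ZeRO stage 3, and bf16 are assumptions for illustration; the batch values mirror `micro_batch_size` and `gradient_accumulation_iters` from the log in the first post. The dictionary keys are standard DeepSpeed config fields.

```python
import json


def make_zero_config(stage: int = 3, micro_batch_size: int = 1,
                     grad_accum: int = 1) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper).

    ASSUMPTIONS: the ZeRO stage and bf16 are illustrative choices,
    not settings reported in this thread.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0-3, got {stage}")
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum,
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": stage},
    }


config = make_zero_config()
print(json.dumps(config, indent=2))
```

A config like this is normally written to a JSON file or passed to the training framework's DeepSpeed integration; how it is wired into a particular script depends on the library version, so check the relevant documentation before relying on it.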

Jeronymous · 2023-10-30T17:22:37Z

We had a bad experience with lit-gpt when finetuning on multi-GPU: #652
Maybe you're having the same issue...

WilliamGazeley · 2023-11-24T10:55:21Z

@windprak Is there a guide or an example on how to use DeepSpeed with Fabric? I'm trying to do it and keep failing to load the model weights.

rasbt · 2024-03-29T17:58:38Z

We have improved things a lot in recent months and now also have configuration files for good out-of-the-box performance; see, e.g., https://github.com/Lightning-AI/litgpt/tree/main/config_hub/finetune.

Please feel free to reopen this issue and discussion if you have any follow-up questions or concerns.

rasbt closed this as completed Mar 29, 2024
