fullfinetune leaves model unusable #648

Closed
windprak opened this issue Oct 17, 2023 · 4 comments
@windprak
Contributor

I prepared a dataset in Alpaca style and fine-tuned LLaMA 7B on it with the original Stanford Alpaca code. That worked fine and gave me usable results. Then I decided to switch to lit-gpt, did everything according to the tutorials, and started training Llama 2 on the same dataset. It fails at the very first validation step already, generating only gibberish. It does not fail when using the original Alpaca dataset, and generate/chat also works fine. I don't understand how my dataset could break the model like this.
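For context, an Alpaca-style dataset is just a JSON list of instruction records. A minimal sketch of one record (the field contents here are illustrative, borrowed from the validation prompt in the log below):

```python
# One record of an Alpaca-style dataset (illustrative content, not from the actual dataset).
# The dataset file is a JSON list of such dicts.
record = {
    "instruction": "Recommend a movie for me to watch during the weekend and explain the reason.",
    "input": "",  # optional extra context; empty for instruction-only samples
    "output": "You could watch ...",  # the target response the model is trained on
}
```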

```
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
[rank: 3] Seed set to 1337
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4

distributed_backend=nccl
All distributed processes registered. Starting with 4 processes

[rank: 0] Seed set to 1337
[rank: 2] Seed set to 1337
[rank: 1] Seed set to 1337
[rank: 2] Seed set to 1339
[rank: 3] Seed set to 1340
[rank: 0] Seed set to 1337
[rank: 1] Seed set to 1338
/root/miniconda3/envs/lit/lib/python3.10/site-packages/lightning/fabric/wrappers.py:176: You are calling the method GPT.set_kv_cache() from outside the model. This will bypass the wrapper from the strategy and result in incorrect behavior in .backward(). You should pass your inputs through GPT.forward().
{'eval_interval': 1000, 'save_interval': 2000, 'eval_iters': 100, 'eval_max_new_tokens': 100, 'log_interval': 1, 'devices': 4, 'learning_rate': 5e-05, 'batch_size': 1.0, 'micro_batch_size': 1, 'gradient_accumulation_iters': 1.0, 'epoch_size': 14771, 'num_epochs': 3, 'max_iters': 11078, 'weight_decay': 0.0, 'warmup_steps': 22156.0}
Loading model '/home/exstorage/meta-llama/Llama-2-13b-hf/lit_model.pth' with {'org': 'meta-llama', 'name': 'Llama-2-13b-hf', 'block_size': 4096, 'vocab_size': 32000, 'padding_multiple': 64, 'padded_vocab_size': 32000, 'n_layer': 40, 'n_head': 40, 'n_embd': 5120, 'rotary_percentage': 1.0, 'parallel_residual': False, 'bias': False, 'lm_head_bias': False, 'n_query_groups': 40, 'shared_attention_norm': False, '_norm_class': 'RMSNorm', 'norm_eps': 1e-05, '_mlp_class': 'LLaMAMLP', 'gelu_approximate': 'none', 'intermediate_size': 13824, 'rope_condense_ratio': 1, 'rope_base': 10000, 'head_size': 128, 'rope_n_elem': 128}
Number of trainable parameters: 13,015,864,320
The longest sequence length in the train data is 4096, the model's maximum sequence length is 4096 and context length is 4096
Validating ...
Recommend a movie for me to watch during the weekend and explain the reason.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

Recommend a movie for me to watch during the weekend and explain the reason.

Response:OOOOOOOOOOOOOOOOOOOOtOO!!OOOO!O speakOO andOO -OOOOO

1. crack.!!OOO andOOO[OOFOwOurentOurOruO,thO.short ch leastOeeOOOokOOlimO sttO noted ochO .O. proxySO -O
Estimated TFLOPs: 1554.39
Measured TFLOPs: 1428.29
iter 0 step 1: loss 9.8921, iter time: 3720.26ms (optimizer.step)
iter 1 step 2: loss 9.8023, iter time: 2078.05ms (optimizer.step)
```

I downloaded the model again and reinstalled everything, but the results are still the same. The final fine-tuned model also only produces garbage.

Here is what I observed:

  • my dataset works with the original stanford_alpaca code (pretty well)
  • using my dataset in lit-gpt on Llama 2 produces gibberish from the very first validation step
  • using the original Alpaca dataset in lit-gpt does not show this behavior; it validates and trains well

I really don't know where to look for a solution anymore. Has anyone ever experienced this?
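One way to rule out a data-preparation problem (a debugging sketch, not something from this thread) is to decode a few prepared samples back to text before training. The file path and dict keys below are assumptions about what lit-gpt's prepare_alpaca.py writes; adjust them for your setup.

```python
# Sanity-check sketch: decode a few prepared training samples back to text.
# Assumes the prepared data/alpaca/train.pt is a list of dicts that each hold an
# "input_ids" tensor (an assumption; keys and paths may differ between versions).
from pathlib import Path

import torch
from lit_gpt.tokenizer import Tokenizer

checkpoint_dir = Path("checkpoints/meta-llama/Llama-2-13b-hf")  # placeholder path
tokenizer = Tokenizer(checkpoint_dir)

samples = torch.load("data/alpaca/train.pt")  # placeholder path
for sample in samples[:3]:
    print(tokenizer.decode(sample["input_ids"]))
    print("-" * 80)
```

If the decoded prompts already look wrong here, the problem is in data preparation rather than in the training strategy.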

@windprak
Contributor Author

I think this is related to FSDP: huggingface/transformers#26498
Switching to DeepSpeed was night and day.
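For anyone wanting to try the same switch, here is a minimal sketch of selecting DeepSpeed instead of FSDP when creating the Fabric object (the ZeRO stage and precision are assumptions, not necessarily the setup used above):

```python
# Sketch: run Fabric with the DeepSpeed strategy instead of FSDP (assumed settings).
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

fabric = Fabric(
    devices=4,
    strategy=DeepSpeedStrategy(stage=2),  # ZeRO stage 2; stage 3 would also shard parameters
    precision="bf16-mixed",
)
fabric.launch()
```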

@Jeronymous

We had a bad experience with lit-gpt when finetuning on multi-GPU: #652
Maybe you're having the same issue...

@WilliamGazeley

@windprak Is there a guide on how to use deepspeed with fabric, or some example? I'm trying to do it and keep failing to load the model weights.
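One possible approach (a sketch under the assumption of ZeRO stage 2, where each rank still holds the full parameter set at load time; paths and hyperparameters are placeholders, not a verified recipe) is to load the converted lit_model.pth into the model before handing it to fabric.setup():

```python
# Sketch: load lit-gpt weights first, then set up model/optimizer with DeepSpeed.
# Paths, ZeRO stage, and hyperparameters are placeholders, not a verified recipe.
import torch
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

from lit_gpt import GPT, Config

fabric = Fabric(devices=4, strategy=DeepSpeedStrategy(stage=2), precision="bf16-mixed")
fabric.launch()

config = Config.from_name("Llama-2-13b-hf")
with fabric.init_module():
    model = GPT(config)

# Load the converted checkpoint before setup; with ZeRO stage 2 a plain
# state-dict load is enough because parameters are not sharded at this point.
state_dict = torch.load("checkpoints/meta-llama/Llama-2-13b-hf/lit_model.pth")
model.load_state_dict(state_dict, strict=True)

# With DeepSpeed, model and optimizer must be set up together in one call.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)
model, optimizer = fabric.setup(model, optimizer)
```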

@rasbt
Collaborator

rasbt commented Mar 29, 2024

We have improved things a lot over the recent months and now also provide configuration files for good out-of-the-box performance, e.g., see https://github.com/Lightning-AI/litgpt/tree/main/config_hub/finetune.

Please feel free to reopen this issue and discussion if you have any follow-up questions or concerns.

rasbt closed this as completed Mar 29, 2024