
A tutorial to help you finetune LLaMA-2-7b with ZeRO2/3 enabled, using this repository full of garbage code. #430

LLMChild opened this issue Jul 25, 2024 · 1 comment
LLMChild commented Jul 25, 2024

Setup Environment

First, make sure that everything works in https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama. This confirms that you have resolved all environment issues and can start converting the Hugging Face checkpoint into a ZeRO-enabled checkpoint.
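A quick sanity check before going further (a minimal sketch; the versions you actually need are whatever the repo's requirements pin, not anything this snippet asserts):

```python
# Verify that the core packages import cleanly and that GPUs are visible
# before touching the conversion or finetune scripts.
import torch
import deepspeed
import transformers

print("torch:", torch.__version__)
print("deepspeed:", deepspeed.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```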

Checkpoint Conversion

The simplest idea is to use the script hf2megads_weight_converter.py with pipeline parallelism disabled to get a DeepSpeed ZeRO checkpoint.
Ah! But that cannot be done with the script from https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama as it stands.
When you try, you hit this error:

```python
if args.deepspeed and not args.no_pipeline_parallel:
    model = GPTModelPipe(config, num_tokentypes=0, parallel_output=True)
else:
    raise NotImplementedError("Not implemented")
```

Then you may think the universal checkpointing technique can help you achieve such a conversion.
Ah! You wish.
Universal checkpointing can convert between ZeRO1/2/3 checkpoints with different world sizes, and between TP/PP/ZeRO1 checkpoints with different parallel sizes, but it cannot convert between TP/PP/ZeRO1 and ZeRO2/3.
So there is only one way left: figure out a ZeRO2/3 checkpoint conversion method based on the script hf2megads_weight_converter.py.
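In outline: take the non-pipeline branch that currently raises NotImplementedError, build a plain GPTModel there, copy the Hugging Face weights into it, and let a DeepSpeed engine write the ZeRO checkpoint. A minimal sketch of that idea (convert_hf_to_zero and copy_hf_weights are hypothetical names used for illustration; the real per-layer weight-renaming logic already exists in hf2megads_weight_converter.py for the pipeline path):

```python
import deepspeed
from megatron.model import GPTModel

def copy_hf_weights(model, hf_state_dict):
    # Placeholder: reuse the per-layer renaming/copying logic that
    # hf2megads_weight_converter.py already implements for GPTModelPipe.
    raise NotImplementedError

def convert_hf_to_zero(config, hf_state_dict, ds_config, save_dir):
    # Replace the NotImplementedError branch: ZeRO without pipeline
    # parallelism only needs the plain GPTModel, not GPTModelPipe.
    model = GPTModel(config, num_tokentypes=0, parallel_output=True)
    copy_hf_weights(model, hf_state_dict)

    # Wrapping with DeepSpeed creates the partitioned ZeRO-2/3 state;
    # save_checkpoint then writes it in DeepSpeed's own format.
    engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
    engine.save_checkpoint(save_dir, tag="hf_to_zero2_3")
```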

Finetune Script

After getting a ZeRO checkpoint, everything else is quite easy.
But since the tutorial at https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama does not expect you to finetune LLaMA with ZeRO and without pipeline parallelism, a little more effort is still needed to get there.

For the detailed modification, please refer to fix-zero-load; with it, everything should work well.
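The gist of that change (a sketch under my own assumptions, not the linked fix verbatim; load_converted_zero_checkpoint is a hypothetical name) is to let the DeepSpeed engine resume from the converted checkpoint directory instead of Megatron's own loader, without demanding optimizer state that the converter never wrote:

```python
def load_converted_zero_checkpoint(engine, args):
    # The converter produced no optimizer/LR-scheduler state, so skip
    # loading it and start finetuning with a fresh optimizer.
    load_path, client_state = engine.load_checkpoint(
        args.load,  # directory written by the converter
        load_optimizer_states=False,
        load_lr_scheduler_states=False,
    )
    assert load_path is not None, "DeepSpeed found no checkpoint in args.load"
    return client_state
```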

LLMChild commented Jul 29, 2024

In my case, another problem arises when I specify --untie-embeddings-and-output-weights in the script: the whole program gets stuck in an NCCL all-gather operation. Surprisingly, it hangs at a random iteration, making reproduction quite difficult. If you encounter the same situation, try modifying the code in language_model.py to forcibly disable the tensor-parallel (TP) linear layer.
[screenshot: the modified code in language_model.py]
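Since the screenshot is not legible here, a sketch of what such a change could look like (assuming the TP layer in question is the untied output projection built in TransformerLanguageModel.__init__, and that tensor-parallel size is 1 in this pure-ZeRO setup; the exact surrounding code in language_model.py varies by version):

```python
import torch

# Inside TransformerLanguageModel.__init__, where the untied output head
# is built:
if self.untie_embeddings_and_output_weights:
    # was: tensor_parallel.ColumnParallelLinear(...), whose collective
    # communication is what intermittently hangs. A plain nn.Linear is
    # equivalent only when tensor-parallel size is 1.
    self.output_layer = torch.nn.Linear(
        config.hidden_size,
        args.padded_vocab_size,
        bias=False,
    )
```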
