
A tutorial to help you finetune LLaMA-2-7b with ZeRO2/3 enabled, using this repository full of garbage code. #430

LLMChild opened this issue Jul 25, 2024 · 1 comment
LLMChild commented Jul 25, 2024

Setup Environment

First, make sure that everything works in https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama. This confirms that you have resolved all environment issues and can start converting the Hugging Face checkpoint into a ZeRO-enabled checkpoint.
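A quick sanity check before going further (a minimal sketch; the versions you actually need are whatever the repo's requirements pin, not anything this snippet asserts):

```python
# Verify that the core packages import cleanly and that GPUs are visible
# before touching the conversion or finetune scripts.
import torch
import deepspeed
import transformers

print("torch:", torch.__version__)
print("deepspeed:", deepspeed.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```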

Checkpoint Conversion

The simplest idea is to use the script hf2megads_weight_converter.py with pipeline parallelism disabled to get a DeepSpeed ZeRO checkpoint.
Ah! But that cannot be done with the script from https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama as it stands.
When you try, you hit this error:

```python
if args.deepspeed and not args.no_pipeline_parallel:
    model = GPTModelPipe(config, num_tokentypes=0, parallel_output=True)
else:
    raise NotImplementedError("Not implemented")
```

Then you may think the universal checkpointing technique can help you achieve such a conversion.
Ah! You wish.
Universal checkpointing can convert between ZeRO1/2/3 checkpoints with different world sizes, and between TP/PP/ZeRO1 checkpoints with different parallel sizes, but it cannot convert between TP/PP/ZeRO1 and ZeRO2/3.
So there is only one way left: figure out a ZeRO2/3 checkpoint conversion method based on the script hf2megads_weight_converter.py.
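In outline: take the non-pipeline branch that currently raises NotImplementedError, build a plain GPTModel there, copy the Hugging Face weights into it, and let a DeepSpeed engine write the ZeRO checkpoint. A minimal sketch of that idea (convert_hf_to_zero and copy_hf_weights are hypothetical names used for illustration; the real per-layer weight-renaming logic already exists in hf2megads_weight_converter.py for the pipeline path):

```python
import deepspeed
from megatron.model import GPTModel

def copy_hf_weights(model, hf_state_dict):
    # Placeholder: reuse the per-layer renaming/copying logic that
    # hf2megads_weight_converter.py already implements for GPTModelPipe.
    raise NotImplementedError

def convert_hf_to_zero(config, hf_state_dict, ds_config, save_dir):
    # Replace the NotImplementedError branch: ZeRO without pipeline
    # parallelism only needs the plain GPTModel, not GPTModelPipe.
    model = GPTModel(config, num_tokentypes=0, parallel_output=True)
    copy_hf_weights(model, hf_state_dict)

    # Wrapping with DeepSpeed creates the partitioned ZeRO-2/3 state;
    # save_checkpoint then writes it in DeepSpeed's own format.
    engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
    engine.save_checkpoint(save_dir, tag="hf_to_zero2_3")
```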

Finetune Script

After getting a ZeRO checkpoint, everything else is quite easy.
But since the tutorial at https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama does not expect you to finetune LLaMA with ZeRO and without pipeline parallelism, a little more effort is still needed to get there.

For the detailed modification, please refer to fix-zero-load; with it, everything should work well.
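The gist of that change (a sketch under my own assumptions, not the linked fix verbatim; load_converted_zero_checkpoint is a hypothetical name) is to let the DeepSpeed engine resume from the converted checkpoint directory instead of Megatron's own loader, without demanding optimizer state that the converter never wrote:

```python
def load_converted_zero_checkpoint(engine, args):
    # The converter produced no optimizer/LR-scheduler state, so skip
    # loading it and start finetuning with a fresh optimizer.
    load_path, client_state = engine.load_checkpoint(
        args.load,  # directory written by the converter
        load_optimizer_states=False,
        load_lr_scheduler_states=False,
    )
    assert load_path is not None, "DeepSpeed found no checkpoint in args.load"
    return client_state
```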

LLMChild commented Jul 29, 2024

In my case, another problem arises when I specify --untie-embeddings-and-output-weights in the script: the whole program gets stuck in an NCCL all-gather operation. Surprisingly, it hangs at a random iteration, making reproduction quite difficult. If you encounter the same situation, try modifying the code in language_model.py to forcibly disable the tensor-parallel (TP) linear layer.
[screenshot: the modified code in language_model.py]
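Since the screenshot is not legible here, a sketch of what such a change could look like (assuming the TP layer in question is the untied output projection built in TransformerLanguageModel.__init__, and that tensor-parallel size is 1 in this pure-ZeRO setup; the exact surrounding code in language_model.py varies by version):

```python
import torch

# Inside TransformerLanguageModel.__init__, where the untied output head
# is built:
if self.untie_embeddings_and_output_weights:
    # was: tensor_parallel.ColumnParallelLinear(...), whose collective
    # communication is what intermittently hangs. A plain nn.Linear is
    # equivalent only when tensor-parallel size is 1.
    self.output_layer = torch.nn.Linear(
        config.hidden_size,
        args.padded_vocab_size,
        bias=False,
    )
```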
