Add example Multi-GPU training script using pyreft #143

Merged 12 commits into stanfordnlp:main on Dec 19, 2024

Conversation

ramvenkat98 (Contributor)

Context

Create a version of the example Alpaca training script that supports training on multiple GPUs.

Implementation

The main changes from the original script are the distributed training setup, the data sampler, logging, and saving/loading. This multi-GPU script was also used as a reference when writing it.

Distributed training is done with DDP, and training is launched with torchrun.
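
For reference, a minimal sketch of the DDP pattern described above (not the actual diff): torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in each process's environment, and the model/dataset/collator names below are placeholders for what the real script builds.

```python
# Sketch of the DDP changes: process-group setup, per-rank data sharding,
# and rank-0-only logging/saving. Placeholders stand in for the ReFT model
# and Alpaca dataset constructed in the actual script.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def setup_distributed():
    # torchrun provides LOCAL_RANK/RANK/WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank


def train(model, train_dataset, collate_fn, batch_size=4, num_epochs=2):
    local_rank = setup_distributed()
    model = model.to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Each rank sees a disjoint shard of the data.
    sampler = DistributedSampler(train_dataset, shuffle=True)
    loader = DataLoader(train_dataset, batch_size=batch_size,
                        sampler=sampler, collate_fn=collate_fn)

    is_main = dist.get_rank() == 0
    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # reshuffle consistently across ranks
        for batch in loader:
            ...  # forward/backward/optimizer step, as in the single-GPU script
        if is_main:
            print(f"finished epoch {epoch}")  # log / wandb only on rank 0

    if is_main:
        ...  # save the trained interventions only on rank 0
    dist.barrier()
    dist.destroy_process_group()
```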

Training

Check that the existing training script works after the changes
Command:

python train.py --model_name_or_path yahma/llama-7b-hf --data_path ./alpaca_data.json --output_dir ./test_single_gpu_v1/ --layers "8;19" --rank 4 --position "f1+l1" --num_train_epochs 2 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --evaluation_strategy "no" --save_strategy "no" --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --max_n_train_example 10000

Output:
ram1998-job-3144778.txt
Wandb Report:
https://api.wandb.ai/links/ramvenkat98/5vqqkau8

Loss curves look reasonable, and training completes successfully.

Check that the new script works as expected
Command:

torchrun --nproc_per_node 4 train_multigpu.py --model_name_or_path yahma/llama-7b-hf --data_path ./alpaca_data.json --output_dir ./test_multi_gpu_v1/ --layers "8;19" --rank 4 --position "f1+l1" --num_train_epochs 2 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --evaluation_strategy "no" --save_strategy "no" --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --max_n_train_example 10000
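
For reference, with 4 processes, a per-device batch size of 4, and gradient accumulation of 8, the effective global batch size is 4 × 4 × 8 = 128 examples per optimizer step, versus 4 × 8 = 32 for the single-GPU run above.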

Output:
ram1998-job-4376567.txt
Wandb Report:
https://api.wandb.ai/links/ramvenkat98/tgsh7ru8

Testing

Compared the outputs of the original model, the single-GPU ReFT-trained model, and the multi-GPU ReFT-trained model. The two trained models give a reasonable answer (note: the attached file is a Bento notebook exported and saved as an HTML file, though the extension is .txt):
local_test_output.txt
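
For anyone reproducing the comparison without the notebook, here is a minimal sketch of generating a baseline completion with plain Hugging Face APIs. The Alpaca-style prompt is an assumption, and loading the trained ReFT interventions (which follows the saving/loading path added in the script) is elided.

```python
# Sketch: generate a completion from the base model for side-by-side comparison
# with the single-GPU and multi-GPU trained checkpoints. The prompt template is
# assumed (standard Alpaca instruction format), not taken from the notebook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTell me about alpacas.\n\n### Response:"
)

tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "yahma/llama-7b-hf", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```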

frankaging merged commit 28e6e0a into stanfordnlp:main on Dec 19, 2024