shenao-zhang/reward-augmented-preference


Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Code for Reward-Augmented Data Enhances Direct Preference Alignment of LLMs.

Authors: Shenao Zhang¹, Zhihan Liu¹, Boyi Liu², Yufeng Zhang, Yingxiang Yang², Yongfei Liu², Liyu Chen², Tao Sun², Zhaoran Wang¹.

¹Northwestern University, ²ByteDance

⚡ Augment Preference Data with Zero Cost — No Need to Change Algorithms! ⚡

How it works: illustration.jpg

Performance: results.jpg

Installation

Install the package dependencies as follows:

 python -m pip install .

To fine-tune Gemma-2-9b-it, first upgrade transformers by running pip install --upgrade transformers.

Preference Data Augmentation

Replace USERNAME in scripts/preprocess.py and scripts/reward_augmentation.py with your Hugging Face username.
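If you prefer to script the substitution rather than edit the files by hand, a minimal sketch follows. The helper name replace_username is hypothetical and not part of this repository; it does a plain text replacement of the USERNAME placeholder and assumes the placeholder appears literally in the file.

```python
from pathlib import Path


def replace_username(path, username, placeholder="USERNAME"):
    """Replace every literal occurrence of `placeholder` in the file at `path`.

    Hypothetical helper, not part of the repository. Returns the number of
    occurrences that were replaced (0 leaves the file untouched).
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    count = text.count(placeholder)
    if count:
        file.write_text(text.replace(placeholder, username), encoding="utf-8")
    return count


if __name__ == "__main__":
    # "your-hf-username" is an illustrative value; substitute your own.
    for script in ("scripts/preprocess.py", "scripts/reward_augmentation.py"):
        print(script, replace_username(script, "your-hf-username"))
```

The same helper could be pointed at the training config, e.g. recipes/qwen2-7b-instruct-dpo-ra/dpo/config_full.yaml.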

First, preprocess the UltraFeedback dataset following this script, while keeping the quality scores of the responses:

python scripts/preprocess.py

Then the reward-augmented preference data can be obtained by running:

python scripts/reward_augmentation.py
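To make the augmentation step concrete, here is a minimal sketch of the relabeling idea as described in the paper: each scored pair yields two goal-conditioned pairs, one conditioned on the chosen response's score (original ordering kept) and one conditioned on the rejected response's score (ordering flipped). This is an illustration under assumptions, not the contents of scripts/reward_augmentation.py: the function names, the field names, and the GOAL_TEMPLATE wording are all hypothetical.

```python
from typing import Dict, List

# Hypothetical prompt template; the repository's script may phrase the
# goal differently.
GOAL_TEMPLATE = "{prompt}\n\nTarget response quality score: {score}"


def augment_pair(prompt: str, chosen: str, rejected: str,
                 chosen_score: float, rejected_score: float) -> List[Dict[str, str]]:
    """Turn one scored preference pair into two goal-conditioned pairs."""
    return [
        # Goal = score of the originally chosen response: ordering is kept.
        {"prompt": GOAL_TEMPLATE.format(prompt=prompt, score=chosen_score),
         "chosen": chosen, "rejected": rejected},
        # Goal = score of the originally rejected response: ordering is
        # flipped, since the rejected response is the one matching this goal.
        {"prompt": GOAL_TEMPLATE.format(prompt=prompt, score=rejected_score),
         "chosen": rejected, "rejected": chosen},
    ]


def augment_dataset(rows: List[Dict]) -> List[Dict[str, str]]:
    """Apply augment_pair to every row; the output has twice as many pairs."""
    augmented = []
    for row in rows:
        augmented.extend(augment_pair(**row))
    return augmented
```

Because the output keeps the usual prompt/chosen/rejected layout, it can be fed to an unmodified DPO trainer.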

DPO Training

Replace USERNAME in config_full.yaml with your Hugging Face username.

Then run standard DPO on the reward-augmented preference data, e.g., on the Qwen2-7B-Instruct model:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/qwen2-7b-instruct-dpo-ra/dpo/config_full.yaml
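For reference, the objective that standard DPO optimizes on each preference pair can be sketched in plain Python as below. This is only an illustration of the per-pair loss, not the code in scripts/run_dpo.py: the function name is hypothetical, the inputs are assumed to be summed sequence log-probabilities under the policy and the frozen reference model, and beta=0.1 is an illustrative default rather than the value in config_full.yaml.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    # Log-ratios of the policy against the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With reward-augmented data, the only difference is that the prompt already contains the target score, so the loss itself is unchanged.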

Citation

@article{zhang2024reward,
  title={Reward-Augmented Data Enhances Direct Preference Alignment of LLMs},
  author={Zhang, Shenao and Liu, Zhihan and Liu, Boyi and Zhang, Yufeng and Yang, Yingxiang and Liu, Yongfei and Chen, Liyu and Sun, Tao and Wang, Zhaoran},
  journal={arXiv preprint arXiv:2410.08067},
  year={2024}
}

Acknowledgement

This repo is built upon The Alignment Handbook. We thank the authors for their great work.

