Research Code for the Offline Experiments of "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
Yifei Zhou, Andrea Zanette, Jiayi Pan, Aviral Kumar, Sergey Levine
This repo supports the following methods:
- Offline ArCHer
- Offline Filtered BC
- Offline BC
And the following environments
To set up the code, create a conda environment, clone the repository, and install the package:
conda create -n archer python==3.10
conda activate archer
git clone https://github.com/andreazanette/OfflineArcher
cd OfflineArcher
python -m pip install -e .
Offline datasets and oracle checkpoints used in the paper can be found here. You will need to create an "oracles" folder and a "datasets" folder, and place each oracle checkpoint and each dataset in the corresponding folder. The oracle for Twenty Questions should be named 20q_t5_oracle.pt and the dataset should be named twenty_questions.json.
You can run experiments directly by running the launch scripts. For example, to launch Offline ArCHer on Twenty Questions, simply run
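The folder setup described above can be sketched as follows. This is a minimal sketch that assumes the two folders live in the directory you run it from (for example the repository root; the exact location is an assumption, not something stated here). It creates the folders and reports whether the Twenty Questions files are in place.

```shell
# Assumption: run this from the directory where the folders should live.
# Create the two folders if they do not exist yet.
mkdir -p oracles datasets

# Report whether the expected Twenty Questions files have been placed.
for f in oracles/20q_t5_oracle.pt datasets/twenty_questions.json; do
  if [ -f "$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```

A "missing" line means the downloaded file still needs to be moved into that folder and renamed to the expected file name.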
. submit_OfflineArcher_TwentyQuestions.sh
The code uses the PyTorch Lightning framework. Please refer to the PyTorch Lightning documentation (https://lightning.ai/docs/pytorch/stable/) for additional information, such as the flags available when launching the code. For example, to run on GPU 0, add --trainer.devices=[0] to the launch script.
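One practical detail when adding the flag: square brackets are glob characters in most shells, so quoting the flag is the safer choice. The sketch below only shows the quoting. The variable name DEVICES_FLAG and the placeholder command are hypothetical, and the launch script's actual contents are not reproduced here.

```shell
# Single quotes keep the shell from treating [0] as a glob pattern.
DEVICES_FLAG='--trainer.devices=[0]'

# Print the flag exactly as it would be passed to the program.
printf '%s\n' "$DEVICES_FLAG"

# Hypothetical usage inside a launch script. <existing command> stands for
# whatever command the script already contains; it is a placeholder only.
#   <existing command> "$DEVICES_FLAG"
```

Without the quotes, some shells (zsh, for instance) stop with a "no matches found" error when no file matches the bracket pattern.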
@misc{zhou2024archer,
title={ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL},
author={Yifei Zhou and Andrea Zanette and Jiayi Pan and Sergey Levine and Aviral Kumar},
year={2024},
eprint={2402.19446},
archivePrefix={arXiv},
primaryClass={cs.LG}
}