Language-Integrated Value Iteration

Code for "How Can LLM Guide RL? A Value-Based Approach".

Authors: Shenao Zhang*, Sirui Zheng*, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang (* indicates equal contribution)

ALFWorld

Environment setup

Clone the repository:

git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/alfworld

Create a virtual environment and install the required packages:

pip install -r requirements.txt

Install the ALFWorld environment by following the instructions at https://github.com/alfworld/alfworld.
Set the OPENAI_API_KEY environment variable to your OpenAI API key:

export OPENAI_API_KEY=<your key>
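
Before launching an experiment, it can help to confirm that the key is actually visible to the current shell. The following is a minimal sketch and not part of this repository; the function name `check_openai_key` is hypothetical.

```shell
# Hypothetical helper (not part of this repository): succeeds only if
# OPENAI_API_KEY is set to a non-empty value in the current environment.
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  echo "OPENAI_API_KEY is set"
}
```

With this defined, `check_openai_key && ./run.sh` would start the run only when the key is present.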

Run the code

./run.sh

InterCode

Steps to run our algorithm in the InterCode environment.

Environment setup

Clone the repository, create a virtual environment, and install necessary dependencies:

git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/intercode
conda env create -f environment.yml
conda activate intercode

Run setup.sh to create the Docker images for the InterCode Bash, SQL, and CTF environments.
Set the OPENAI_API_KEY environment variable to your OpenAI API key:

export OPENAI_API_KEY=<your key>

Run the code

For InterCode-SQL, run:

./scripts/expr_slinvit_sql.sh

For InterCode-Bash, run:

./scripts/expr_slinvit_bash.sh

BlocksWorld

Environment setup

Our experiments use Vicuna-13B/33B (v1.3). Install the required packages with:
```
pip install -r requirements.txt
```

Run the code

To run the RAP experiments, use a command like the following:

CUDA_VISIBLE_DEVICES=0,1,2 nohup python -m torch.distributed.run --master_port 1034 --nproc_per_node 1 run_mcts.py --task mcts --model_name Vicuna --verbose False --data data/blocksworld/step_6.json --max_depth 6 --name m6ct_roll60 --rollouts 60 --model_path lmsys/vicuna-33b-v1.3 --num_gpus 3

To run the SLINVIT experiments, use a command like the following:

CUDA_VISIBLE_DEVICES=3,4,5 nohup python -m torch.distributed.run --master_port 39855 --nproc_per_node 1 run.py \
--model_name Vicuna \
--name planning_step6_13b \
--data data/blocksworld/step_6.json \
--horizon 6 \
--search_depth 5 \
--alpha 0 \
--sample_per_node 2 \
--model_path lmsys/vicuna-13b-v1.3 \
--num_gpus 3 \
--use_lang_goal
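
In both example commands above, the number of devices listed in CUDA_VISIBLE_DEVICES equals the value passed to --num_gpus. Assuming that match is intended (the README does not state it explicitly), a small helper can derive the count instead of hard-coding it. This is a sketch only; `count_visible_gpus` is a hypothetical name and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): prints how many
# comma-separated device IDs CUDA_VISIBLE_DEVICES contains, or 0 if unset.
count_visible_gpus() {
  devices="${CUDA_VISIBLE_DEVICES:-}"
  if [ -z "$devices" ]; then
    echo 0
    return 0
  fi
  # Put each device ID on its own line, then count the lines.
  printf '%s\n' "$devices" | tr ',' '\n' | wc -l | tr -d ' '
}
```

After `export CUDA_VISIBLE_DEVICES=3,4,5`, the flag could then be written as `--num_gpus "$(count_visible_gpus)"`, which keeps the two values in sync when the device list changes.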

Citation

@article{zhang2024can,
  title={How Can LLM Guide RL? A Value-Based Approach},
  author={Zhang, Shenao and Zheng, Sirui and Ke, Shuqi and Liu, Zhihan and Jin, Wanxin and Yuan, Jianbo and Yang, Yingxiang and Yang, Hongxia and Wang, Zhaoran},
  journal={arXiv preprint arXiv:2402.16181},
  year={2024}
}
