This repository contains the implementation of ROSMO, a Regularized One-Step Model-based algorithm for Offline RL, introduced in our paper "Efficient Offline Policy Optimization with a Learned Model". We provide training code for both the Atari and BSuite experiments, and have made our reproduced results on Atari MsPacman publicly available on W&B.
Please follow the installation guide.
To run the BSuite experiments, please ensure you have downloaded the datasets and placed them in the directory defined by `CONFIG.data_dir` in `experiment/bsuite/config.py`.
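For reference, the snippet below sketches what pointing `CONFIG.data_dir` at your local dataset copy could look like. This is a hypothetical sketch assuming an `ml_collections`-style config; check `experiment/bsuite/config.py` for the actual definition.

```python
# Hypothetical sketch only -- see experiment/bsuite/config.py for the real
# definition. Illustrates pointing CONFIG.data_dir at the downloaded datasets.
from ml_collections import config_dict

def get_config() -> config_dict.ConfigDict:
    config = config_dict.ConfigDict()
    config.data_dir = "/path/to/bsuite_datasets"  # downloaded BSuite datasets live here
    return config

CONFIG = get_config()
```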
- Debug run:

  ```bash
  python experiment/bsuite/main.py -exp_id test -env cartpole
  ```
- Enable the W&B logger and start training:

  ```bash
  python experiment/bsuite/main.py -exp_id test -env cartpole -nodebug -use_wb -user ${WB_USER}
  ```
The following example commands train 1) a ROSMO agent, 2) its sampling variant, and 3) a MuZero Unplugged (MZU) agent on the game MsPacman; a schematic sketch contrasting the exact and sampled policy targets follows the commands.
- Train ROSMO with the exact policy target:

  ```bash
  python experiment/atari/main.py -exp_id rosmo -env MsPacman -nodebug -use_wb -user ${WB_USER}
  ```
- Train ROSMO with a sampled policy target (N=4):

  ```bash
  python experiment/atari/main.py -exp_id rosmo-sample-4 -sampling -env MsPacman -nodebug -use_wb -user ${WB_USER}
  ```
- Train MuZero Unplugged as a benchmark (N=20):

  ```bash
  python experiment/atari/main.py -exp_id mzu-sample-20 -algo mzu -num_simulations 20 -env MsPacman -nodebug -use_wb -user ${WB_USER}
  ```
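To make the difference between the two ROSMO variants concrete, here is a schematic JAX sketch of the one-step look-ahead policy target described in the paper. The names (`exact_policy_target`, `sampled_policy_target`, `advantage_fn`) and structure are our own illustrative assumptions, not the repository's actual code:

```python
# Schematic sketch of ROSMO's policy targets; names and structure are
# illustrative assumptions, not the repository's implementation.
import jax
import jax.numpy as jnp

def exact_policy_target(prior_logits: jnp.ndarray, advantages: jnp.ndarray) -> jnp.ndarray:
    """Exact target over the full action space:
    pi_target(a) is proportional to pi_prior(a) * exp(adv(s, a)),
    with adv estimated via a one-step look-ahead of the learned model."""
    return jax.nn.softmax(jax.nn.log_softmax(prior_logits) + advantages)

def sampled_policy_target(rng: jax.Array, prior_logits: jnp.ndarray,
                          advantage_fn, num_samples: int = 4) -> jnp.ndarray:
    """Sampled variant: draw N actions from the prior and self-normalize
    their exponentiated advantages, avoiding a sweep over all actions."""
    actions = jax.random.categorical(rng, prior_logits, shape=(num_samples,))
    weights = jax.nn.softmax(advantage_fn(actions))  # normalize among the N samples
    num_actions = prior_logits.shape[-1]
    # Scatter the sample weights back onto the full action space
    # (duplicate samples accumulate their weights).
    return jnp.zeros(num_actions).at[actions].add(weights)
```

Because the sampled actions are drawn from the prior itself, the prior term cancels out of the self-normalized estimate, which is why only the advantages appear inside the sampled softmax.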
If you find this work useful for your research, please consider citing:

```bibtex
@inproceedings{
  liu2023rosmo,
  title={Efficient Offline Policy Optimization with a Learned Model},
  author={Zichen Liu and Siyi Li and Wee Sun Lee and Shuicheng Yan and Zhongwen Xu},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://arxiv.org/abs/2210.05980}
}
```
ROSMO is distributed under the terms of the Apache-2.0 license.
We thank the open-source projects that provided great references for this implementation.
This is not an official Sea Limited or Garena Online Private Limited product.