This is the source code of RPBT, proposed in the paper "Learning Diverse Risk Preferences in Population-based Self-play" (http://arxiv.org/abs/2305.11476). This repository provides a single-file implementation of RPPO (risk-sensitive PPO) in `toyexample/rppo.py`, and a lightweight, scalable implementation of RPBT (population-based self-play with RPPO). All experiments were conducted with one AMD EPYC 7702 64-Core Processor and one GeForce RTX 3090 GPU.
The experiments cover the following settings:

- single-agent setting
  - Toy example in the paper
  - classic Gym environments
- multi-agent competitive setting (Slimevolley and Sumoants)
The videos are available at https://sites.google.com/view/rpbt.
We use Python 3.8.

```bash
pip install -r requirements.txt
```
We provide a single-file implementation of RPPO for the toy example. Run:

```bash
python toyexample/rppo.py --env-id ToyEnv-v0 --num-steps 200 --tau 0.2
```

`--tau` sets the risk level. With `--risk False`, we recover the original PPO.
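For intuition on how a risk level like `--tau` can enter a PPO-style agent, below is a minimal, illustrative sketch of a CVaR-style aggregation of return quantiles: `tau = 1` averages the whole distribution (risk-neutral), while smaller `tau` keeps only the worst outcomes (risk-averse). The function `cvar_value` and its inputs are hypothetical assumptions for illustration, not the actual interface of `rppo.py`.

```python
import numpy as np

def cvar_value(quantiles: np.ndarray, tau: float) -> float:
    """Hypothetical CVaR-style aggregation of return quantiles.

    `quantiles` holds equally weighted samples of the return distribution;
    `tau` in (0, 1] is the risk level: tau = 1 averages all quantiles
    (risk-neutral, plain PPO-style value), smaller tau keeps only the
    worst tau-fraction of outcomes (risk-averse).
    """
    sorted_q = np.sort(quantiles)                  # worst outcomes first
    k = max(1, int(np.ceil(tau * len(sorted_q))))  # number of quantiles kept
    return float(sorted_q[:k].mean())

# Example: the same return distribution evaluated at two risk levels.
returns = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
print(cvar_value(returns, tau=1.0))  # risk-neutral mean: 0.5
print(cvar_value(returns, tau=0.2))  # risk-averse, worst 20%: -1.0
```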
The hyperparameter configs are in `config.py`. We provide two training scripts:

- For Slimevolley: `bash train_vb.sh`
- For Sumoants: `bash train_sumo.sh`
To train without PBT, set `--population-size 1`, which recovers plain RPPO.
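As a rough picture of what population-based training over risk levels can look like, here is a minimal sketch of a generic PBT exploit/explore step in which weaker agents copy a stronger agent's weights and perturb its risk level `tau`. All names (`pbt_step`, the agent dict fields) are hypothetical assumptions for illustration and do not reflect the repository's actual update rule or data structures.

```python
import copy
import random

def pbt_step(population, perturb=0.2):
    """Hypothetical exploit/explore step for a population of RPPO agents.

    Each agent is a dict with a fitness score (e.g., a win rate from
    self-play matches), model weights, and a risk level `tau`. Bottom
    performers copy a top performer's weights (exploit) and perturb its
    tau (explore). This is a generic PBT sketch, not the repository's
    actual update rule.
    """
    ranked = sorted(population, key=lambda a: a["fitness"], reverse=True)
    cut = max(1, len(ranked) // 4)          # top/bottom quartiles
    top, bottom = ranked[:cut], ranked[-cut:]

    for agent in bottom:
        parent = random.choice(top)
        agent["weights"] = copy.deepcopy(parent["weights"])          # exploit
        factor = random.choice([1.0 - perturb, 1.0 + perturb])
        agent["tau"] = min(1.0, max(0.05, parent["tau"] * factor))   # explore
    return population

# Example population of 4 agents with dummy weights and fitness scores.
pop = [{"weights": [0.0], "fitness": f, "tau": 0.5} for f in (0.9, 0.6, 0.4, 0.1)]
pbt_step(pop)
```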
We appreciate the following repositories for their valuable codebases: