Implementation of RPPO (Risk-sensitive PPO) and RPBT (Population-based self-play with RPPO)
competition
ppo
population-based-training
self-play
multi-agent-reinforcement-learning
risk-sensitive-preferences
reinforcment-learning
Updated May 22, 2023 - Python