This repository contains the code to train agents on any Gym, PyBullet, or MuJoCo environment using an Evolution Strategy (ES) algorithm. It is adapted from OpenAI's implementation of the distributed Evolution Strategy introduced in Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al. 2017.
This code was used to create the non-plastic baselines for our paper Meta-Learning through Hebbian Plasticity in Random Networks.
First, install the dependencies (Python >= 3.8 is required):
# clone project
git clone https://github.com/enajx/ES
# install dependencies
cd ES
pip install -r requirements.txt
Next, use train_static.py to train an agent. You can train on any OpenAI Gym or PyBullet environment:
# train agent to solve the racing car
python train_static.py --environment CarRacing-v0
# train agent specifying evolution parameters, e.g.
python train_static.py --environment CarRacing-v0 --generations 300 --popsize 200 --print_every 1 --lr 0.2 --sigma 0.1 --decay 0.995 --threads -1
Use python train_static.py --help to display all the training options:
train_static.py [--environment] [--popsize] [--print_every] [--lr] [--decay] [--sigma] [--generations] [--folder] [--threads]
arguments:
--environment   Environment: any OpenAI Gym or PyBullet environment may be used.
--popsize       Population size.
--print_every   Print and save every N steps.
--lr            ES learning rate.
--decay         ES decay factor.
--sigma         ES sigma: standard deviation of the noise used to perturb the parameters of each new generation (see the update sketch below).
--generations   Number of generations that the ES will run.
--folder        Folder in which to store the evolved weights.
--threads       Number of threads used to run the evolution in parallel.
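To make the roles of popsize, sigma, lr, and decay concrete, here is a rough sketch of the basic ES update from Salimans et al. 2017. It is not the repository's implementation; the toy fitness function and the assumption that decay is applied to the learning rate are only for illustration.

```python
import numpy as np

def es_step(theta, fitness_fn, popsize=200, sigma=0.1, lr=0.2):
    """One generation of the basic ES update (Salimans et al. 2017)."""
    noise = np.random.randn(popsize, theta.size)              # one perturbation per individual
    returns = np.array([fitness_fn(theta + sigma * eps) for eps in noise])
    normalised = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad_estimate = noise.T @ normalised / (popsize * sigma)  # search-gradient estimate
    return theta + lr * grad_estimate                         # move along the estimate

# Toy usage: maximise -||theta||^2 as a stand-in for an episode return.
theta, lr, decay = np.random.randn(10), 0.2, 0.995
for generation in range(50):
    theta = es_step(theta, lambda w: -np.sum(w ** 2), popsize=50, lr=lr)
    lr *= decay  # assumption for this sketch: decay shrinks the learning rate each generation
```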
Once trained, use evaluate_static.py to test the evolved agent:
python evaluate_static.py --environment CarRacing-v0 --path_weights weights.dat
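For reference, a generic single-episode evaluation loop of the kind evaluate_static.py performs might look like the sketch below; the random placeholder policy stands in for the evolved network reconstructed from weights.dat, whose actual format and architecture are defined by the repository's code.

```python
import gym

env = gym.make("CarRacing-v0")

# Placeholder policy: in the real script this would be the evolved network
# loaded from weights.dat; here we just sample random actions.
policy = lambda obs: env.action_space.sample()

obs, total_reward, done = env.reset(), 0.0, False
while not done:
    env.render()                              # optional: watch the agent drive
    action = policy(obs)                      # network forward pass -> bounded action
    obs, reward, done, _ = env.step(action)
    total_reward += reward
print(f"Episode return: {total_reward:.1f}")
```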
When running on a headless server, some environments (e.g. CarRacing-v0) require a virtual display; in that case run:
xvfb-run -a -s "-screen 0 1400x900x24 +extension RANDR" -- python train_static.py --environment CarRacing-v0
If you use this code for academic or commercial purposes, please cite the associated paper:
@inproceedings{Najarro2020,
title = {{Meta-Learning through Hebbian Plasticity in Random Networks}},
author = {Najarro, Elias and Risi, Sebastian},
booktitle = {Advances in Neural Information Processing Systems},
year = {2020},
url = {https://arxiv.org/abs/2007.02686}
}
In the paper we tested the CarRacing-v0 and AntBulletEnv-v0 environments. For both of them we wrote custom functions to bound the action activations; the remaining environments use a simple clipping mechanism to bound their actions. Environments with a continuous action space (i.e. Box) are likely to benefit from continuous scaling of their action spaces rather than clipping, either with a custom activation function or with Gym's RescaleAction wrapper.
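As an illustration, the sketch below wraps a Box-action environment with Gym's RescaleAction wrapper so that a network output squashed into [-1, 1] (here by a tanh, as an example of a custom activation) is rescaled to the environment's native action bounds. The choice of environment and the classic (pre-0.26) Gym API are assumptions made for the example.

```python
import numpy as np
import gym
from gym.wrappers import RescaleAction

env = gym.make("Pendulum-v0")          # any environment with a Box action space works here
env = RescaleAction(env, -1.0, 1.0)    # wrapper maps [-1, 1] back to the env's native bounds

obs = env.reset()
raw_output = np.random.randn(env.action_space.shape[0])  # stand-in for the network output
action = np.tanh(raw_output)           # continuous squashing into [-1, 1] instead of clipping
obs, reward, done, info = env.step(action)
```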
Another element that greatly affects performance, especially if your computational resources are limited, is a suitable early-stopping mechanism so that fewer CPU cycles are wasted; e.g. for the CarRacing-v0 environment we use 20 consecutive steps with negative reward as an early-stop signal.
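A minimal sketch of such an early-stop rule inside an episode rollout is shown below; the policy call and the classic Gym step API are placeholders, not the repository's actual rollout code.

```python
def rollout_with_early_stop(env, policy, neg_reward_patience=20):
    """Run one episode, stopping early after `neg_reward_patience`
    consecutive steps with negative reward."""
    obs = env.reset()
    total_reward, neg_streak, done = 0.0, 0, False
    while not done:
        action = policy(obs)                       # placeholder for the evolved network
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        neg_streak = neg_streak + 1 if reward < 0 else 0
        if neg_streak >= neg_reward_patience:      # early stop: cut the rollout short
            break
    return total_reward
```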
Finally, some pixel-based environments would likely benefit from a grayscaling + frame-stacking approach rather than being fed the three RGB channels as in our implementation, e.g. by using Gym's FrameStack wrapper or the AtariPreprocessing wrapper.
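For example, a possible preprocessing pipeline with Gym's built-in wrappers is sketched below; the choice of four stacked frames and the classic (pre-0.26) Gym reset API are assumptions made for illustration.

```python
import numpy as np
import gym
from gym.wrappers import GrayScaleObservation, FrameStack

env = gym.make("CarRacing-v0")
env = GrayScaleObservation(env)      # (96, 96, 3) RGB -> (96, 96) grayscale
env = FrameStack(env, num_stack=4)   # keep the last 4 frames as the observation

obs = np.asarray(env.reset())
print(obs.shape)                     # (4, 96, 96): 4 stacked grayscale frames
```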