
Predictive Probabilistic Merging of Policies

Predictive Probabilistic Merging of Policies (PPMP) is a deep reinforcement learning framework in which corrective feedback is used to improve sample efficiency. This code was developed for the study 'Deep Reinforcement Learning with Feedback-based Exploration'. In this repository, PPMP is demonstrated in combination with DDPG, as in the paper.
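Roughly, the actor's proposed action is merged with the corrective signal, weighted by an estimate of how uncertain the policy still is, so that feedback dominates early learning and the policy takes over as it improves. Below is a minimal sketch of that merging step; the function name, signature and weighting are illustrative assumptions, and the actual implementation and equations are in ppmp.py and the paper.

import numpy as np

def merge(policy_action, feedback, uncertainty, scale=0.5):
    # policy_action: action proposed by the DDPG actor
    # feedback:      corrective signal in {-1, 0, +1} from the
    #                (synthesised or human) trainer
    # uncertainty:   estimated policy uncertainty, e.g. the spread
    #                over an ensemble of actor heads
    if feedback == 0:
        return policy_action  # no correction: act on the policy alone
    # Shift the action towards the correction, more so when uncertain
    correction = scale * feedback * uncertainty
    return np.clip(policy_action + correction, -1.0, 1.0)  # assumed action bounds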

Getting Started

Setting up the environment

We've tested this code using Ubuntu 18.10 (64 bit) and Python 3.6. The most important dependencies are gym, tflearn, seaborn and tensorflow. If you do not have these installed yet, you may use conda+pip to quickly set up an appropriate environment by

conda create --name ppmp tensorflow seaborn
source activate ppmp
pip install tflearn gym

and you should be ready. If not, the exact testing environment can be reproduced from the ppmp_env.txt environment file.
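To verify the setup, you can check that the main dependencies import without errors:

python -c "import tensorflow, tflearn, gym, seaborn"

If this prints nothing, the environment is good to go.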

To quickly set up remote instances, we've used the server_setup.sh file. It is not recommended to run this on your personal computer, but feel free to have a look at the installation procedures.

Running

The main file is ppmp.py. If you've activated your environment, you may invoke it from the root directory in a terminal by

python ppmp.py

and it should output CSV data; by default, PPMP learns the Pendulum-v0 problem using synthesised feedback. If you'd like to provide the corrective feedback yourself, try

python ppmp.py --algorithm ppmp_human

It should start rendering the environments, but the problem is paused at startup. Press the spacebar first, and then use the arrow keys to provide feedback.

For batch mode, the --header argument prints the column names of the CSV output.
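Several runs can then be collected into a single file, for instance (the filename is illustrative; this assumes --header prepends the column names to a normal run's output):

python ppmp.py --header > results.csv
python ppmp.py >> results.csv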

Different environments are called with the --env argument, e.g. python ppmp.py --env MountainCarContinuous-v0. For other arguments (hyperparameters, environment settings, ...), see python ppmp.py --help or open the code.
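Arguments combine as usual, e.g. to run the mountain car problem and include the CSV header:

python ppmp.py --env MountainCarContinuous-v0 --header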

To record single runs, you may like to navigate to the testbench and call a script that saves the results for you:

cd single_analysis/pendulum
./run_pendulum --heads 7

A live plot is then available in live_view.pdf.
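If you'd rather plot the CSV output yourself, the seaborn dependency (and the pandas it pulls in) suffices. A minimal sketch, assuming a results.csv whose rows are episodes with a 'reward' column (the actual column names are whatever --header prints):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv("results.csv")             # e.g. from python ppmp.py --header > results.csv
sns.lineplot(x=data.index, y=data["reward"])  # 'reward' is an assumed column name
plt.xlabel("episode")
plt.ylabel("return")
plt.savefig("learning_curve.pdf")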

Acknowledgement

This algorithm was developed by Jan Scholten under the supervision of Jens Kober and Carlos Celemin. We especially thank Daan Wout for countless fruitful discussions. We acknowledge Patrick Emami for providing the DDPG baseline code.

Licence

You are free to use, share and adapt this work under the conditions stipulated in LICENCE.md.

Reference

@Misc{Scholten2019arXiv,
  author = {Scholten, Jan and Wout, Daan and Celemin, Carlos and Kober, Jens},
  title  = {Deep Reinforcement Learning with Feedback-based Exploration},
  year   = {2019},
  note   = {arXiv:1903.06151 [cs.LG]},
  file   = {https://arxiv.org/pdf/1903.06151.pdf},
  url    = {https://arxiv.org/abs/1903.06151},
}
