PPO

PPO (Proximal Policy Optimization) implementation for OpenAI Gym environments, based on Unity ML-Agents: https://github.com/Unity-Technologies/ml-agents

Notable changes include:

Ability to continuously display progress with a deterministic (non-stochastic) policy during training
Works with OpenAI Gym environments
Option to record episodes
State normalization for a given number of frames
Frame skip
Faster reward discounting, among other changes
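
To make two of the items above concrete, here is a minimal sketch of frame skip and reward discounting. It is not the code from this repository. The names `discount_rewards`, `FrameSkip`, and `CountingEnv` are hypothetical, and the environment is assumed to follow the classic Gym convention where `step` returns `(observation, reward, done, info)`. The discounting shown is the usual single backward pass; the repository may use a different (for example, vectorized) approach.

```python
from typing import Any, List, Sequence, Tuple


def discount_rewards(rewards: Sequence[float], gamma: float = 0.99,
                     bootstrap_value: float = 0.0) -> List[float]:
    """Compute returns G_t = r_t + gamma * G_{t+1} in one backward pass."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value  # value estimate after the last step (0 if terminal)
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


class FrameSkip:
    """Hypothetical wrapper: repeat each action `skip` times, summing rewards."""

    def __init__(self, env: Any, skip: int = 4) -> None:
        if skip < 1:
            raise ValueError("skip must be at least 1")
        self.env = env
        self.skip = skip

    def reset(self) -> Any:
        return self.env.reset()

    def step(self, action: Any) -> Tuple[Any, float, bool, dict]:
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            # Assumes the classic 4-tuple step signature.
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done, info


class CountingEnv:
    """Toy stand-in environment: reward 1.0 per step, ends after 10 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, 1.0, self.t >= 10, {}
```

With `skip=4`, the agent chooses an action once every four environment frames, which shortens the trajectories that the discounting step has to process. The wrapper stops repeating as soon as `done` is returned, so the final step of an episode may cover fewer than `skip` frames.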