ppo

Proximal Policy Optimization

In Progress ⚠️ blocked on gorgonia/gorgonia#373

Implementation of the Proximal Policy Optimization algorithm.

How it works

PPO is an on-policy method that aims to solve the step size problem in policy gradient methods. Policy gradient algorithms are typically very sensitive to step size: too large a step and the agent can fall into an unrecoverable state; too small a step and the agent takes a very long time to train. PPO addresses this by ensuring that an agent's policy never deviates too far from the previous policy.

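$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$

The probability ratio $r_t(\theta)$ of the new policy to the old policy is computed and clipped to the range $[1-\epsilon, 1+\epsilon]$, so that a single update cannot move the policy outside those bounds. Here $\hat{A}_t$ is the advantage estimate and $\epsilon$ is the clipping hyperparameter.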

Examples

See the experiments folder for example implementations.

Roadmap

References