This repo contains tutorials covering reinforcement learning with PyTorch 1.3 and Gym 0.15.4, using Python 3.7.
If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!
To install PyTorch, see installation instructions on the PyTorch website.
To install Gym, see installation instructions on the Gym GitHub repo.
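If you want to confirm your setup matches the versions used in these tutorials, a quick sanity check (assuming both packages imported without errors) is:

```python
import sys
import torch
import gym

print(sys.version)        # expect 3.7.x
print(torch.__version__)  # expect 1.3.x
print(gym.__version__)    # expect 0.15.4
```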
All tutorials use Monte Carlo methods to train an agent on the CartPole-v1 environment, with the goal of reaching an average total episode reward of 475 over the last 25 episodes. There are also alternate versions of some algorithms to show how to use those algorithms with other environments.
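As a rough illustration of that stopping criterion (not the tutorial code — a random policy stands in for a learned one and will not actually reach the threshold):

```python
import gym
import numpy as np

env = gym.make('CartPole-v1')
REWARD_THRESHOLD = 475  # CartPole-v1's conventional 'solved' score
N_TRIALS = 25           # average over the last 25 episodes

episode_rewards = []
for episode in range(1000):
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = env.action_space.sample()  # placeholder for a learned policy
        state, reward, done, _ = env.step(action)
        total_reward += reward
    episode_rewards.append(total_reward)
    if episode + 1 >= N_TRIALS and np.mean(episode_rewards[-N_TRIALS:]) >= REWARD_THRESHOLD:
        print(f'Solved after {episode + 1} episodes')
        break
```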
-
1 - Vanilla Policy Gradient (REINFORCE)
This tutorial covers the workflow of a reinforcement learning project. We'll learn how to: create an environment, initialize a model to act as our policy, create a state/action/reward loop and update our policy. We update our policy with the vanilla policy gradient algorithm, also known as REINFORCE.
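A minimal sketch of that workflow, assuming the Gym 0.15 reset/step API and a small feed-forward policy (the network size, learning rate and episode count here are illustrative, not necessarily the tutorial's exact values):

```python
import gym
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Categorical

env = gym.make('CartPole-v1')

# a small feed-forward policy: state -> action probabilities
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    log_probs, rewards = [], []
    state, done = env.reset(), False

    # state/action/reward loop: sample from the policy until the episode ends
    while not done:
        dist = Categorical(policy(torch.as_tensor(state, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # discounted returns, computed backwards from the end of the episode
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalise for stability

    # REINFORCE update: ascend the gradient of sum(log pi(a|s) * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```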
-
2 - Actor Critic
This tutorial introduces the family of actor-critic algorithms, which we will use for the next few tutorials.
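One common way to structure such a model is a shared body with two heads: the actor outputs action logits and the critic outputs a state-value estimate. A sketch (the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared body with separate policy (actor) and value (critic) heads."""
    def __init__(self, state_dim, n_actions, hidden_dim=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.actor = nn.Linear(hidden_dim, n_actions)  # action logits
        self.critic = nn.Linear(hidden_dim, 1)         # state-value estimate

    def forward(self, state):
        x = self.body(state)
        return self.actor(x), self.critic(x)

# example: CartPole has 4 state dimensions and 2 discrete actions
model = ActorCritic(state_dim=4, n_actions=2)
logits, value = model(torch.randn(1, 4))
```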
-
3 - Advantage Actor Critic (A2C)
We cover an improvement to the actor-critic framework, the A2C (advantage actor-critic) algorithm.
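The core change is to use the critic's value estimate as a baseline, so the actor is updated with the advantage (return minus estimated value) rather than the raw return. A sketch of the two losses, assuming per-step log-probabilities, value estimates and returns have already been collected (the smooth L1 critic loss is one common choice, not necessarily the tutorial's):

```python
import torch
import torch.nn.functional as F

def a2c_losses(log_probs, values, returns):
    """Actor and critic losses for one batch of experience.

    log_probs: log pi(a_t | s_t) of the actions taken, shape [T]
    values:    critic estimates V(s_t), shape [T]
    returns:   discounted returns R_t, shape [T]
    """
    advantages = returns - values.detach()           # baseline-subtracted learning signal
    actor_loss = -(log_probs * advantages).mean()    # policy gradient weighted by advantage
    critic_loss = F.smooth_l1_loss(values, returns)  # regress V(s_t) towards the returns
    return actor_loss, critic_loss
```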
-
4 - Generalized Advantage Estimation (GAE)
We improve on A2C by adding GAE (generalized advantage estimation).
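GAE replaces the single-step (or full Monte Carlo) advantage with an exponentially weighted sum of TD errors, controlled by a parameter λ. A sketch of the standard backward recursion (the γ and λ values are illustrative):

```python
import torch

def compute_gae(rewards, values, masks, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards: r_t for each step, shape [T]
    values:  V(s_t) for each step plus a bootstrap value V(s_T), shape [T + 1]
    masks:   1.0 if the episode continues after step t, 0.0 if it ended, shape [T]
    """
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # one-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # accumulate exponentially decayed future TD errors
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    return advantages
```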
-
5 - Proximal Policy Optimization (PPO)
We cover another improvement on A2C, PPO (proximal policy optimization).
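PPO's central ingredient is the clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. A sketch (the clip range of 0.2 is a common default, not necessarily the tutorial's):

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss used by PPO."""
    # probability ratio pi_new(a|s) / pi_old(a|s)
    ratio = (new_log_probs - old_log_probs.detach()).exp()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # pessimistic (element-wise minimum) objective, negated to give a loss to minimise
    return -torch.min(unclipped, clipped).mean()
```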
Potential algorithms covered in future tutorials: DQN, ACER, ACKTR.
- 'Reinforcement Learning: An Introduction' - http://incompleteideas.net/sutton/book/the-book-2nd.html
- 'Algorithms for Reinforcement Learning' - https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
- List of key papers in deep reinforcement learning - https://spinningup.openai.com/en/latest/spinningup/keypapers.html