This repository is an ongoing culmination of my efforts to understand the various algorithms used to train reinforcement learning (RL) agents to perform different tasks. It is meant to be a gentle guide for others who wish to explore the world of RL as well. Following the content in the order listed below should be the fastest way to get up to speed. Once done, see here for a medium-scale project that combines several RL concepts to solve an interesting problem.
This section uses the k-armed bandit problem discussed in [1] to introduce several important RL concepts. I recreate some of the experiments to show that I can obtain similar results, and I solve some of the exercises. Concepts covered include the following (a small code sketch follows the list):
- Value functions
- Epsilon-greedy action selection
- Balancing exploration and exploitation
- Upper confidence bounds
- Stationary vs non-stationary problems
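To make these ideas concrete, here is a minimal sketch of an epsilon-greedy agent with incremental sample-average value estimates. It is illustrative rather than the code used in the notebooks; the names (`k`, `epsilon`, `n_steps`) and the Gaussian reward setup are my own assumptions.

```python
# Minimal epsilon-greedy agent on a k-armed Gaussian bandit (illustrative only).
import numpy as np

def run_bandit(k=10, epsilon=0.1, n_steps=1000, seed=0):
    np.random.seed(seed)
    q_true = np.random.normal(0.0, 1.0, k)   # true (stationary) action values
    q_est = np.zeros(k)                       # sample-average value estimates
    counts = np.zeros(k)                      # number of times each arm was pulled
    rewards = np.zeros(n_steps)

    for t in range(n_steps):
        if np.random.rand() < epsilon:        # explore with probability epsilon
            a = np.random.randint(k)
        else:                                 # otherwise exploit the greedy arm
            a = int(np.argmax(q_est))
        r = np.random.normal(q_true[a], 1.0)  # noisy reward for the chosen arm
        counts[a] += 1
        # Incremental sample-average update of the value estimate.
        q_est[a] += (r - q_est[a]) / counts[a]
        rewards[t] = r
    return rewards

print(run_bandit().mean())
```

Replacing the `1 / counts[a]` step size with a constant (e.g. `0.1`) is the standard way to track non-stationary reward distributions, and swapping the epsilon-greedy branch for an upper-confidence-bound score reproduces the UCB experiments.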
This section covers temporal-difference (TD) methods for RL and demonstrates their performance using examples and exercises from [1]. A short sketch of both update rules follows the list.
- SARSA
- Q-Learning
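The two update rules differ only in how they bootstrap from the next state. The sketch below assumes a tabular `Q` array and a discrete-state, Gym-style environment; the function and parameter names are mine, not the repository's.

```python
# Tabular SARSA and Q-learning episode loops (illustrative sketch).
import numpy as np

def epsilon_greedy(Q, s, n_actions, epsilon):
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def sarsa_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    s = env.reset()
    a = epsilon_greedy(Q, s, env.action_space.n, epsilon)
    done = False
    while not done:
        s2, r, done, _ = env.step(a)
        a2 = epsilon_greedy(Q, s2, env.action_space.n, epsilon)
        # SARSA: bootstrap from the action actually taken next (on-policy).
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
        s, a = s2, a2

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    s = env.reset()
    done = False
    while not done:
        a = epsilon_greedy(Q, s, env.action_space.n, epsilon)
        s2, r, done, _ = env.step(a)
        # Q-learning: bootstrap from the greedy action (off-policy).
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2
```

SARSA bootstraps from the action the behaviour policy actually takes, while Q-learning bootstraps from the greedy action, which is what lets it learn the optimal value function while following an exploratory policy.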
This section introduces some of the newer, state-of-the-art techniques for solving RL problems. To compare them, I've opted to use the CartPole environment (from OpenAI's Gym) for its simplicity. A sketch of the replay and target-network machinery these methods share follows the list.
- Q-Learning with a neural network
- Deep Q-Networks
- Experience Replay
- Target Networks
- Double Deep Q-Networks
- Prioritized Experience Replay
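The techniques above mostly share the same scaffolding: a replay buffer to decorrelate updates and a frozen target network to stabilise the bootstrapped targets. The sketch below is framework-agnostic (the `q_net` / `target_net` callables stand in for the actual CNTK models) and every name in it is illustrative rather than taken from the notebooks.

```python
# Illustrative experience replay buffer and (double) DQN target computation.
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Uniform experience replay; a prioritized variant would sample by TD error."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, done = map(np.array, zip(*batch))
        return s, a, r, s2, done

def dqn_targets(q_net, target_net, r, s2, done, gamma=0.99, double=False):
    """Bootstrapped targets; target_net is a periodically copied snapshot of q_net."""
    q_next_target = target_net(s2)            # Q-values from the frozen network
    if double:
        # Double DQN: online net selects the action, target net evaluates it.
        best_a = np.argmax(q_net(s2), axis=1)
        next_q = q_next_target[np.arange(len(r)), best_a]
    else:
        next_q = np.max(q_next_target, axis=1)
    return r + gamma * next_q * (1.0 - done.astype(float))
```

Training then alternates between acting epsilon-greedily with `q_net`, storing transitions in the buffer, fitting `q_net` towards `dqn_targets` on sampled minibatches, and periodically copying its weights into `target_net`.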
- Python 3.5
- numpy
- matplotlib
- pandas
- OpenAI Gym
- CNTK**
** I've chosen to implement the more complex algorithms using CNTK because very few such implementations exist, and doing so forces me to understand the finer details.
[1] R. Sutton and A. Barto, *Reinforcement Learning: An Introduction*