This project was completed as part of the Udacity Deep Reinforcement Learning Nanodegree.
Code within this project is written from scratch (with some inspiration and tips taken from previous DQN homework in the Udacity program).
The goal is to train an agent through Q-Learning (specifically through a Deep Q Network [1]) that is capable of navigating a world containing yellow and blue bananas. Yellow bananas result in a reward of +1, and blue bananas -1. The task is episodic, with the episode ending after a fixed amount of time. An agent that "solves" the environment is defined as achieving a reward of +13 or greater on average over 100 episodes.
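The solving criterion above can be checked with a rolling window of the last 100 episode scores. A minimal sketch (the function name is illustrative, not taken from this repo):

```python
from collections import deque

# Rolling window of the last 100 episode scores, per the solving criterion.
scores_window = deque(maxlen=100)

def record_episode(score):
    """Append an episode score and report whether the environment counts as
    solved: an average reward of +13 or more over the last 100 episodes."""
    scores_window.append(score)
    return len(scores_window) == 100 and sum(scores_window) / 100 >= 13.0
```

Because the deque has `maxlen=100`, old scores fall out automatically, so the check always reflects the most recent 100 episodes.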
The environment is provided by Udacity, and based on Unity's ML-Agents framework.
Observations are delivered as 37-dimensional floating-point vectors and include measurements such as the velocity of the agent. The actions available to the agent are `0` (walk forwards), `1` (walk backwards), `2` (turn left), and `3` (turn right).
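During training, a DQN agent typically picks among these four discrete actions with an ε-greedy rule: explore with probability ε, otherwise act greedily on the current action-value estimates. A sketch (function name and signature are illustrative, not this repo's API):

```python
import random

# Action indices as defined by the environment.
ACTIONS = {0: "walk forwards", 1: "walk backwards", 2: "turn left", 3: "turn right"}

def epsilon_greedy(q_values, eps):
    """Return a random action index with probability eps; otherwise the
    index of the highest action-value estimate in q_values."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In practice ε is annealed from near 1.0 towards a small floor so the agent explores early and exploits later.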
Based on the Udacity setup (see here), Conda/virtualenv can be used to install the required dependencies. For example:
```
virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements.txt
```
To get the environment executable, follow the Udacity instructions listed here.
- `dqn.py` — the core code for training the DQN agent.
- A pre-trained model that can be loaded into an `Agent`.
- `Report.ipynb` — an example of how to train an agent on the environment and visualize it after training.
- Script to launch Jupyter with the `dqn.py` module on `PYTHONPATH`.
The `dqn.py` module provides classes and helper functions to train an agent. Take a look at `Report.ipynb` for an example of how to do this.
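For a sense of the kind of helper such a module typically provides, here is a minimal experience-replay buffer, a core DQN component [1]. This is an illustrative sketch only; the class and method names below are assumptions, and the actual API lives in `dqn.py`:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples.

    Illustrative sketch of experience replay; not the API of dqn.py.
    """

    def __init__(self, capacity=100_000):
        # Oldest experiences are discarded once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sample breaks the temporal correlation of experiences.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly from past experience, rather than learning only from consecutive steps, is one of the stabilizing ideas of the DQN paper.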
- [1] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis. "Human-level control through deep reinforcement learning." Nature 518 (2015): 529-533.