Unity Environment: Basic

Environment

A linear movement task where the agent must move left or right toward rewarding states. The goal is to reach the state with the highest reward. The environment contains one agent. Benchmark Mean Reward: 0.93

Reward Function

  • -0.01 at each step.
  • +0.1 for arriving at the suboptimal state.
  • +1.0 for arriving at the optimal state.

Behavior Parameters

  • Vector Observation space: One variable corresponding to the current state.
  • Vector Action space: (Discrete) Two possible actions (Move left, move right).
  • Visual Observations: None

Algorithm

We'll be using DQN to solve the environment.

DQN

DQN is an extension of Q-learning, an RL algorithm in which an agent learns an optimal (or near-optimal) behavioral policy by interacting with an environment. From sequences of states (s), actions (a), rewards (r), and next states (s'), it employs a (deep) neural network to approximate the action-value function (also known as the Q-function) that maximizes the agent's expected cumulative future reward. Reinforcement learning tends to be unstable when a nonlinear function approximator such as a neural network is used, so we add two stabilizing features to DQN: a replay buffer and a target network.
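
As a concrete illustration of the update rule described above, here is a minimal sketch of the one-step TD target and loss in PyTorch (the function and argument names are illustrative, not taken from this repo's code):

```python
import torch.nn.functional as F

def dqn_loss(q_local, q_target_net, states, actions, rewards,
             next_states, dones, gamma=0.99):
    # Q(s, a) predicted by the local network for the actions actually taken.
    q_expected = q_local(states).gather(1, actions)
    # max_a' Q_target(s', a'), detached so no gradients flow through the target network.
    q_next = q_target_net(next_states).detach().max(1, keepdim=True)[0]
    # One-step TD target: r + gamma * max_a' Q_target(s', a'), zeroed for terminal states.
    q_targets = rewards + gamma * q_next * (1 - dones)
    # Mean squared error between predicted and target Q-values.
    return F.mse_loss(q_expected, q_targets)
```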

Replay Buffer

A replay memory stores the agent's experiences e_t = (s_t, a_t, r_t, s_t+1) at each time step t in a data set D_t = {e_1, e_2, ..., e_t}. During learning, mini-batches of experiences are sampled uniformly at random, U(D), to update (i.e., train) the Q-network. Random sampling from this pooled set of experiences removes correlations between consecutive observations, smoothing changes in the distribution of experiences and helping the agent avoid getting stuck in a local minimum, oscillating, or diverging from the optimal policy.
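
A minimal replay buffer along these lines might look like the following sketch (the actual replay_memory.py may differ in details):

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores experience tuples and samples them uniformly."""

    def __init__(self, buffer_size=int(1e5), batch_size=32, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are discarded first
        self.batch_size = batch_size
        random.seed(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```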

Target Network

In the original DQN, the target network is only updated periodically, in contrast to the action-value (local) Q-network, which is updated at every learning step. Here, instead of copying the weights periodically, we perform a soft update of the target network, which keeps the learning targets stable.
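
A soft update blends the local network's weights into the target network with a small interpolation factor TAU; a minimal sketch:

```python
def soft_update(local_model, target_model, tau=1e-2):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target"""
    for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)
```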

DQN with Experience Replay

(Figure: pseudocode of the DQN algorithm with experience replay.)
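
Putting the pieces together, a single learning step could look roughly like the sketch below (reusing the dqn_loss and soft_update sketches above; tensor and device handling are simplified and the names are illustrative, not this repo's exact code):

```python
import torch

def learn_step(q_local, q_target_net, optimizer, buffer,
               gamma=0.99, tau=1e-2, batch_size=32):
    if len(buffer) < batch_size:
        return  # not enough experience collected yet
    # 1. Sample a random mini-batch of experiences from the replay buffer.
    batch = buffer.sample()
    states = torch.tensor([e.state for e in batch], dtype=torch.float32)
    actions = torch.tensor([[e.action] for e in batch], dtype=torch.int64)
    rewards = torch.tensor([[e.reward] for e in batch], dtype=torch.float32)
    next_states = torch.tensor([e.next_state for e in batch], dtype=torch.float32)
    dones = torch.tensor([[float(e.done)] for e in batch], dtype=torch.float32)
    # 2. Compute the TD loss and take a gradient step on the local Q-network.
    loss = dqn_loss(q_local, q_target_net, states, actions, rewards,
                    next_states, dones, gamma)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 3. Softly move the target network toward the local network.
    soft_update(q_local, q_target_net, tau)
```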

Getting Started

Files

  • dqn_agent.py: Implementation of a DQN agent.
  • replay_memory.py: Implementation of the DQN agent's replay buffer (memory).
  • model.py: Implementation of the neural network for vector-based DQN learning using PyTorch.
  • env_test.ipynb: Verify that the environment is set up properly.
  • train.ipynb: Train the DQN agent on the environment.
  • test.ipynb: Test the DQN agent using a trained model and visualize it.

How to Train

  • Step 1: Start with env_test.ipynb to verify that the environment is set up properly.
  • Step 2: Train the DQN agent using train.ipynb.
  • Step 3: Test the DQN agent at different checkpoints and visualize it using test.ipynb. You can also skip Step 2, download the pre-trained model from here, and place it inside the basic folder.

Training

Neural Network

Because the agent learns from vector data (not pixel data), the Q-network (both the local and the target network) consists of just two fully connected hidden layers with 64 nodes each. The size of the input layer equals the state dimension (i.e., 20 nodes) and the size of the output layer equals the action dimension (i.e., 3).
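
Such a network can be written in a few lines of PyTorch; the sketch below is consistent with the description above, though the actual model.py may differ in details:

```python
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""

    def __init__(self, state_size=20, action_size=3, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)   # input layer -> first hidden layer
        self.fc2 = nn.Linear(hidden_size, hidden_size)  # second hidden layer
        self.fc3 = nn.Linear(hidden_size, action_size)  # output layer: one Q-value per action

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```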

Hyperparameters

  • STATE_SIZE: 20 (Dimension of each state)
  • ACTION_SIZE: 3 (Dimension of the action space)
  • BUFFER_SIZE: 1e5 (Replay buffer size)
  • BATCH_SIZE: 32 (Mini-batch size)
  • GAMMA: 0.99 (Discount factor)
  • LR: 1e-3 (Learning rate)
  • TAU: 1e-2 (Soft-update parameter)
  • UPDATE_EVERY: 5 (How often to update the network)

Training Parameters

  • BENCHMARK_REWARD = 0.93
  • epsilon: 1.0 (Initial exploration rate for epsilon-greedy action selection)
  • epsilon_min: 0.01 (Minimum exploration rate)
  • epsilon_decay: 200 (Decay constant for the exploration rate; see the sketch after this list)
  • SCORES_AVERAGE_WINDOW = 100
  • NUM_EPISODES = 2000
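
The epsilon_decay value of 200 suggests an exponential annealing of the exploration rate; one plausible schedule is sketched below (this is an assumption — the training notebook may implement the decay differently):

```python
import math

def epsilon_by_step(step, eps_start=1.0, eps_min=0.01, eps_decay=200):
    # Exponentially anneal epsilon from eps_start toward eps_min with time constant eps_decay.
    # Assumed schedule; the actual notebook may decay per episode or multiplicatively instead.
    return eps_min + (eps_start - eps_min) * math.exp(-step / eps_decay)
```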

Performance

(Plots of training and testing scores.)

Watch Trained Agent

(GIF of the trained agent.)

Reference

  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.