This project trains an agent to navigate in the banana-collection task. (details)
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
- Place the file in the `p1_navigation/` folder, and unzip (or decompress) the file.
Packages
- torch==1.4.0
- unityagents==0.4.0
- numpy==1.18.1
- navigation.ipynb - training code
- We implement a Double-DQN algorithm, which selects the action with one network and evaluates it with a separate network. More details can be found in the model.py file.
- The network architecture is a 4-layer fully-connected network with ReLU activations; the layers have 256, 128, 32, and 4 output units respectively.
- Hyperparameters. The code is similar to the DQN exercise solution, except for the hyperparameter TAU: we raised it from 1e-3 to 5e-3 to update the target network faster, because this banana-collection task needs fewer episodes to converge (up to 2000).
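The two ideas above — the Double-DQN target (online network selects the next action, target network evaluates it) and the TAU soft update — can be sketched framework-agnostically. This is a minimal NumPy sketch of the math only; the names `double_dqn_targets` and `soft_update` are illustrative and not necessarily the actual names in model.py:

```python
import numpy as np

GAMMA = 0.99   # discount factor
TAU = 5e-3     # soft-update rate (raised from the DQN default of 1e-3)

def double_dqn_targets(rewards, dones, q_online_next, q_target_next):
    """Compute Double-DQN TD targets for a batch.

    The online network *selects* the greedy next action; the target
    network *evaluates* it, which reduces overestimation bias.
    q_online_next, q_target_next: arrays of shape (batch, n_actions).
    """
    best_actions = np.argmax(q_online_next, axis=1)          # selection
    evaluated = q_target_next[np.arange(len(best_actions)),  # evaluation
                              best_actions]
    return rewards + GAMMA * (1.0 - dones) * evaluated

def soft_update(online_params, target_params, tau=TAU):
    """Blend online weights into target weights: theta' <- tau*theta + (1-tau)*theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

With TAU = 5e-3 the target network tracks the online network five times faster than with the 1e-3 default, which suits this task's shorter convergence horizon.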
- report.ipynb - Provides a description of the implementation and plots the average reward curve using the pretrained model weights.
- model.py - Network and agent implementing the Double-DQN algorithm.