This repository contains code that trains an agent to solve the environment proposed in the Value-based methods section of the Udacity Deep Reinforcement Learning (DRL) course.
The environment consists of a single agent that has to pick up yellow bananas while avoiding blue bananas. A reward of +1 is provided for collecting yellow bananas, and a reward of -1 for blue ones.
The state space has 37 dimensions that describe the agent's speed and object perception around the agent's forward direction. Given this information, the agent has to select one of 4 discrete actions: move forward, move backward, turn left or turn right.
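As an illustration of the action space, the four actions can be represented as integer indices. The mapping below (0 forward, 1 backward, 2 left, 3 right) follows the order in which the actions are listed above and is an assumption about how the environment encodes them; the `Action` enum and the `greedy_action` helper are hypothetical names, not part of this repository.

```python
from enum import IntEnum
from typing import Sequence

STATE_SIZE = 37  # dimensions of the observation vector described above


class Action(IntEnum):
    """Assumed integer index for each of the 4 discrete actions."""
    FORWARD = 0
    BACKWARD = 1
    TURN_LEFT = 2
    TURN_RIGHT = 3


def greedy_action(q_values: Sequence[float]) -> Action:
    """Return the action with the highest estimated value."""
    if len(q_values) != len(Action):
        raise ValueError("expected one value per action")
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return Action(best_index)
```

For example, `greedy_action([0.1, -0.2, 0.7, 0.3])` picks `Action.TURN_LEFT`, since index 2 holds the largest value.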
The task is episodic, and in order to solve the environment, the agent must get an average score of +13 over 100 consecutive episodes.
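The solving criterion can be checked with a sliding window over the episode scores. The following is a minimal sketch, not the repository's implementation; `first_solved_episode` is a hypothetical helper name.

```python
from collections import deque
from typing import Iterable, Optional

SOLVED_SCORE = 13.0  # target average score
WINDOW = 100         # number of consecutive episodes to average over


def first_solved_episode(scores: Iterable[float]) -> Optional[int]:
    """Return the 1-based episode at which the average of the last
    WINDOW scores first reaches SOLVED_SCORE, or None if it never does."""
    window = deque(maxlen=WINDOW)  # keeps only the most recent scores
    for episode, score in enumerate(scores, start=1):
        window.append(score)
        if len(window) == WINDOW and sum(window) / WINDOW >= SOLVED_SCORE:
            return episode
    return None
```

Note that the check only starts once 100 episodes have been played, so the earliest possible answer is episode 100.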
Unity doesn't need to be installed since the environment is already available. It can be downloaded from the following links:
When executing the training script, the path to the downloaded environment should be passed with the --env-path argument.
The project uses Python 3.6 and relies on the Udacity Value Based Methods repository. That repository should be cloned, and the instructions in its README should be followed to install the necessary dependencies.
The repository contains 2 scripts under the navigation package: and
The script can be used to train the agent. It accepts the following arguments:
- env-path: path pointing to the Unity Bananas environment
- weights-path: path where the agent's NN weights will be stored
- episodes: number of episodes the agent should be trained for
- time-steps-per-episode: timesteps per episode
- eps-start: starting value for epsilon
- eps-end: minimum value for epsilon
- eps-decay: decay factor for epsilon
- gamma: discount rate
- learning-rate: agent's NN learning rate
- batch-size: size of the batches sampled from the agent's experience replay buffer
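A command-line interface exposing the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the flag names come from the list, but which flags are required and every default value shown here are illustrative placeholders, not the values used by this repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the training arguments (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="Train the navigation agent (sketch).")
    # Assumed to be required, since both paths appear in every example command.
    parser.add_argument("--env-path", required=True, help="path to the Unity Bananas environment")
    parser.add_argument("--weights-path", required=True, help="where the NN weights are stored")
    # Illustrative defaults only.
    parser.add_argument("--episodes", type=int, default=2000, help="number of training episodes")
    parser.add_argument("--time-steps-per-episode", type=int, default=1000, help="timesteps per episode")
    parser.add_argument("--eps-start", type=float, default=1.0, help="starting value for epsilon")
    parser.add_argument("--eps-end", type=float, default=0.01, help="minimum value for epsilon")
    parser.add_argument("--eps-decay", type=float, default=0.995, help="decay factor for epsilon")
    parser.add_argument("--gamma", type=float, default=0.99, help="discount rate")
    parser.add_argument("--learning-rate", type=float, default=5e-4, help="NN learning rate")
    parser.add_argument("--batch-size", type=int, default=64, help="batch size")
    return parser
```

`argparse` converts dashes in flag names to underscores, so `--env-path` is read back as `args.env_path` after calling `build_parser().parse_args()`.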
python --env-path /home/carlos/cursos/udacity_rl_2023/repos/deep-reinforcement-learning/p1_navigation/Banana_Linux/Banana.x86_64 \
--weights-path /home/carlos/cursos/udacity_rl_2023/projects/drl_p1_navigation/weights/agent_weights.pth \
--episodes 600
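To give a feel for how eps-start, eps-end and eps-decay interact over a 600-episode run, the sketch below assumes a multiplicative per-episode decay with a floor, `eps = max(eps_end, eps_decay * eps)`. That rule is common in DQN implementations but is an assumption here, as are the numeric values and the `epsilon_schedule` helper name.

```python
from typing import List


def epsilon_schedule(episodes: int, eps_start: float,
                     eps_end: float, eps_decay: float) -> List[float]:
    """Epsilon used in each episode under multiplicative decay with a floor."""
    values = []
    eps = eps_start
    for _ in range(episodes):
        values.append(eps)
        # Decay after each episode, never dropping below eps_end.
        eps = max(eps_end, eps_decay * eps)
    return values


# Illustrative values, not the repository's defaults.
schedule = epsilon_schedule(episodes=600, eps_start=1.0, eps_end=0.01, eps_decay=0.995)
print(f"first: {schedule[0]:.3f}, last: {schedule[-1]:.3f}")
```

With a decay factor close to 1, epsilon may still be well above eps-end when training stops, so the number of episodes and the decay factor are best chosen together.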
A trained agent can be used to play! To do so, the script can be used, providing the paths to the Unity environment and to the agent's weights:
python --env-path /home/carlos/cursos/udacity_rl_2023/repos/deep-reinforcement-learning/p1_navigation/Banana_Linux/Banana.x86_64 \
--weights-path /home/carlos/cursos/udacity_rl_2023/projects/drl_p1_navigation/weights/agent_weights.pth