An implementation of the PPO (Proximal Policy Optimization) algorithm written in Python using PyTorch. The ActorCritic network is recurrent so that it can be trained under partial observability, where the agent sees only obs = [cos(theta), sin(theta)] and not the angular velocity. An ensemble of 5 critics is used to increase training stability.
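As a rough illustration of the two ideas named above, here is a minimal pure-Python sketch of the standard PPO clipped surrogate for a single sample, plus one possible way to combine several critics. The function names, the `clip_eps=0.2` default, and the choice of averaging the critics are assumptions made for this example; the project's own code may aggregate the ensemble differently.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """PPO clipped objective for one (state, action) sample (to be maximized).

    clip_eps=0.2 is a commonly used default, assumed here for illustration.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = math.exp(new_logp - old_logp)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def ensemble_value(critic_outputs):
    """Combine several critic estimates by averaging (an assumed aggregation)."""
    return sum(critic_outputs) / len(critic_outputs)
```

With a positive advantage, the objective stops growing once the ratio exceeds 1 + clip_eps, which limits how far a single update can move the policy.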
OpenAI's Gym is a framework for training reinforcement
learning agents. It provides a set of environments and a
standardized interface for interacting with them.
This project uses the Pendulum environment from Gym.
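To see why recurrence helps here, note that Gym's Pendulum normally reports [cos(theta), sin(theta), angular velocity]; with the velocity removed, a single observation no longer describes the full state, but two consecutive observations do. The sketch below shows this with the standard library only. The helper names are hypothetical, and DT = 0.05 is assumed to match the environment's time step.

```python
import math

DT = 0.05  # assumed Pendulum time step in seconds


def partial_obs(theta):
    """Partial observation used in this project: angle only, no velocity."""
    return [math.cos(theta), math.sin(theta)]


def estimate_velocity(prev_obs, obs, dt=DT):
    """Estimate angular velocity from two consecutive partial observations."""
    prev_theta = math.atan2(prev_obs[1], prev_obs[0])
    theta = math.atan2(obs[1], obs[0])
    # Wrap the angle difference into [-pi, pi) so crossing +/-pi is handled.
    delta = (theta - prev_theta + math.pi) % (2 * math.pi) - math.pi
    return delta / dt


# Example: the angle moves from 0.10 rad to 0.15 rad in one step.
velocity = estimate_velocity(partial_obs(0.10), partial_obs(0.15))
```

A recurrent network can learn to carry this kind of information across time steps in its hidden state instead of computing it explicitly.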
-
Create the conda environment:
conda create -n a1 python=3.8
-
Activate the env
conda activate a1
-
Install PyTorch (steps from the PyTorch installation guide):
-
If you don't have an NVIDIA GPU or don't want to bother with the CUDA installation:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
-
If you have an NVIDIA GPU and want to use it:
Install CUDA.
Install PyTorch with CUDA support:
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
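After either install path, a quick check like the following can confirm which build ended up in the environment. This is only a sketch: `describe_torch` is a hypothetical helper name, and it relies solely on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch():
    """Return a short description of the installed PyTorch build, if any."""
    # find_spec returns None when the package is not installed.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__}, default device would be {device}"


print(describe_torch())
```

If the GPU install was intended but the output reports cpu, the CUDA toolkit version and the driver are the first things to check.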
-
Install the other dependencies:
conda install -c conda-forge matplotlib gym opencv pyglet
python3 -m pip install -r requirements.txt
In a terminal, run:
python3 main.py