Ezgii/PPO-on-pendulum-extended: Training a PPO agent to balance a pendulum in a partially observable environment.

Project description

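An implementation of the PPO (Proximal Policy Optimization) algorithm, written in Python using PyTorch. Recurrence is added to the ActorCritic network so that it can be trained in a partially observable environment where obs = [cos(theta), sin(theta)], i.e. the angular velocity is left out of the standard Pendulum observation. Because a single observation does not reveal how fast the pendulum is moving, the network has to infer the velocity from the history of observations. An ensemble of 5 critics is used to make training more stable.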
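Two of the ingredients named above are easy to show in isolation: PPO's clipped surrogate objective and the combination of several critics' value estimates. The sketch below is plain Python rather than PyTorch so that it runs without dependencies. The clipping range of 0.2 and the choice to average the 5 critics are assumptions, not values read from main.py.

```python
import math
from statistics import mean

CLIP_EPS = 0.2  # assumed clipping range (a common PPO default), not taken from this repo


def ensemble_value(critic_outputs):
    """Combine the value estimates of several critics by averaging them.

    Averaging is one common way to use a critic ensemble; this repository
    may combine its 5 critics differently (e.g. by taking the minimum).
    """
    return mean(critic_outputs)


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=CLIP_EPS):
    """Return the PPO clipped surrogate loss (to be minimised) for a batch."""
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # PPO maximises the smaller of the two surrogate terms; negate it for a loss.
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return mean(losses)
```

In a PyTorch version the same expressions would operate on tensors (torch.exp, torch.clamp, torch.min) so that gradients flow back through new_log_probs.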

Pseudocode:

Environment

OpenAI's Gym is a framework for training reinforcement learning agents. It provides a set of environments and a standardized interface for interacting with them.
In this project, I used the Pendulum environment from Gym.
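To make the partial observability concrete, here is a minimal, dependency-free stand-in for a pendulum environment that exposes the classic Gym reset()/step() interface. The class name PartialPendulum is hypothetical. Its dynamics and cost follow Gym's Pendulum (theta = 0 is upright), but the environment this project actually uses lives in a3_gym_env and may differ.

```python
import math
import random


class PartialPendulum:
    """Hypothetical stand-in for a partially observable Pendulum.

    Constants mirror Gym's Pendulum (g=10, m=1, l=1, dt=0.05,
    max speed 8, max torque 2); the repo's a3_gym_env may differ.
    """

    def __init__(self, seed=None):
        self.g, self.m, self.l, self.dt = 10.0, 1.0, 1.0, 0.05
        self.max_speed, self.max_torque = 8.0, 2.0
        self.rng = random.Random(seed)
        self.theta = 0.0
        self.theta_dot = 0.0

    def _obs(self):
        # Partial observation: the angular velocity is deliberately left out.
        return [math.cos(self.theta), math.sin(self.theta)]

    def reset(self):
        self.theta = self.rng.uniform(-math.pi, math.pi)
        self.theta_dot = self.rng.uniform(-1.0, 1.0)
        return self._obs()

    def step(self, torque):
        u = max(-self.max_torque, min(self.max_torque, torque))
        # Normalise the angle to [-pi, pi) so the cost is measured from upright.
        angle = (self.theta + math.pi) % (2 * math.pi) - math.pi
        cost = angle ** 2 + 0.1 * self.theta_dot ** 2 + 0.001 * u ** 2
        self.theta_dot += (3 * self.g / (2 * self.l) * math.sin(self.theta)
                           + 3.0 / (self.m * self.l ** 2) * u) * self.dt
        self.theta_dot = max(-self.max_speed, min(self.max_speed, self.theta_dot))
        self.theta += self.theta_dot * self.dt
        # Classic Gym-style 4-tuple: (observation, reward, done, info).
        return self._obs(), -cost, False, {}
```

Because step() returns only [cos(theta), sin(theta)], one observation cannot tell a pendulum that is swinging up from one that is falling through the same angle; the velocity has to be inferred from consecutive observations, which is the job of the recurrent layer in the ActorCritic network.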

Installation

Using conda (recommended)

1. Install Anaconda.
2. Create the environment:
   conda create -n a1 python=3.8
3. Activate the environment:
   conda activate a1
4. Install PyTorch (steps from the PyTorch installation guide):
   - If you don't have an NVIDIA GPU, or don't want to bother with the CUDA installation:
     conda install pytorch torchvision torchaudio cpuonly -c pytorch
   - If you have an NVIDIA GPU and want to use it, install CUDA, then install PyTorch with CUDA support:
     conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
5. Install the other dependencies:
   conda install -c conda-forge matplotlib gym opencv pyglet
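After installing, a short check script can confirm that the packages are importable in the active environment. The module list is an assumption derived from the packages named above (OpenCV is imported as cv2), and the script itself is not part of the repository.

```python
import importlib.util

# Import names for the packages installed above; OpenCV installs as "cv2".
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "matplotlib", "gym", "cv2", "pyglet"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```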

Using pip

python3 -m pip install -r requirements.txt

How to run the code

In a terminal, run:

python3 main.py

Folders and files

a3_gym_env
results
README.md
main.py
pseudocode.png

Results

Loss functions and learning curve: [figure]

Testing, angle vs. time: [figure]
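A plot of angle against time can be built from logged observations, since theta is recoverable from [cos(theta), sin(theta)] with atan2. This is a minimal sketch assuming the 0.05 s step of Gym's Pendulum; the function name angle_trace is hypothetical and not taken from main.py.

```python
import math

DT = 0.05  # assumed time step per environment step (Gym's Pendulum uses 0.05 s)


def angle_trace(observations, dt=DT):
    """Convert [cos(theta), sin(theta)] observations into (time, angle) pairs.

    atan2 returns angles in [-pi, pi]; under Gym's convention 0 means upright.
    """
    return [(i * dt, math.atan2(s, c)) for i, (c, s) in enumerate(observations)]
```

The pairs can be plotted with matplotlib, e.g. `times, angles = zip(*trace)` followed by `plt.plot(times, angles)`.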
