Pong with policy gradients

Train an agent to play Pong using OpenAI Gym and policy gradients.
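As a rough illustration of the policy gradient idea, the sketch below computes the discounted returns that weight each action's log-probability in a REINFORCE-style update. It is a minimal sketch and not necessarily what `train_pong.py` does: the function names, `gamma=0.99`, and the choice to reset the running return whenever a point is scored (a common Pong convention, since each point is an independent rally) are all assumptions.

```python
def discounted_returns(rewards, gamma=0.99, reset_on_nonzero=True):
    """Return the discounted return G_t for every step of one episode.

    Assumption: in Pong a non-zero reward marks the end of a rally, so the
    running sum is reset there to keep credit assignment within one rally.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if reset_on_nonzero and rewards[t] != 0:
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(values, eps=1e-8):
    """Standardize returns to zero mean and unit variance (a common variance-reduction step)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / (std + eps) for v in values]


# Hypothetical rewards: the agent wins one point, then loses one.
example = discounted_returns([0, 1, 0, -1], gamma=0.5)
```

The normalized returns would then multiply the negative log-probabilities of the sampled actions to form the loss.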

How to use

env setup

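```
wget http://www.atarimania.com/roms/Roms.rar
unrar x Roms.rar -y

conda create -n pong python=3.10.0
conda activate pong

# Install dependencies before importing the ROMs: ale-import-roms ships with
# the ale-py package (assumed to be pulled in by requirements.txt).
pip3 install -r requirements.txt
ale-import-roms ROMS/
```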

training

```
# Train in the background
nohup python3 train_pong.py > training.log 2>&1 &

# Use disown to remove the job from the shell's job table, so it does not receive a SIGHUP (hangup) signal if you close the terminal.
disown
```

check training progress using tensorboard
```
tensorboard --logdir=tensorboard_logs/
```
play pong using the trained agent
```
python3 play_pong.py
```

Training results

The agent was trained for 6000 episodes. By the end of training, the mean episode reward approaches 0, which means the trained agent scores about as many points as the built-in opponent, i.e. it roughly ties with the environment.
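The README does not say how the mean reward is computed. One common approach is a running mean over the most recent episodes, sketched below. The function name and the default window of 100 episodes are assumptions for illustration only.

```python
from collections import deque


def running_mean(episode_rewards, window=100):
    """Mean of the last `window` episode rewards, reported after each episode."""
    recent = deque(maxlen=window)  # old rewards drop out automatically
    means = []
    for reward in episode_rewards:
        recent.append(reward)
        means.append(sum(recent) / len(recent))
    return means


# Hypothetical episode rewards (each Pong episode scores between -21 and +21).
means = running_mean([-21, -19, 1, 3], window=2)
```

A running mean near 0 corresponds to the tie described above, since an episode's reward is the agent's points minus the opponent's.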

Training ran on CPU only (Apple M1 Pro) and took around 14 hours for 6000 episodes with a simple 3-layer policy network (linear layers with ReLU activations). Memory consumption
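For orientation, a 3-layer policy network of linear layers and ReLU activations could look like the sketch below. It uses NumPy to stay dependency-light and is not the repository's model: the layer sizes (a 6400-dimensional input, as for a flattened 80x80 frame, two hidden layers of 200 units, and 2 actions), the He-style initialization, and the softmax output are all assumptions.

```python
import numpy as np


def init_params(sizes, seed=0):
    """Create (weight, bias) pairs for consecutive layer sizes (He-style init)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        b = np.zeros(fan_out)
        params.append((w, b))
    return params


def policy_forward(params, obs):
    """Map an observation vector to action probabilities."""
    h = obs
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # linear layer followed by ReLU
    w, b = params[-1]
    logits = h @ w + b                  # final linear layer, no ReLU
    logits = logits - logits.max()      # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()              # softmax over actions


# Assumed sizes: 6400 inputs -> 200 -> 200 -> 2 actions (e.g. up / down).
params = init_params([6400, 200, 200, 2])
probs = policy_forward(params, np.zeros(6400))
action = np.random.default_rng(0).choice(len(probs), p=probs)
```

Sampling the action from `probs`, rather than taking the argmax, is what lets a policy gradient agent explore during training.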

Play Pong before vs after training

yueying-teng/pong_with_policy_gradients

Folders and files

Latest commit

History

Repository files navigation

Pong with policy gradients

How to use

Training results

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages