Asynchronous Methods for Deep Reinforcement Learning (A3C)

After training for 6 hours

A3C implementation in TensorFlow
Two versions are provided:
- A3C using threads (traditional_threads)
- A3C using distributed TensorFlow (distributed_tf)
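
The thread-based idea can be illustrated with a minimal sketch. This is not the code in this repository: it uses plain Python threads and NumPy, and a toy quadratic loss stands in for the actor-critic loss. The names `run_async_workers`, `target`, and `shared_params` are hypothetical, chosen for illustration only. Each worker copies the shared parameters, computes a gradient on its own noisy sample, and applies the update back to the shared vector.

```python
import threading

import numpy as np


def run_async_workers(num_workers=4, steps_per_worker=500, lr=0.05, seed=0):
    """Toy asynchronous training: all workers update one shared parameter vector."""
    target = np.array([1.0, -2.0, 0.5])     # optimum of the toy loss ||params - target||^2
    shared_params = np.zeros_like(target)   # shared by every worker thread
    lock = threading.Lock()                 # assumption: guard writes to keep the toy example safe

    def worker(worker_id):
        rng = np.random.default_rng(seed + worker_id)  # independent noise per worker
        for _ in range(steps_per_worker):
            local_params = shared_params.copy()        # sync local copy (may already be stale)
            noise = rng.normal(scale=0.1, size=target.shape)
            grad = 2.0 * (local_params - target) + noise
            with lock:
                shared_params[:] = shared_params - lr * grad  # in-place update of shared vector

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_params


if __name__ == "__main__":
    print(run_async_workers())  # should end up close to the toy optimum
```

Gradients are computed outside the lock, so a worker may apply an update based on slightly outdated parameters. That staleness is the defining trait of asynchronous training. The original paper applies updates without locking; the lock here is only a simplification for the sketch.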
Tested on "Pong-v0", which is a stochastic environment because of random frame skipping
Original paper: https://arxiv.org/abs/1602.01783
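
The random frame skipping mentioned above can be sketched as follows. This is an illustration rather than Gym's actual implementation. It assumes each chosen action is repeated for k frames, with k drawn uniformly from {2, 3, 4} on every step, which is how Gym's v0 Atari environments are usually described. `CountingEnv` and `step_with_random_frameskip` are hypothetical names, and the toy environment simply returns a reward of 1 per frame.

```python
import random


class CountingEnv:
    """Hypothetical stand-in for an emulator: every frame yields reward 1."""

    def __init__(self):
        self.frames = 0

    def step_frame(self, action):
        self.frames += 1
        return self.frames, 1.0, False  # observation, reward, done


def step_with_random_frameskip(env, action, rng, skip_range=(2, 5)):
    """Repeat `action` for k frames, with k drawn uniformly from [low, high)."""
    k = rng.randrange(*skip_range)      # assumed range: 2, 3, or 4 frames
    total_reward = 0.0
    obs, done = None, False
    for _ in range(k):
        obs, reward, done = env.step_frame(action)
        total_reward += reward          # rewards from skipped frames are summed
        if done:
            break
    return obs, total_reward, done, k


if __name__ == "__main__":
    rng = random.Random(0)
    env = CountingEnv()
    for _ in range(5):
        print(step_with_random_frameskip(env, action=0, rng=rng))
```

Because k varies from step to step, the same sequence of actions can lead to different trajectories, which is why the agent sees the environment as stochastic.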

TensorBoard

Episode length increases as the agent starts to play neck and neck against the AI.
After an hour of training, the agent begins to beat the AI easily.

Episode reward
