Asynchronous Methods for Deep Reinforcement Learning (A3C)


After training for 6 hours
  • A3C implementation in TensorFlow
  • Two versions are included:
    • A3C using threads
    • A3C using Distributed TensorFlow
  • Tested on "Pong-v0", which is a stochastic environment because of random frame skipping
  • Original paper: https://arxiv.org/abs/1602.01783
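
As a rough illustration of what each A3C worker computes before pushing gradients, the sketch below builds the bootstrapped n-step returns and advantages described in the original paper. It is a minimal pure-Python sketch, not this repository's code: the function names `n_step_returns` and `advantages`, the default `gamma=0.99`, and the example numbers are all assumptions made for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment, computed backwards.

    rewards:         rewards collected by a worker over the segment
    bootstrap_value: critic estimate V(s) for the state after the last step,
                     or 0.0 if the segment ended in a terminal state
    gamma:           discount factor (0.99 is an assumed default)
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage estimate R_t - V(s_t) used to weight the policy gradient."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return [r - v for r, v in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical 3-step segment that ends when the agent scores a point.
    segment_rewards = [0.0, 0.0, 1.0]
    critic_values = [0.1, 0.2, 0.4]  # made-up critic outputs
    print(n_step_returns(segment_rewards, bootstrap_value=0.0, gamma=0.5))
    print(advantages(segment_rewards, critic_values, bootstrap_value=0.0, gamma=0.5))
```

In a full implementation, the returns would serve as regression targets for the value head and the advantages would scale the log-probability term of the policy loss. In the threaded version each worker thread would do this on its own rollout, while the distributed version would do the same in separate processes.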

TensorBoard

Episode length

  • Episode length increases as the agent starts to play neck and neck against the built-in AI
  • After an hour of training, the agent begins to beat the AI easily

Episode reward

  • The maximum possible episode reward is 21
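
To make the reward scale concrete, the sketch below sums per-point rewards into an episode reward. It assumes the usual Atari Pong convention, which this README does not spell out: +1 when the agent scores, -1 when the opponent scores, and the game ends when either side reaches 21 points. The helper `episode_reward` and the constant `WIN_SCORE` are hypothetical names used only for this example.

```python
WIN_SCORE = 21  # assumed: a Pong game ends when one side reaches 21 points


def episode_reward(point_rewards):
    """Sum the per-point rewards (+1 agent scores, -1 opponent scores)."""
    return sum(point_rewards)


# Agent wins every point: the best possible episode.
best = episode_reward([+1] * WIN_SCORE)

# Agent loses every point: the worst possible episode.
worst = episode_reward([-1] * WIN_SCORE)

# A neck-and-neck game that the agent wins 21 to 20.
close_win = episode_reward([+1] * WIN_SCORE + [-1] * (WIN_SCORE - 1))

print(best, worst, close_win)
```

Under these assumptions, a curve near 0 corresponds to evenly matched play and a curve approaching 21 corresponds to the agent winning almost every point.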