# Kioku

## DQN

*DQN CartPole episode rewards (plot)*

Using:

  • DQN
  • Double Q-networks (Double DQN)
  • Epsilon decay
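
The Double DQN target and epsilon decay listed above can be sketched roughly as follows (function names and hyperparameters are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN: the online net selects the next action,
    the target net evaluates it (reduces over-estimation)."""
    best_actions = np.argmax(next_q_online, axis=1)
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated

def epsilon(step, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exponential epsilon decay, floored at eps_end."""
    return max(eps_end, eps_start * decay ** step)
```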

Masters CartPole after only 92 episodes?

*DQN CartPole gif*

## A2C

*A2C CartPole episode rewards (plot)*

Using:

  • A2C
  • GAE
  • N-Step Returns (with GAE)
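
GAE with n-step returns, as listed above, can be sketched like this (a minimal illustration, assuming a rollout with one bootstrap value appended; not the repo's actual implementation):

```python
import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    `values` has len(rewards) + 1 entries (bootstrap value last).
    Returns (advantages, n-step returns)."""
    advantages = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        # one-step TD error
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # exponentially weighted sum of TD errors
        last = delta + gamma * lam * nonterminal * last
        advantages[t] = last
    return advantages, advantages + values[:-1]
```

The returned `advantages + values[:-1]` are the n-step (lambda) returns used as value-function targets.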

Note that A2C is much less sample-efficient than DQN and state-of-the-art methods (PPO, TD3, SAC).

Lunar Lander:

*A2C Lunar Lander episode rewards (plot)*

*A2C Lunar Lander gif*
## PPO

*PPO CartPole episode rewards (plot)*

...did I just successfully implement PPO?

Using:

  • PPO (basically A2C with a few extra steps)
  • GAE
  • N-Step Returns (with GAE)
  • Mini-batch learning
  • Multiple learning iterations per batch
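
The "extra steps" over A2C — the clipped surrogate objective plus multiple mini-batch passes over each rollout — can be sketched as below (names and hyperparameters are illustrative assumptions, not the repo's code):

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective; returns the negated
    objective so it can be minimized as a loss."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

def minibatches(n, batch_size, epochs, rng):
    """Yield shuffled mini-batch index arrays, re-shuffling
    each epoch, for several passes over one rollout."""
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            yield idx[start : start + batch_size]
```

Clipping is what allows the multiple update epochs per batch: once the policy ratio leaves `[1 - eps, 1 + eps]`, the gradient through that sample vanishes.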

As sample-efficient as DQN? (minus the replay memory)

*PPO CartPole gif*

Lunar Lander:

*PPO Lunar Lander episode rewards (plot)*

*PPO Lunar Lander gif*