"memory; recollection; remembrance"—my RL experiments...

smnast/kioku

Kioku

DQN

[Figure: DQN CartPole episode rewards]

Using:

  • DQN
  • Double Q-networks
  • Epsilon decay
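
As a rough illustration of the techniques listed above, here is a minimal sketch in plain Python. It assumes that "Double Q-networks" refers to the Double DQN target (the online network picks the next action, the target network evaluates it) and that epsilon decays exponentially per step. The function names, the decay schedule, and all default values are hypothetical and are not taken from this repository.

```python
import random


def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_rate=0.995):
    """Exponentially decay epsilon with the step count, floored at eps_end."""
    return max(eps_end, eps_start * decay_rate ** step)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target for a single transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # The online network selects the action, the target network scores it.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what distinguishes this target from the plain DQN target, which would take the maximum over `next_q_target` directly.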

Masters CartPole after only 92 episodes?

[Figure: DQN CartPole gif]

A2C

[Figure: A2C CartPole episode rewards]

Using:

  • A2C
  • GAE
  • N-Step Returns (with GAE)
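
For reference, GAE over an n-step rollout can be sketched as follows. This is a generic textbook-style version under stated assumptions, not the code from this repository: the function name `compute_gae`, the list-based inputs, and the default `gamma` and `lam` values are all hypothetical.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of n steps.

    rewards, values, dones: per-step lists of equal length.
    last_value: critic estimate for the state after the final step,
                used to bootstrap a rollout that was cut off mid-episode.
    Returns (advantages, returns), where returns are the critic targets.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards so each step can reuse the next step's estimate.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is dropped at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns
```

With `lam=1.0` the returns reduce to discounted n-step returns bootstrapped from `last_value`, and with `lam=0.0` the advantages reduce to one-step TD errors.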

Note that A2C is much less sample-efficient than DQN and than state-of-the-art methods such as PPO, TD3, and SAC.

Lunar Lander:

[Figure: A2C Lunar Lander episode rewards]
[Figure: A2C Lunar Lander gif]

PPO

[Figure: PPO CartPole episode rewards]

...did I just successfully implement PPO?

Using:

  • PPO (basically A2C with a few extra steps)
  • GAE
  • N-Step Returns (with GAE)
  • Mini-batch learning
  • Multiple learning iterations per batch
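
The "few extra steps" on top of A2C can be sketched roughly as below. This assumes the clipped-surrogate variant of PPO, which the list above does not confirm, and uses plain Python lists in place of tensors. The names `clipped_surrogate_loss` and `minibatch_indices` and the default `clip_eps` are hypothetical.

```python
import math
import random


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped policy loss, averaged over a mini-batch (to be minimized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current policy and the rollout policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def minibatch_indices(batch_size, minibatch_size, epochs, rng=random):
    """Yield shuffled index lists: several passes over one collected batch."""
    for _ in range(epochs):
        order = list(range(batch_size))
        rng.shuffle(order)
        for start in range(0, batch_size, minibatch_size):
            yield order[start:start + minibatch_size]
```

Each yielded index list would select one mini-batch of stored log-probabilities and advantages. The clipping is what keeps reusing the same batch for several iterations from moving the policy too far from the one that collected the data.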

As sample-efficient as DQN? (without the replay memory)

[Figure: PPO CartPole gif]

Lunar Lander:

[Figure: PPO Lunar Lander episode rewards]
[Figure: PPO Lunar Lander gif]
