End of Asynchronous Methods
I found that the current Atari wrapper I use is not fully compatible with the one in OpenAI baselines, resulting in degraded performance for most games (except Pong). So I plan to do a major update to fix this issue. (To be more specific, OpenAI baselines track the return of the original episode, which usually has more than one life, whereas I track the return of a single-life episode.)
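For illustration, below is a minimal sketch (not the actual code from either repo) of a baselines-style episodic-life wrapper, written against the old `gym` step API. It ends the *learning* episode on every life loss but keeps accumulating and reporting the return of the full multi-life game; my old wrapper instead reported the return of each single-life segment. The class name mirrors baselines' `EpisodicLifeEnv`, but the `episode_return` info key and other details here are my own assumptions.

```python
import gym


class EpisodicLifeEnv(gym.Wrapper):
    """Sketch of baselines-style behaviour: losing a life ends the learning
    episode, but the game is only reset (and the episode return reported)
    once all lives are gone."""

    def __init__(self, env):
        gym.Wrapper.__init__(self, env)
        self.lives = 0
        self.real_done = True
        self.real_return = 0.0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.real_done = done
        self.real_return += reward
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            # Life lost: signal 'done' to the agent, but keep accumulating
            # the return of the original multi-life episode.
            done = True
        self.lives = lives
        if self.real_done:
            # Report the return of the full multi-life episode (hypothetical key).
            info = dict(info, episode_return=self.real_return)
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.real_done:
            obs = self.env.reset(**kwargs)
            self.real_return = 0.0
        else:
            # Episode ended only because a life was lost: advance with a
            # no-op action instead of resetting the whole game.
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```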
Moreover, asynchronous methods are getting deprecated nowadays, so I will remove them and switch to A2C-style algorithms in the next version.
I made this tag in case someone still wants the old stuff.
Specifically, the following algorithms are implemented in this release:
- Deep Q-Learning (DQN)
- Double DQN
- Dueling DQN
- (Async) Advantage Actor Critic (A3C / A2C)
- Async One-Step Q-Learning
- Async One-Step Sarsa
- Async N-Step Q-Learning
- Continuous A3C
- Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
- Parallelized Proximal Policy Optimization (P3O, similar to DPPO)
- Action Conditional Video Prediction
- Categorical DQN (C51, Distributional DQN with KL Distance)
- Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
- N-Step DQN (similar to A2C)
Most of them are compatible with both Python 2 and Python 3; however, almost all the async methods only work in Python 2.