End of Asynchronous Methods
I found that the current Atari wrapper I use is not fully compatible with the one in OpenAI baselines, resulting in degraded performance for most games (except Pong). So I plan to do a major update to fix this issue. (To be more specific, OpenAI baselines track the return of the original episode, which usually has more than one life, whereas I track the return of a single-life episode.)
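For illustration, below is a minimal sketch (not the actual code from either repo) of a baselines-style episodic-life wrapper, written against the old `gym` step API. It ends the *learning* episode on every life loss but keeps accumulating and reporting the return of the full multi-life game; my old wrapper instead reported the return of each single-life segment. The class name mirrors baselines' `EpisodicLifeEnv`, but the `episode_return` info key and other details here are my own assumptions.

```python
import gym


class EpisodicLifeEnv(gym.Wrapper):
    """Sketch of baselines-style behaviour: losing a life ends the learning
    episode, but the game is only reset (and the episode return reported)
    once all lives are gone."""

    def __init__(self, env):
        gym.Wrapper.__init__(self, env)
        self.lives = 0
        self.real_done = True
        self.real_return = 0.0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.real_done = done
        self.real_return += reward
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            # Life lost: signal 'done' to the agent, but keep accumulating
            # the return of the original multi-life episode.
            done = True
        self.lives = lives
        if self.real_done:
            # Report the return of the full multi-life episode (hypothetical key).
            info = dict(info, episode_return=self.real_return)
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.real_done:
            obs = self.env.reset(**kwargs)
            self.real_return = 0.0
        else:
            # Episode ended only because a life was lost: advance with a
            # no-op action instead of resetting the whole game.
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```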
Moreover, asynchronous methods are getting deprecated nowadays, so I will remove them and switch to A2C-style algorithms in the next version.
I made this tag in case someone still wants the old stuff.
Specifically, the following algorithms are implemented in this release:
- Deep Q-Learning (DQN)
- Double DQN
- Dueling DQN
- (Async) Advantage Actor Critic (A3C / A2C)
- Async One-Step Q-Learning
- Async One-Step Sarsa
- Async N-Step Q-Learning
- Continuous A3C
- Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
- Parallelized Proximal Policy Optimization (P3O, similar to DPPO)
- Action Conditional Video Prediction
- Categorical DQN (C51, Distributional DQN with KL Distance)
- Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
- N-Step DQN (similar to A2C)
Most of them are compatible with both Python 2 and Python 3; however, almost all the async methods only work in Python 2.