Clean, Robust, and Unified PyTorch implementation of popular DRL Algorithms
This repository uses the following Python dependencies unless explicitly stated otherwise:
gymnasium==0.29.1
numpy==1.26.1
pytorch==2.1.0
python==3.11.5
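As a quick sanity check, you can print the installed versions and compare them with the list above (note that PyTorch is installed and imported as the `torch` package):

```python
# Optional sanity check: print the installed versions and compare with the list above.
import sys
import gymnasium
import numpy
import torch  # PyTorch is distributed as the "torch" package

print("python    :", sys.version.split()[0])  # expect 3.11.x
print("gymnasium :", gymnasium.__version__)   # expect 0.29.1
print("numpy     :", numpy.__version__)       # expect 1.26.x
print("pytorch   :", torch.__version__)       # expect 2.1.x
```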
Enter the folder of the algorithm you want to use and run main.py to train from scratch:
python main.py
For more details, please check the README.md file in the corresponding algorithm folder.
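The training code differs from algorithm to algorithm, but every main.py is built around the standard Gymnasium interaction loop. The sketch below is a generic illustration only (the environment name and the random policy are placeholders), not this repository's actual code:

```python
import gymnasium as gym

# Generic Gymnasium interaction loop (illustrative sketch, not the repo's code).
env = gym.make("CartPole-v1")           # placeholder environment
state, info = env.reset(seed=0)

for step in range(1000):
    action = env.action_space.sample()  # a trained agent would choose the action here
    next_state, reward, terminated, truncated, info = env.step(action)
    # an agent would store (state, action, reward, next_state, done) and update here
    state = next_state
    if terminated or truncated:
        state, info = env.reset()

env.close()
```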
- 1. Q-learning
- 2.1 DQN/DDQN on Classic Control (the DQN vs. DDQN target computation is sketched after this list)
- 2.2 DQN/DDQN on Atari Games
- 2.3 Prioritized Experience Replay (PER) DQN/DDQN on Classic Control
- 3.1 Proximal Policy Optimization (PPO) for Discrete Action Space
- 3.2 Proximal Policy Optimization (PPO) for Continuous Action Space
- 4.1 Deep Deterministic Policy Gradient (DDPG)
- 4.2 Twin Delayed Deep Deterministic Policy Gradient (TD3)
- 5.1 Soft Actor-Critic (SAC) for Discrete Action Space
- 5.2 Soft Actor-Critic (SAC) for Continuous Action Space
- 6. Actor-Sharer-Learner (ASL)
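For orientation, the practical difference between the DQN and DDQN entries above lies in how the bootstrap target is computed. The PyTorch sketch below uses dummy networks and a random batch purely for illustration; it is not taken from this repository:

```python
import torch
import torch.nn as nn

# Illustrative comparison of the DQN and Double DQN (DDQN) targets.
# Dummy networks and a random batch stand in for real ones.
n_obs, n_actions, batch, gamma = 4, 2, 32, 0.99
q_net    = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))  # online net
q_target = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))  # target net

s_next = torch.randn(batch, n_obs)   # next states
r      = torch.randn(batch)          # rewards
done   = torch.zeros(batch)          # 1.0 where the episode terminated

with torch.no_grad():
    q_next = q_target(s_next)                                        # [batch, n_actions]
    # DQN: the target network both selects and evaluates the next action
    dqn_target  = r + gamma * (1 - done) * q_next.max(dim=1).values
    # DDQN: the online network selects the action, the target network evaluates it
    a_star      = q_net(s_next).argmax(dim=1, keepdim=True)
    ddqn_target = r + gamma * (1 - done) * q_next.gather(1, a_star).squeeze(1)
```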
- Isaac Gym (NVIDIA's physics simulation environment; GPU-accelerated; very fast)
- Sparrow (lightweight simulator for mobile robots; DRL-friendly)
- ROS (popular and comprehensive robot simulation ecosystem; heavy and slow)
- Webots (popular physics simulator for robots; faster than ROS; less realistic)
- Envpool (fast vectorized environments; the batched pattern is sketched after this list)
- Other Popular Envs
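EnvPool and similar tools batch many environment instances behind a single reset/step call. The sketch below illustrates that batched pattern with Gymnasium's built-in (much slower) vector API; it is a neutral illustration, not EnvPool's own API or this repository's code:

```python
import gymnasium as gym

# Vectorized-environment pattern shown with Gymnasium's SyncVectorEnv;
# EnvPool exposes the same batched reset/step idea at much higher speed.
num_envs = 4
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(num_envs)])

obs, infos = envs.reset(seed=0)            # obs has shape [num_envs, obs_dim]
for _ in range(100):
    actions = envs.action_space.sample()   # one action per sub-environment
    obs, rewards, terminated, truncated, infos = envs.step(actions)
    # finished sub-environments are reset automatically by the vector wrapper

envs.close()
```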
- 《Reinforcement Learning: An Introduction》--Richard S. Sutton
- 《Introduction to Deep Learning: Theory and Implementation with Python (深度学习入门:基于Python的理论与实现)》--斋藤康毅 (Koki Saitoh)
- RL Courses (Bilibili)--李宏毅 (Hung-yi Lee)
- RL Courses (YouTube)--李宏毅 (Hung-yi Lee)
- UCL Course on RL--David Silver
- Hands-on Reinforcement Learning (动手强化学习)--Shanghai Jiao Tong University
- OpenAI Spinning Up
- Policy Gradient Theorem --Cangxi
- Policy Gradient Algorithms --Lilian Weng
- Theorem of PPO
- The 37 Implementation Details of Proximal Policy Optimization
- Prioritized Experience Replay
- Soft Actor-Critic
- A (Long) Peek into Reinforcement Learning --Lilian Weng
- Introduction to TD3
Demo results: Pong, Enduro, CartPole, LunarLander, Pendulum, LunarLanderContinuous (result GIFs omitted here).