- Advantage Actor-Critic [1]
- Parallel Advantage Actor-Critic [2]
- Noisy Networks for Exploration [3]
- Proximal Policy Optimization Algorithms [4]
- Curiosity-driven Exploration by Self-supervised Prediction [5] (WIP)
- Python 3.6
- gym-super-mario-bros
- OpenCV Python
- PyTorch
- tensorboardX
Modify the parameters in `mario_a2c.py` as you like.
python3 mario_a2c.py
or
python3 mario_ppo.py
Modify the `is_load_model` and `is_render` parameters in `mario_a2c.py` as you like.
python3 mario_a2c.py
or
python3 mario_ppo.py
It uses only the ICM (Intrinsic Curiosity Module) reward, with no extrinsic reward (curiosity-driven).
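
As a rough illustration of what training on ICM alone means, the sketch below computes a curiosity reward as the scaled squared error between a forward model's predicted next-state features and the actual ones, following the formulation in [5]. The names `intrinsic_reward`, `total_reward`, `eta`, and `use_extrinsic`, along with the toy feature vectors, are assumptions made for this example and are not taken from this repository's code. It is written in plain Python to stay dependency-free; a real implementation would likely operate on PyTorch tensors.

```python
def intrinsic_reward(predicted_next_feature, actual_next_feature, eta=0.01):
    """Curiosity reward: (eta / 2) * squared L2 distance between the
    predicted and actual next-state feature vectors."""
    if len(predicted_next_feature) != len(actual_next_feature):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum(
        (p - a) ** 2
        for p, a in zip(predicted_next_feature, actual_next_feature)
    )
    return 0.5 * eta * squared_error


def total_reward(extrinsic, intrinsic, use_extrinsic=False):
    """Combine rewards. With use_extrinsic=False (hypothetical flag),
    the environment reward is ignored and only curiosity is used."""
    return intrinsic + (extrinsic if use_extrinsic else 0.0)


# Toy example: the forward model mispredicts one feature by 2.0.
predicted = [0.0, 1.0, 2.0]
actual = [0.0, 3.0, 2.0]

r_int = intrinsic_reward(predicted, actual, eta=0.5)
reward = total_reward(extrinsic=10.0, intrinsic=r_int)
print(r_int, reward)
```

The larger the forward model's prediction error, the larger the reward, so the agent is pushed toward states it cannot yet predict well.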
[1] Actor-Critic Algorithms
[2] Efficient Parallel Methods for Deep Reinforcement Learning
[3] Noisy Networks for Exploration
[4] Proximal Policy Optimization Algorithms
[5] Curiosity-driven Exploration by Self-supervised Prediction