- PyTorch
- Unreal Engine 4
- AirSim Drone
- Gym
- Cherry-RL
This repository includes the following environments, each of which provides both sparse and dense reward modes.
- CartPole-v1
- Montezuma's Revenge
- VizDoom
- UE4 Airsim Maze Environment
- This environment was derived from `frasergeorgeking/UE4_BP_MazeGen_MIT`, a free and open-source maze generator with various themes for Unreal Engine. The AirSim drone package was attached to it, and the result is available for modification and download at `TroddenSpade/UE4-Airsim-Maze-Environment`.
First-person | Third-person | Top-down |
---|---|---|
- Advantage Actor Critic (A2C)
- Proximal Policy Optimization (PPO)
- Intrinsic Curiosity Module (ICM)
- Random Network Distillation (RND)
- Universal Value Function Approximators (UVFA)
- Never Give Up (NGU)
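Among these methods, Random Network Distillation admits a particularly compact sketch. The NumPy illustration below is not the repository's PyTorch implementation and all names are illustrative; it only shows the core idea: a predictor network is trained to match a fixed, randomly initialised target network, and the prediction error serves as an intrinsic exploration bonus that decays for frequently visited observations.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM, LR = 8, 16, 0.05

# Fixed, randomly initialised target network (never trained).
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))
# Predictor network, trained to match the target's output.
W_pred = np.zeros((OBS_DIM, FEAT_DIM))

def intrinsic_reward(obs):
    """Intrinsic bonus: squared error between predictor and target features."""
    err = obs @ W_pred - obs @ W_target
    return float(np.mean(err ** 2))

def train_predictor(obs):
    """One gradient step on the mean squared prediction error."""
    global W_pred
    err = obs @ W_pred - obs @ W_target              # shape (FEAT_DIM,)
    W_pred -= LR * (2.0 / FEAT_DIM) * np.outer(obs, err)

obs = rng.normal(size=OBS_DIM)
novel = intrinsic_reward(obs)        # large bonus for an unseen observation
for _ in range(200):
    train_predictor(obs)             # "visit" the same observation repeatedly
familiar = intrinsic_reward(obs)     # bonus shrinks as the predictor catches up
```

In the actual method both networks are deep convolutional encoders and the predictor is trained on minibatches from the replay stream, but the novelty signal is exactly this prediction error.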
The CartPole environment has been modified: its dense, time-based reward is replaced by a sparse reward scheme that returns a reward only on the last step of each episode.
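One way to implement such a modification is a thin wrapper around the environment's `step` method. The sketch below uses hypothetical names and mimics the classic Gym `step` signature rather than the repository's actual wrapper; a toy environment stands in for CartPole so the example is self-contained.

```python
class SparseRewardWrapper:
    """Withhold dense per-step rewards; only the episode's final step pays out."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Suppress the dense reward on all non-terminal steps.
        return obs, (reward if done else 0.0), done, info


class DummyEnv:
    """Toy stand-in for CartPole: +1 reward per step, 3-step episodes."""

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= 3, {}


env = SparseRewardWrapper(DummyEnv())
env.reset()
rewards = [env.step(0)[1] for _ in range(3)]
# Only the terminal step carries reward: [0.0, 0.0, 1.0]
```

With a real Gym environment, the same class would subclass `gym.Wrapper` and be applied as `SparseRewardWrapper(gym.make("CartPole-v1"))`.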
The goal is to acquire Montezuma's treasure by navigating a maze of chambers within the emperor's fortress. The player must avoid deadly creatures while collecting the valuables and tools that can help them escape with the treasure.
VizDoom is a Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information.
This map is a rectangle with walls, a ceiling, and a floor. The player is spawned at the center of the longer wall, and a circular monster is spawned at a random position along the opposite wall. The player can only move left/right and shoot. One hit is enough to kill the monster, and the episode ends when the monster is killed or on timeout.
basic-vd.mp4
This map is a large circular arena. The player is spawned at the exact center, and five melee-only monsters are spawned along the wall. Each monster dies after a single shot and respawns after some time. The episode ends when the player dies.
This map is a corridor with shooting monsters on both sides (six in total). A green vest is placed at the opposite end of the corridor. The reward is proportional (positive or negative) to the change in distance between the player and the vest. A player who ignores the monsters and runs straight for the vest will be killed somewhere along the way.
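The distance-based shaping described above can be expressed as a one-line reward function. This is a hedged sketch; the function name and the `scale` parameter are assumptions, not ViZDoom's exact formulation.

```python
def corridor_reward(prev_dist, dist, scale=1.0):
    """Reward proportional to the change in distance to the vest:
    positive when the player moves closer, negative when it retreats."""
    return scale * (prev_dist - dist)

# Moving 2 units closer yields +2.0; retreating 1 unit yields -1.0.
closer = corridor_reward(10.0, 8.0)
retreat = corridor_reward(8.0, 9.0)
```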
The goal is to explore the labyrinth and find the terminal square. Along the way, the agent must avoid colliding with walls; otherwise the environment resets the episode and a -1 reward is given.
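In sparse mode, the step outcome described above could look like the following sketch. The +1 terminal reward is an assumption for illustration; the text only specifies the -1 collision penalty and the episode reset.

```python
def maze_step_outcome(collided, reached_goal):
    """Return (reward, done) for one step of the maze environment.
    Illustrative sketch, not the repository's exact code."""
    if collided:
        return -1.0, True   # wall collision: penalty and episode reset
    if reached_goal:
        return 1.0, True    # assumed terminal reward (not stated in the text)
    return 0.0, False       # sparse mode: no reward in between
```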
thirdp.mp4
firstp.mp4
- Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. Curiosity-driven Exploration by Self-supervised Prediction. 34th Int. Conf. Mach. Learn. ICML 2017 6, 4261–4270 (2017).
- Burda, Y., Edwards, H., Storkey, A. & Klimov, O. Exploration by Random Network Distillation. 7th Int. Conf. Learn. Represent. ICLR 2019 1–17 (2019).
- Burda, Y., Storkey, A., Darrell, T. & Efros, A. A. Large-scale study of curiosity-driven learning. 7th Int. Conf. Learn. Represent. ICLR 2019 (2019).
- Badia, A. P. et al. Never Give Up: Learning Directed Exploration Strategies. 8th Int. Conf. Learn. Represent. ICLR 2020 1–28 (2020).