- Deep Q Network Agent
- Double Deep Q Network Agent
- Prioritized Experience Replay
- Grad-Cam Visualization (Double DQN)
- Rainbow
- Policy Gradient
- Actor-Critic Algorithms
- Deployment (onnx, opencv4nodejs, nodejs, target FPS for the agent in browser: 50 fps)
Testing runs at 50 FPS. A higher-score version is available on OneDrive.
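The browser deployment listed above (onnx, opencv4nodejs, nodejs) needs the trained network exported to ONNX first. Below is a minimal, illustrative sketch of such an export, assuming a classic Atari-style DQN that takes 4 stacked 84x84 grayscale frames; the architecture, input shape, and file names are assumptions rather than the repository's exact code.

```python
import torch
import torch.nn as nn

# Assumed stand-in network; the real architecture in this repository may differ.
class DQN(nn.Module):
    def __init__(self, num_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x):
        return self.net(x)

model = DQN(num_actions=2)   # in practice, load the trained weights here
model.eval()

dummy_input = torch.zeros(1, 4, 84, 84)  # (batch, stacked frames, height, width)
torch.onnx.export(
    model, dummy_input, "dino_dqn.onnx",  # assumed output file name
    input_names=["state"], output_names=["q_values"], opset_version=11,
)
```

The resulting .onnx file can then be loaded on the Node.js side (for example with onnxruntime or OpenCV's DNN module via opencv4nodejs).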
We use Grad-CAM, an explainable/interpretable AI approach for deep learning, to examine whether the agent attends to the game the way a human does. See Grad-CAM Visualization for details.
- Create two directories manually, or let main.py create them automatically (see the sketch after these commands)
mkdir result
mkdir weights
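A minimal sketch of the automatic creation using the standard library; the exact code in main.py may differ.

```python
import os

# Create the output folders for results and model checkpoints if they are missing.
for folder in ("result", "weights"):
    os.makedirs(folder, exist_ok=True)
```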
- Learning or Testing
python3 main.py -c config1
The most detailed experiments and explanation of the Chrome dinosaur game in deep reinforcement learning on GitHub
The Chrome dinosaur game is very suitable for beginners in deep reinforcement learning because of its simple rules and environment setting. Although the game is easy for humans, it is difficult for a computer agent to learn. Through this project, we not only show the results of the baseline DQN, but also compare them with double DQN, Rainbow, policy gradient, and Actor-Critic algorithms.
We have also implemented a real-time browser demo here. If you are not familiar with Q-learning, you can visit a more fundamental project, Q-learning for Tic-Tac-Toe (GitHub Repo), and its real-time interactive Streamlit demo.
The following is a detailed explanation of each approach and its environment setting in Chrome Dinosaur.
- No acceleration and no birds in the game, for simplicity. If you want them, you can enable acceleration in game.py.
- Two actions only: jump (up) and do nothing. The game actually has a third action, duck (down), for evading the birds when acceleration is enabled.
- Reward (illustrated in the sketch after this list):
- Hit the obstacle: -1
- Otherwise: 0.1
- Using Selenium in Python to capture images from the game.
- Using the version of Chrome Dinosaur found here:
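To make the setting above concrete, here is a minimal sketch of the action space and reward signal, assuming a gym-style step interface around the Selenium-controlled game; the class and method names (DinoEnv, step, _capture_screen, _game_over) are illustrative and not the repository's exact API.

```python
import numpy as np

ACTIONS = ["jump", "nothing"]  # two actions only; "duck" is unused without birds


class DinoEnv:
    """Illustrative wrapper around the Selenium-controlled Chrome Dinosaur game."""

    def step(self, action: int):
        # Send the key press for ACTIONS[action] via Selenium (omitted here),
        # then grab the next frame and check whether the dino crashed.
        frame = self._capture_screen()   # screenshot taken through Selenium
        done = self._game_over()         # True when the dino hits an obstacle
        reward = -1.0 if done else 0.1   # reward scheme described above
        return frame, reward, done

    def _capture_screen(self) -> np.ndarray:
        raise NotImplementedError  # e.g. decode driver.get_screenshot_as_png()

    def _game_over(self) -> bool:
        raise NotImplementedError  # e.g. query the game's crashed flag via JavaScript
```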
- Paper: Playing Atari with Deep Reinforcement Learning
- Paper: Prioritized Experience Replay
- Paper: Self-Improving Reactive Agents Based On Reinforcement Learning, Planning and Teaching
- Paper: Human-level control through deep reinforcement learning
- Link: Original implementation of this baseline in Keras
- Reference code: TRAIN A MARIO-PLAYING RL AGENT from Pytorch Official
- Similar project's report: Chrome Dino Run using Reinforcement Learning
- All references in Baseline DQN
- Training GPU: Nvidia RTX 3080 (12GB)
- CPU:
- Memory: 64 GB (the prioritized replay buffer needs at least 45 GB of RAM)
- Batch size: 32 (overfitting happens if it is too large; see the config sketch after this list)
- Buffer size: 100,000
- Final epsilon: 0.1
- FPS:
- Slow mode: 14.xx - 18.xx fps (with the prioritized replay buffer)
- Fast mode: 50 fps (without the prioritized replay buffer)
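The hyperparameters above might be expressed in config1.py roughly as follows; the key names are assumptions based on the values listed, not the repository's exact schema.

```python
# Illustrative hyperparameter block; key names are assumed, values come from the list above.
config1 = {
    "batch_size": 32,            # larger batches tended to overfit
    "buffer_size": 100_000,      # replay buffer capacity
    "final_epsilon": 0.1,        # minimum exploration rate after annealing
    "cam_visualization": False,  # set to True to dump states for Grad-CAM
}
```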
For the final epsilon, we believe it is the most important hyperparameter affecting the learning process. We tried 0.03, 0.01, and 0.0001 before, but the agent was not stable: the scores fluctuated heavily during learning, and the tested agent was essentially useless when the epsilon was too small. Giving the agent more exploration seems to work better in this game. We first tried to follow the hyperparameters in this report, but the problems mentioned above occurred. The training score (epsilon = 0.0001) is shown in the figure. The average and median scores of this agent over 20 test episodes are only 50.xx.
Later, we tried a final epsilon of 0.1. Although the maximum score during learning is smaller than 1,000, the test score is much higher when we test the agent for 20 episodes after every 100 training episodes.
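For reference, here is a minimal sketch of epsilon-greedy action selection with linear annealing toward the final epsilon of 0.1 discussed above; the schedule shape, annealing horizon, and variable names are assumptions, not the repository's exact implementation.

```python
import random

import numpy as np

INITIAL_EPSILON = 1.0
FINAL_EPSILON = 0.1      # the value that worked best in our runs
ANNEAL_STEPS = 100_000   # assumed annealing horizon


def epsilon_at(step: int) -> float:
    """Linearly anneal epsilon from INITIAL_EPSILON down to FINAL_EPSILON."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return INITIAL_EPSILON + frac * (FINAL_EPSILON - INITIAL_EPSILON)


def select_action(q_values: np.ndarray, step: int) -> int:
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon_at(step):
        return random.randrange(len(q_values))  # explore: random action
    return int(np.argmax(q_values))             # exploit: action with the highest Q-value
```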
The FPS here refers to the number of frames per second at which the agent predicts actions, not the FPS of the game rendered by JavaScript in the browser.
Since the computation cost of the prioritized replay buffer is much higher, our PC in this experiment can only reach the slow-mode rate of 14.xx - 18.xx fps when it is enabled.
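A minimal sketch of how this agent-side FPS can be measured, assuming a predict-and-act loop around the environment; the agent.act, env.reset, and env.step names are illustrative.

```python
import time


def measure_agent_fps(agent, env, num_steps: int = 500) -> float:
    """Count how many predict-and-act iterations the agent completes per second."""
    state = env.reset()
    start = time.perf_counter()
    for _ in range(num_steps):
        action = agent.act(state)              # forward pass through the network
        state, reward, done = env.step(action)
        if done:
            state = env.reset()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed
```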
- Paper: Actor-Critic Algorithms
- Paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
- Paper of Grad-CAM's demo
- The Grad-CAM package in python
Since the computation time of Grad-CAM is long, only about one heat map can be produced per second even with GPU acceleration. Therefore, it is hard to produce real-time Grad-CAM images during testing. Instead of visualizing the heat maps in real time, we save the states during testing and produce the Grad-CAM heat maps afterwards.
- Set cam_visualization in config1.py to True. The states from the game will be saved to the test_states folder for each testing episode.
"cam_visualization": True
- Create the heat maps as a GIF
python3 visualize_CAM.py -c config1
You can choose which testing episode to generate heat maps for by editing the state file path in visualize_CAM.py:
with open('./test_states/dino_states7.pickle', 'rb') as f: # the 7th test episode
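For context, here is a minimal sketch of turning the saved states into heat maps with the pytorch-grad-cam package; the stand-in network, target layer, state format, and preprocessing are assumptions, and the repository's visualize_CAM.py may do this differently.

```python
import pickle

import numpy as np
import torch
import torch.nn as nn
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Assumed stand-in for the trained agent network (4 stacked 84x84 frames in, 2 Q-values out);
# in practice, load the real model and its weights here.
model = nn.Sequential(
    nn.Conv2d(4, 32, 8, 4), nn.ReLU(),
    nn.Conv2d(32, 64, 4, 2), nn.ReLU(),
    nn.Conv2d(64, 64, 3, 1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, 2),
)
model.eval()

with open('./test_states/dino_states7.pickle', 'rb') as f:
    states = pickle.load(f)  # assumed: list of (4, 84, 84) float arrays scaled to [0, 1]

cam = GradCAM(model=model, target_layers=[model[4]])  # last conv layer of the stand-in
for i, state in enumerate(states):
    input_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    grayscale_cam = cam(input_tensor=input_tensor)[0]      # (84, 84) heat map
    rgb = np.repeat(state[-1][..., None], 3, axis=2)       # latest frame as an RGB image
    overlay = show_cam_on_image(rgb.astype(np.float32), grayscale_cam, use_rgb=True)
    # overlay is a uint8 image; frames like this can be stitched into the GIF.
```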
The warmer areas in the visualization are the areas with higher weights, i.e. their pixels have a larger impact on the final output of the agent. From the heat-map visualization, it is obvious that the areas around the dinosaur and the obstacles always have a higher impact on the agent's action.