Q-learning project where an agent learns by itself to find the exit of a maze. The project is implemented as a level-based game.
- 🔧 Installation
- 📖 Project Description
- ⚙️ Implementation
- 📁 Project_Structure
- 🖥️ Project_Display
- 🏆 Conclusion
The dependencies described in the requirements.txt file were used for the project.
It is advisable to install everything in a virtual environment.
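For example, assuming pip is available inside the virtual environment, the dependencies can be installed with:
pip3 install -r requirements.txt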
After installing the dependencies, you can start the project with the command:
python3 main.py
Design and implementation of a reinforcement learning environment for training an agent with a Q-learning algorithm in the AI-Gym framework.
The purpose of the project is to show that, through reinforcement learning, the agent can learn to move through the labyrinth without bumping into the walls.
The labyrinth is represented by an m×n grid of cells, each either empty (white) or a wall (coloured), with a certain percentage of random wall cells. The agent starts in an upper-left position (e.g., cell 2,2 in the figure) and the exit is in the lower-right position.
The agent's available actions are the four movement actions: Up, Down, Left, Right.
The percept state returned by the environment is a representation of the Moore 8-neighborhood centered on the agent's position.
Each movement action has a reward of -1, bumping into a wall has a reward of -5, and reaching the exit position has a reward of 10.
An episode ends when the agent reaches the exit cell or when maxK actions have been executed.
The implementation of the environment has the following parameters (a minimal Gym-style sketch follows this list):
- m, n: dimensions of the grid.
- Percentage of wall cells.
- maxK: maximum number of actions before the episode ends.
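Putting the pieces above together (grid, percept, rewards, and termination), a minimal Gym-style environment could look like the following. Names such as `MazeEnv` and all default values are illustrative assumptions, not the project's actual code.

```python
import numpy as np

class MazeEnv:
    """Gym-style maze environment sketch (illustrative names and values).

    Grid cells: 0 = free, 1 = wall. The agent starts near the upper-left
    corner and the exit is in the lower-right corner.
    """

    # action index -> (row delta, col delta): Up, Down, Left, Right
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, m=10, n=10, wall_pct=0.2, maxK=200, seed=None):
        self.m, self.n = m, n
        self.wall_pct = wall_pct
        self.maxK = maxK
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Random labyrinth: each cell becomes a wall with probability wall_pct.
        self.grid = (self.rng.random((self.m, self.n)) < self.wall_pct).astype(int)
        self.start = (1, 1)
        self.exit = (self.m - 2, self.n - 2)
        self.grid[self.start] = 0
        self.grid[self.exit] = 0
        self.pos = self.start
        self.steps = 0
        return self._percept()

    def _percept(self):
        """Moore 8-neighbourhood around the agent (cells outside the grid count as walls)."""
        r, c = self.pos
        cells = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < self.m and 0 <= cc < self.n
                cells.append(int(self.grid[rr, cc]) if inside else 1)
        return tuple(cells)

    def step(self, action):
        self.steps += 1
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if not (0 <= r < self.m and 0 <= c < self.n) or self.grid[r, c] == 1:
            reward = -5                      # bumped into a wall: stay in place
        else:
            self.pos = (r, c)
            reward = 10 if self.pos == self.exit else -1
        done = self.pos == self.exit or self.steps >= self.maxK
        return self._percept(), reward, done, {}
```

The sketch skips any solvability check; the project itself uses static maps that always contain a path to the exit.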
The system can:
- Run and train the Q-learning reinforcement learning algorithm (a training-loop sketch follows this list).
- Generate and try different labyrinths during training, showing the evolution of the accumulated reward.
- Save the Q(State, Action) matrix and load a saved Q matrix.
- Execute the agent step-by-step on a given labyrinth, showing the reward.
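A tabular Q-learning training loop over such an environment might look roughly like this; the hyperparameters (alpha, gamma, epsilon, episodes) are illustrative and not taken from the project.

```python
import random
from collections import defaultdict

def train(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Train a tabular Q(state, action) function with epsilon-greedy exploration."""
    Q = defaultdict(lambda: [0.0] * 4)      # state -> one value per action
    rewards_per_episode = []                # used to plot the accumulated reward

    for _ in range(episodes):
        state = env.reset()
        done, total = False, 0
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.randrange(4)
            else:
                action = max(range(4), key=lambda a: Q[state][a])

            next_state, reward, done, _ = env.step(action)

            # Standard Q-learning update rule.
            best_next = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

            state = next_state
            total += reward
        rewards_per_episode.append(total)

    return Q, rewards_per_episode
```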
Contains the main function, called main_menu().
Provides a series of selectable menus that let the user choose a different map and wall layout based on the chosen difficulty.
Contains the classes that graphically "draw" the map and the various movements using the turtle library.
Represents the map and the positions of both the agent and the environment. Transforms the selected matrix into a binary sequence and saves it to the file labyrinth (a possible sketch of this serialization is shown below).
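The exact format of the labyrinth file is not documented here; as a hedged guess, the serialization could be as simple as writing each row of the 0/1 matrix as a binary string:

```python
def save_labyrinth(grid, path="labyrinth"):
    """Flatten a 0/1 maze matrix into one binary string per row and save it."""
    with open(path, "w") as f:
        for row in grid:
            f.write("".join(str(int(cell)) for cell in row) + "\n")
```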
A dataset of maps that the user can select from the various menus.
Manages the Q matrix: saving it to a dedicated file, loading it back into memory, and handling the training and execution phases (see the sketch below).
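One simple way to persist the Q(State, Action) table is pickling it; this is only a sketch of the idea, and the project's actual file format may differ.

```python
import pickle

def save_q(Q, path="qmatrix.pkl"):
    """Save the Q table to disk."""
    with open(path, "wb") as f:
        pickle.dump(dict(Q), f)   # plain dict so it pickles without the defaultdict factory

def load_q(path="qmatrix.pkl"):
    """Load a previously saved Q table."""
    with open(path, "rb") as f:
        return pickle.load(f)
```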
Contains the sprites used for the movement of the character in the graphics part.
The project can be displayed in two ways: in the terminal or in a graphics window.
This mode shows the matrix in the terminal (a small rendering sketch follows this list). The elements that make up this representation are:
- x: indicates a wall
- empty space: traversable by the agent
- p: indicates the agent
- u: indicates the exit of the maze
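A minimal sketch of this textual rendering, assuming a 0/1 grid plus the agent and exit positions (function and parameter names are illustrative):

```python
def render_text(grid, agent, exit_cell):
    """Print the maze using the symbols described above."""
    for r, row in enumerate(grid):
        line = ""
        for c, cell in enumerate(row):
            if (r, c) == agent:
                line += "p"
            elif (r, c) == exit_cell:
                line += "u"
            elif cell == 1:
                line += "x"
            else:
                line += " "
        print(line)
```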
This mode shows a turtle window on screen containing a grid that represents the maze (a simplified turtle sketch follows the legend below).
It represents the entry of the labyrinth.
It represents the exit of the labyrinth.
It represents the agent in the labyrinth.
It represents a wall of the labyrinth.
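As a rough idea of how the grid could be drawn with the turtle library, here is a simplified sketch; it only fills the wall cells and is not the project's actual drawing code, which also handles the sprites.

```python
import turtle

def draw_maze(grid, cell=20):
    """Draw each wall cell of the maze as a filled square in a turtle window."""
    screen = turtle.Screen()
    pen = turtle.Turtle(visible=False)
    pen.speed(0)
    pen.penup()
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == 1:                   # wall cell
                pen.goto(c * cell, -r * cell)
                pen.pendown()
                pen.begin_fill()
                for _ in range(4):           # draw one square of side `cell`
                    pen.forward(cell)
                    pen.right(90)
                pen.end_fill()
                pen.penup()
    screen.exitonclick()
```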
As seen above, the maze is chosen from the dataset of maps made available. This made it possible to control both the size of the maps and the amount of wall inside them, while still letting the user pick the map they consider most appropriate. Another important advantage of using static maps is that no check for the existence of a solution was needed, since every map in the dataset already provides a path to the exit of the maze. The results show that the Q-learning algorithm is able to lead the agent to the exit in the first two levels, while in the third level the chosen algorithm parameters do not scale to the dimensions of the matrix. By increasing the number of epochs and steps, a solution is still found.
Enjoy!