Deep Learning Grid World Q-Learning

Overview

This repository contains an implementation of a Q-learning algorithm that solves a grid world environment using deep learning techniques. The environment is a 5x5 grid with obstacles, reward cells, and a goal state. The agent learns to navigate the grid so as to maximize its cumulative reward.

Files

deep_learning_grid_world_q_learning.py: Contains the main implementation of the Q-learning algorithm, including:
- create_grid_world(ax): Function to create and visualize the grid world.
- epsilon_greedy(Q, state, epsilon): Function to select an action using the epsilon-greedy policy.
- step(Q, state, action, alpha, gamma): Function to perform a step in the environment and update Q-values.
- q_learning_agent(alpha_values, num_episodes): Function to train the Q-learning agent with different alpha values.
- visualize_q_values(Q): Function to visualize the learned Q-values.

Usage

Run the Deep Learning Q-learning Agent

Execute the script to train the Q-learning agent with different learning rates (alpha_values). The training process includes visualization of the agent's movement in the grid world and updates to the Q-values.

Visualize Q-values

After training, the Q-values are visualized to show the learned state values.

Explanation

Grid World

The grid world consists of a 5x5 grid with:

Obstacles: Cells that are blocked and cannot be traversed.
Rewards: Cells that provide rewards (+5 or +10).
Goal: The cell at (5, 5) where the agent receives a reward of +10 and the episode terminates.
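
The description above can be sketched as a small environment function. This is only an illustration, not the script's actual code: the README does not list obstacle or reward positions, so `OBSTACLES` and `REWARDS` are made-up layouts, `move` is a hypothetical helper name, and the sketch assumes that (5, 5) is 1-indexed, that a blocked move leaves the agent in place, and that reward cells are not consumed.

```python
# Hypothetical layout: the README does not give obstacle or reward positions.
GRID_SIZE = 5
OBSTACLES = {(1, 1), (2, 3), (3, 1)}   # assumed blocked cells (0-indexed)
REWARDS = {(0, 3): 5, (3, 3): 10}      # assumed bonus cells (+5 or +10)
GOAL = (4, 4)                          # (5, 5) if cells are counted from 1
GOAL_REWARD = 10
# Action index -> (row delta, column delta): up, down, left, right.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def move(state, action):
    """Return (next_state, reward, done) for one move on the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    candidate = (row + d_row, col + d_col)
    in_bounds = 0 <= candidate[0] < GRID_SIZE and 0 <= candidate[1] < GRID_SIZE
    # Assumption: moving off the grid or into an obstacle leaves the agent in place.
    if not in_bounds or candidate in OBSTACLES:
        candidate = state
    if candidate == GOAL:
        return candidate, GOAL_REWARD, True
    return candidate, REWARDS.get(candidate, 0), False
```

For example, `move((4, 3), 3)` steps right into the goal and returns `((4, 4), 10, True)`, which ends the episode.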

Deep Q-learning Algorithm

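Epsilon-Greedy Policy: Balances exploration and exploitation. With probability epsilon the agent takes a random action; otherwise it takes the action with the highest Q-value for the current state.
Learning Rate (Alpha): Controls how far each update moves a Q-value toward its new target estimate.
Discount Factor (Gamma): Determines the importance of future rewards relative to immediate ones. Values near 0 favor immediate reward, values near 1 favor long-term return.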
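
How the three parameters interact may be easier to see in code. The following is a minimal tabular sketch, not the repository's implementation: it assumes Q is a dictionary mapping each (row, col) state to a list of four action values, and `q_update` and `NUM_ACTIONS` are hypothetical names introduced here. `epsilon_greedy` reuses the signature listed under Files, but its body is a guess at the standard policy.

```python
import random

NUM_ACTIONS = 4  # assumed: up, down, left, right


def epsilon_greedy(Q, state, epsilon):
    """Pick a random action with probability epsilon, else the best-known one."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    values = Q[state]
    return max(range(NUM_ACTIONS), key=lambda a: values[a])


def q_update(Q, state, action, reward, next_state, done, alpha, gamma):
    """Apply one tabular Q-learning update in place (hypothetical helper)."""
    # A terminal state has no future value to bootstrap from.
    best_next = 0.0 if done else max(Q[next_state])
    td_target = reward + gamma * best_next
    # Alpha scales how far the old estimate moves toward the target.
    Q[state][action] += alpha * (td_target - Q[state][action])


# Assumed table layout: every (row, col) state starts with zero action values.
Q = {(r, c): [0.0] * NUM_ACTIONS for r in range(5) for c in range(5)}
# One update for stepping right from (4, 3) into a terminal cell worth +10.
q_update(Q, (4, 3), 3, 10, (4, 4), True, alpha=0.5, gamma=0.9)
```

After this single update, `Q[(4, 3)][3]` is 5.0: the old value 0 moved halfway (alpha = 0.5) toward the target of 10. With `epsilon=0.0`, `epsilon_greedy(Q, (4, 3), 0.0)` then always returns action 3.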

Visualization

Grid World: The grid is displayed with obstacles, rewards, and the agent's path.
Q-values: Visualized as a heatmap to show the learned state values.
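
One plausible way to get a per-cell value for the heatmap is to take the maximum Q-value over actions in each state. The sketch below assumes that reduction and the dictionary-of-lists table layout; `state_values` and `render` are hypothetical names, and the script's own `visualize_q_values(Q)` may work differently.

```python
def state_values(Q, grid_size=5):
    """Collapse Q[(row, col)] action values into a grid of per-cell maxima."""
    return [[max(Q[(r, c)]) for c in range(grid_size)] for r in range(grid_size)]


def render(values):
    """Format the grid as fixed-width text, one grid row per line."""
    return "\n".join(" ".join(f"{v:6.2f}" for v in row) for row in values)


# Toy table: every action value is zero except one entry next to the goal.
Q = {(r, c): [0.0, 0.0, 0.0, 0.0] for r in range(5) for c in range(5)}
Q[(4, 3)][3] = 5.0
grid = state_values(Q)
print(render(grid))
```

The resulting nested list can be passed to `matplotlib.pyplot.imshow(grid)` to draw it as a heatmap; the text rendering here just keeps the example free of third-party dependencies.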

Screenshots

Output Video

Github.mp4

Notes

The script includes an early stopping condition if the average reward exceeds a threshold over a window of episodes.
The agent's progress is visualized in real-time during training.
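
The early stopping condition could look roughly like the following. The window length, the threshold, and the name `should_stop` are all hypothetical, since the README does not state the values or the logic the script uses; the reward list stands in for per-episode totals from training.

```python
from collections import deque


def should_stop(recent_rewards, threshold):
    """True once the window is full and its mean reward exceeds the threshold."""
    if len(recent_rewards) < recent_rewards.maxlen:
        return False
    return sum(recent_rewards) / len(recent_rewards) > threshold


WINDOW = 3        # hypothetical window length (episodes)
THRESHOLD = 8.0   # hypothetical average-reward threshold

recent = deque(maxlen=WINDOW)  # keeps only the last WINDOW episode rewards
stopped_at = None
for episode, total_reward in enumerate([2, 6, 9, 9, 10, 10]):
    recent.append(total_reward)
    if should_stop(recent, THRESHOLD):
        stopped_at = episode
        break
```

With these made-up rewards the loop stops at episode index 4, the first point where the mean of the last three episodes (9, 9, 10) is strictly above 8.0.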

Contributing

Tashfeen Abbasi (abbasitashfeen7@gmail.com)
Laiba Mazhar (laibamazhar.000@gmail.com)

Feel free to fork the repository and submit pull requests. For issues or feature requests, please open an issue on GitHub.
