A navigation task with noisy movements is a type of decision-theoretic planning
problem where the aim is to find the best action in each system state.
Such problems can be modeled as Markov decision processes and solved with
dynamic programming algorithms and reinforcement learning.
Here a grid navigation task is solved with value iteration and Q-learning in order to
compare the accuracy and the convergence speed of the two approaches.
The navigation grid used in this project is shown below:
Reaching the grid cells marked with +1 and -1 yields the corresponding reward, while all other cells have reward 0. The gray cell is inaccessible, and any move action taken in the top-right cell teleports the agent back to the cell marked 'start'.
The noise in the movement actions is shown below:
There are four possible moves: North, West, South and East. When attempting a move, the agent has a 10% chance of slipping into each of the two directions perpendicular to the intended one.
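As an illustration of the transition model, the sketch below maps each intended move to its possible outcomes, assuming a 10% slip probability per perpendicular direction and the remaining 80% for the intended move (the names and representation here are illustrative, not the exact code in main.py):

```python
# Illustrative sketch of the movement noise model (not the representation used in main.py).
ACTIONS = {
    "North": (-1, 0),
    "South": (1, 0),
    "West": (0, -1),
    "East": (0, 1),
}

# The two directions perpendicular to each intended move.
PERPENDICULAR = {
    "North": ("West", "East"),
    "South": ("West", "East"),
    "West": ("North", "South"),
    "East": ("North", "South"),
}

def movement_outcomes(intended):
    """Return (actual_move, probability) pairs for an intended move:
    80% intended, 10% for each perpendicular slip (assumed split)."""
    left, right = PERPENDICULAR[intended]
    return [(intended, 0.8), (left, 0.1), (right, 0.1)]
```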
In value iteration, all actions are evaluated for each state in order to find the best action and update the state value. A state value combines the immediate reward with the discounted long-term reward obtainable from that state. The algorithm returns the optimal navigation policy, given by the action yielding the maximum value in each state.
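A minimal sketch of this update is shown below; the transitions(state, action) interface, discount factor and tolerance are illustrative assumptions rather than the actual interface of main.py:

```python
def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-6):
    """Repeatedly apply the Bellman optimality update until the state values converge.

    transitions(s, a) is assumed to return (probability, next_state, reward) triples.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Evaluate every action and keep the one with the highest expected value.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in transitions(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # The policy picks, in each state, the action with the highest expected value.
    policy = {
        s: max(
            actions,
            key=lambda a: sum(p * (r + gamma * values[s2]) for p, s2, r in transitions(s, a)),
        )
        for s in states
    }
    return values, policy
```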
In contrast to value iteration, the state transition probabilities and rewards are not known a priori in Q-learning. The algorithm starts from the 'start' state, chooses an action, observes the outcome and updates the corresponding state-action Q-value. Actions are chosen with a so-called Multi-Armed Bandit rule, which balances exploration (improving the accuracy of the Q-values) against exploitation (taking the action with the highest Q-value). The algorithm returns the state-action Q-values, and the navigation policy is extracted by choosing the action with the highest Q-value in each state.
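The sketch below shows the core Q-learning update with a simple epsilon-greedy rule standing in for the bandit strategy; the env interface (reset/step) and the hyperparameters are illustrative assumptions, not the code in main.py:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=100000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn state-action values from sampled transitions; the transition model is never used."""
    q = defaultdict(float)  # maps (state, action) -> Q-value
    for _ in range(episodes):
        state = env.reset()  # assumed to return the 'start' state
        done = False
        while not done:
            # Exploration/exploitation trade-off: random action with probability epsilon,
            # otherwise the action with the highest current Q-value.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)  # assumed interface
            # Q-learning update: move the estimate toward the sampled one-step target.
            target = reward + gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    # Extract the greedy policy from the learned Q-values.
    visited_states = {s for s, _ in q}
    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in visited_states}
    return q, policy
```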
The following Python packages are required to run the solver:
numpy 1.13.3
matplotlib 2.1.0
The grid navigation task can be solved with the two algorithms by:
$ python3 main.py
Q-learning is run for 100000 iterations by default. The number of iterations
can be changed with the -i argument:
$ python3 main.py -i 150000
The accuracy and convergence rate of the algorithms can be plotted with the -p argument:
$ python3 main.py -p
- TODO: convergence criterion for Q-learning
- Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, Boutilier C., Dean T. and Hanks S., Journal of Artificial Intelligence Research, 1999, vol. 11
- Learning from Delayed Rewards, Watkins, C. J. C. H., Ph.D. Thesis, Cambridge University, 1989
- Algorithms for the Multi-Armed Bandit Problem, Kuleshov, V. and Precup, D., Journal of Machine Learning Research, 2000, vol. 1
- CE-E4800 Artificial Intelligence, Aalto University