R package markovDP - Infrastructure for Discrete-Time Markov Decision Processes (MDP)

Introduction

A Markov decision process (MDP) (Bellman 1957; Howard 1960) is a discrete-time stochastic control process. In each time step, an agent can perform actions which affect the system (i.e., may cause the system state to change). The agent’s goal is to maximize its expected future rewards that depend on the sequence of system state and the agent’s actions in the future. Solving the MDP means finding the optimal (or at least a good) policy that guides the agent’s actions.

The markovDP package provides the infrastructure to work with MDPs in R. The focus is on convenience in formulating MDPs with small to medium sized state spaces in multiple ways, the support of sparse representations (using sparse matrices, lists and data.frames) and visualization of results. Some key components are implemented in C++ to speed up computation. The package provides to the following popular tabular methods:

Dynamic Programming
- Value Iteration (Bellman 1957)
- Modified Policy Iteration (Howard 1960; Puterman and Shin 1978)
- Prioritized Sweeping (Moore and Atkeson 1993; Andre, Friedman, and Parr 1997; Li and Littman 2008)
Linear Programming
- Primal Formulation (Manne 1960)
Monte Carlo Control
- Monte Carlo Control with Exploring Starts (Sutton and Barto 2018)
- On-policy Monte Carlo Control (Sutton and Barto 2018)
- Off-policy Monte Carlo Control (Sutton and Barto 2018)
Termporal Differencing
- Q-Learning (Watkins and Dayan 1992)
- Sarsa (Rummery and Niranjan 1994)
- Expected Sarsa (Sutton and Barto 2018)
Sampling
- Random-sample one-step tabular Q-planning (Sutton and Barto 2018)

These implementations follow the description is (Russell and Norvig 2020) and (Sutton and Barto 2018). The implementations represent the state space explicitly, so only problems with small to medium state spaces can be used.

Partially observable Markov Decision Problems (POMDPs) can me modeled in a similar fashion using package pomdp (Hahsler 2024).

To cite package ‘markovDP’ in publications use:

Hahsler M (2024). markovDP: Infrastructure for Discrete-Time Markov Decision Processes (MDP). R package version 0.99.0, https://github.com/mhahsler/markovDP.

@Manual{,
  title = {markovDP: Infrastructure for Discrete-Time Markov Decision Processes (MDP)},
  author = {Michael Hahsler},
  year = {2024},
  note = {R package version 0.99.0},
  url = {https://github.com/mhahsler/markovDP},
}

Installation

Current development version: Install from r-universe.

install.packages("markovDP",
    repos = c("https://mhahsler.r-universe.dev",
              "https://cloud.r-project.org/"))

Usage

Solving the simple maze from (Russell and Norvig 2020).

library("markovDP")
data("Maze")
Maze

## MDP, list - Stuart Russell's 3x4 Maze
##   Discount factor: 1
##   Horizon: Inf epochs
##   Size: 11 states / 4 actions
##   Start: s(3,1)
## 
##   List components: 'name', 'discount', 'horizon', 'states', 'actions',
##     'start', 'transition_prob', 'reward', 'info', 'absorbing_states',
##     'unreachable_states'

The maze is a gridworld and can be displayed directly.

gw_plot(Maze, state = TRUE)

sol <- solve_MDP(model = Maze)
sol

## MDP, list - Stuart Russell's 3x4 Maze
##   Discount factor: 1
##   Horizon: Inf epochs
##   Size: 11 states / 4 actions
##   Start: s(3,1)
##   Solved:
##     Method: 'value_iteration'
##     Solution converged: TRUE
## 
##   List components: 'name', 'discount', 'horizon', 'states', 'actions',
##     'start', 'transition_prob', 'reward', 'info', 'absorbing_states',
##     'unreachable_states', 'solution'

Display the value function.

plot_value_function(sol)

The state values can be shown in the gridworld as colored map.

gw_plot(sol)

Acknowledgments

Development of this package was supported in part by National Institute of Standards and Technology (NIST) under grant number 60NANB17D180.

References

Andre, David, Nir Friedman, and Ronald Parr. 1997. “Generalized Prioritized Sweeping.” In Advances in Neural Information Processing Systems, edited by M. Jordan, M. Kearns, and S. Solla. Vol. 10. MIT Press. https://proceedings.neurips.cc/paper_files/paper/1997/file/7b5b23f4aadf9513306bcd59afb6e4c9-Paper.pdf.

Bellman, Richard. 1957. “A Markovian Decision Process.” Indiana University Mathematics Journal 6: 679–84. https://www.jstor.org/stable/24900506.

Hahsler, Michael. 2024. Pomdp: Infrastructure for Partially Observable Markov Decision Processes (POMDP). https://doi.org/10.32614/CRAN.package.pomdp.

Howard, R. A. 1960. Dynamic Programming and Markov Processes. Cambridge, MA: MIT Press.

Li, Lihong, and Michael Littman. 2008. “Prioritized Sweeping Converges to the Optimal Value Function.” DCS-TR-631. Rutgers University. https://doi.org/10.7282/T3TX3JSX.

Manne, Alan. 1960. “On the Job-Shop Scheduling Problem.” Operations Research 8 (2): 219–23. https://doi.org/10.1287/opre.8.2.219.

Moore, Andrew, and C. G. Atkeson. 1993. “Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time.” Machine Learning 13 (1): 103–30. https://doi.org/10.1007/BF00993104.

Puterman, Martin L., and Moon Chirl Shin. 1978. “Modified Policy Iteration Algorithms for Discounted Markov Decision Problems.” Management Science 24: 1127–37. https://doi.org/10.1287/mnsc.24.11.1127.

Rummery, G., and Mahesan Niranjan. 1994. “On-Line Q-Learning Using Connectionist Systems.” Techreport CUED/F-INFENG/TR 166. Cambridge University Engineering Department.

Russell, Stuart J., and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach (4th Edition). Pearson. http://aima.cs.berkeley.edu/.

Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. Second. The MIT Press. http://incompleteideas.net/book/the-book-2nd.html.

Watkins, Christopher J. C. H., and Peter Dayan. 1992. “Q-Learning.” Machine Learning 8 (3): 279–92. https://doi.org/10.1007/BF00992698.

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
R		R
data		data
inst		inst
man		man
src		src
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
markovDP.Rproj		markovDP.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

R package markovDP - Infrastructure for Discrete-Time Markov Decision Processes (MDP)

Introduction

Installation

Usage

Acknowledgments

References

About

Releases

Packages

Languages

mhahsler/markovDP

Folders and files

Latest commit

History

Repository files navigation

R package markovDP - Infrastructure for Discrete-Time Markov Decision Processes (MDP)

Introduction

Installation

Usage

Acknowledgments

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages