VALUE-ITERATION

AIMA3e

function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
            rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration

  repeat
    U ← U′; δ ← 0
    for each state s in S do
      U′[s] ← R(s) + γ max_{a ∈ A(s)} Σ_{s′} P(s′ | s, a) U[s′]
      if | U′[s] − U[s] | > δ then δ ← | U′[s] − U[s] |
  until δ < ε(1 − γ)/γ
  return U

Figure ?? The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (??).
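
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `value_iteration`, the callable-based MDP interface (`actions`, `transition`, `reward`), and the toy two-state MDP are all assumptions made for the example. It also assumes 0 < γ < 1, since the termination threshold ε(1 − γ)/γ is zero when γ = 1, and it treats a state with no available actions as having an expected future utility of zero.

```python
def value_iteration(states, actions, transition, reward, gamma, epsilon):
    """Return a dict mapping each state to its estimated utility.

    Hypothetical interface (not part of the original pseudocode):
      actions(s)       -> iterable of actions available in s (may be empty)
      transition(s, a) -> list of (probability, next_state) pairs
      reward(s)        -> immediate reward R(s)
    """
    if not 0 < gamma < 1:
        raise ValueError("this sketch assumes 0 < gamma < 1")

    U1 = {s: 0.0 for s in states}                # U', initially zero
    threshold = epsilon * (1 - gamma) / gamma    # termination bound

    while True:
        U = dict(U1)                             # U <- U'
        delta = 0.0
        for s in states:
            # Best expected utility over actions; 0.0 if s has no actions.
            best = max(
                (sum(p * U[s2] for p, s2 in transition(s, a))
                 for a in actions(s)),
                default=0.0,
            )
            U1[s] = reward(s) + gamma * best     # Bellman update
            delta = max(delta, abs(U1[s] - U[s]))
        if delta < threshold:
            return U                             # as in the pseudocode


# Toy MDP (an assumption for illustration): "loop" pays 1 forever,
# "end" is terminal with reward 0.
toy_states = ["loop", "end"]
toy_utilities = value_iteration(
    toy_states,
    actions=lambda s: ["stay"] if s == "loop" else [],
    transition=lambda s, a: [(1.0, s)],
    reward=lambda s: 1.0 if s == "loop" else 0.0,
    gamma=0.5,
    epsilon=1e-6,
)
print(toy_utilities["loop"])  # approximately 2.0, since 1 / (1 - 0.5) = 2
```

Copying `U1` into `U` at the start of each sweep mirrors the `U ← U′` step, so every update within a sweep reads only the utilities from the previous sweep.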
