VALUE-ITERATION

AIMA3e

function VALUE-ITERATION(mdp, ε) returns a utility function
inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
             rewards R(s), discount γ
        ε, the maximum error allowed in the utility of any state
local variables: U, U′, vectors of utilities for states in S, initially zero
                 δ, the maximum change in the utility of any state in an iteration

repeat
   U ← U′; δ ← 0
   for each state s in S do
     U′[s] ← R(s) + γ max_{a ∈ A(s)} Σ_{s′} P(s′ | s, a) U[s′]
     if | U′[s] − U[s] | > δ then δ ← | U′[s] − U[s] |
until δ < ε(1 − γ)/γ
return U


Figure ?? The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (??).
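
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `value_iteration`, the callable-based MDP interface (`actions`, `transition`, `reward`), and the two-state example MDP are all assumptions made for this sketch. It also assumes 0 < γ < 1, because the threshold ε(1 − γ)/γ is zero when γ = 1 and the loop would then never terminate, and it treats a state with no available actions as having an expected future utility of zero.

```python
def value_iteration(states, actions, transition, reward, gamma, epsilon):
    """Sketch of VALUE-ITERATION under an assumed (hypothetical) MDP interface.

    states:     iterable of hashable states S
    actions:    function s -> iterable of actions A(s)
    transition: function (s, a) -> list of (probability, next_state) pairs
    reward:     function s -> R(s)
    gamma:      discount factor, assumed to satisfy 0 < gamma < 1
    epsilon:    maximum error allowed in the utility of any state
    """
    if not 0 < gamma < 1:
        raise ValueError("this sketch assumes 0 < gamma < 1")

    states = list(states)
    u_next = {s: 0.0 for s in states}            # U', initially zero
    threshold = epsilon * (1 - gamma) / gamma    # termination bound

    while True:
        u = dict(u_next)                         # U <- U'
        delta = 0.0
        for s in states:
            # Expected utility of each action under the current estimate U.
            expected = [
                sum(p * u[s2] for p, s2 in transition(s, a))
                for a in actions(s)
            ]
            # Assumption: no available action means no future utility.
            best = max(expected, default=0.0)
            u_next[s] = reward(s) + gamma * best
            delta = max(delta, abs(u_next[s] - u[s]))
        if delta < threshold:
            return u                             # the pseudocode returns U


# Hypothetical two-state MDP: "a" moves to "b" deterministically,
# and "b" loops on itself while collecting reward 1.
example_transitions = {
    ("a", "go"): [(1.0, "b")],
    ("b", "stay"): [(1.0, "b")],
}
utilities = value_iteration(
    states=["a", "b"],
    actions=lambda s: ["go"] if s == "a" else ["stay"],
    transition=lambda s, a: example_transitions[(s, a)],
    reward=lambda s: 1.0 if s == "b" else 0.0,
    gamma=0.5,
    epsilon=1e-6,
)
```

In this example the exact utilities satisfy U(b) = 1 + 0.5·U(b) and U(a) = 0.5·U(b), so the returned estimates should lie close to 2 and 1 respectively. Copying U′ into U at the start of each sweep mirrors the pseudocode's synchronous update, in which every state is updated from the previous iteration's utilities rather than from values changed earlier in the same sweep.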