# POMDP-VALUE-ITERATION

## AIMA3e

```
function POMDP-VALUE-ITERATION(pomdp, ε) returns a utility function
  inputs: pomdp, a POMDP with states S, actions A(s), transition model P(s′ | s, a),
            sensor model P(e | s), rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, sets of plans p with associated utility vectors αp

  U′ ← a set containing just the empty plan [], with α[](s) = R(s)
  repeat
     U ← U′
     U′ ← the set of all plans consisting of an action and, for each possible next percept,
          a plan in U with utility vectors computed according to Equation (??)
     U′ ← REMOVE-DOMINATED-PLANS(U′)
  until MAX-DIFFERENCE(U, U′) < ε(1 − γ)/γ
  return U
```


**Figure ??** A high-level sketch of the value iteration algorithm for POMDPs. The REMOVE-DOMINATED-PLANS step and MAX-DIFFERENCE test are typically implemented as linear programs.
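
The sketch below is one possible Python rendering of the figure, not the book's code. It assumes dictionary-based transition and sensor models, a single action set A rather than state-dependent A(s), and it uses `scipy.optimize.linprog` for the dominated-plan pruning that the caption says is typically done with linear programs. In place of Equation (??) it uses the standard alpha-vector update αp(s) = R(s) + γ Σs′ P(s′ | s, a) Σe P(e | s′) αp.e(s′). Names such as `pomdp_value_iteration` and `max_difference` are illustrative only.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def pomdp_value_iteration(S, A, E, P_trans, P_sensor, R, gamma, epsilon, max_iters=20):
    """Value iteration over conditional plans, each represented by its alpha vector.

    S, A, E        -- lists of states, actions, and percepts
    P_trans[s][a]  -- dict mapping s2 to P(s2 | s, a)
    P_sensor[s]    -- dict mapping e to P(e | s)
    R[s]           -- reward for being in state s
    """
    # U' starts as the set containing only the empty plan [], with alpha_[](s) = R(s).
    U_new = [np.array([R[s] for s in S])]
    for _ in range(max_iters):
        U = U_new
        candidates = []
        for a in A:
            # A new plan = an action a plus one plan from U for each possible percept,
            # giving |U|^|E| combinations per action.
            for choice in itertools.product(range(len(U)), repeat=len(E)):
                alpha = np.empty(len(S))
                for i, s in enumerate(S):
                    future = 0.0
                    for j, s2 in enumerate(S):
                        p_s2 = P_trans[s][a].get(s2, 0.0)
                        if p_s2 == 0.0:
                            continue
                        # Expected value of following plan choice[k] after percept E[k].
                        future += p_s2 * sum(P_sensor[s2].get(e, 0.0) * U[choice[k]][j]
                                             for k, e in enumerate(E))
                    alpha[i] = R[s] + gamma * future
                candidates.append(alpha)
        U_new = remove_dominated_plans(candidates)
        if max_difference(U, U_new) < epsilon * (1 - gamma) / gamma:
            break
    return U_new


def remove_dominated_plans(vectors):
    """Keep only the alpha vectors that are optimal at some belief state (one LP each)."""
    unique = []
    for v in vectors:
        if not any(np.allclose(v, u) for u in unique):
            unique.append(v)
    return [w for i, w in enumerate(unique)
            if not is_dominated(w, unique[:i] + unique[i + 1:])]


def is_dominated(w, others, tol=1e-9):
    """LP test: w is dominated if no belief b gives b·w > b·u for every other vector u."""
    if not others:
        return False
    n = len(w)
    # Decision variables: [b_1, ..., b_n, delta]; linprog minimizes, so use -delta.
    c = np.append(np.zeros(n), -1.0)
    A_ub = np.array([np.append(u - w, 1.0) for u in others])   # b·(u - w) + delta <= 0
    b_ub = np.zeros(len(others))
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)           # belief sums to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return not (res.success and -res.fun > tol)


def max_difference(U, U_new):
    """Max-norm distance from each new vector to its nearest old vector (a simple test)."""
    return max(min(float(np.max(np.abs(w - u))) for u in U) for w in U_new)
```

Even in this toy form the candidate set grows as |A| · |U||E| per iteration, which is why the REMOVE-DOMINATED-PLANS pruning step is essential for anything beyond the smallest problems.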