function Passive-TD-Agent(percept) returns an action
  inputs: percept, a percept indicating the current state s' and reward signal r'
  persistent: π, a fixed policy
              U, a table of utilities, initially empty
              Ns, a table of frequencies for states, initially zero
              s, a, r, the previous state, action, and reward, initially null

  if s' is new then U[s'] ← r'
  if s is not null then
      increment Ns[s]
      U[s] ← U[s] + α(Ns[s])(r + γ U[s'] - U[s])
  if s'.Terminal? then s, a, r ← null else s, a, r ← s', π[s'], r'
  return a
Figure ?? A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence, as described in the text.
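The pseudocode translates almost line for line into executable code. The Python sketch below is one hypothetical rendering, not a definitive implementation: it assumes the percept arrives as an (s', r') pair, that the policy π is a dictionary mapping states to actions, that terminal states are supplied explicitly as a set, and that the step-size schedule defaults to α(n) = 60/(59 + n), a common choice satisfying the convergence conditions (Σ α(n) diverges while Σ α(n)² converges) alluded to in the caption.

import collections

class PassiveTDAgent:
    """A sketch of the passive TD agent in the figure (names are illustrative)."""

    def __init__(self, pi, terminals=frozenset(), gamma=0.9,
                 alpha=lambda n: 60 / (59 + n)):
        self.pi = pi                       # π, a fixed policy: state -> action
        self.terminals = set(terminals)    # terminal states, supplied by caller (assumption)
        self.gamma = gamma                 # discount factor γ
        self.alpha = alpha                 # step-size schedule α(n)
        self.U = {}                        # table of utilities, initially empty
        self.Ns = collections.defaultdict(int)  # state visit counts, initially zero
        self.s = self.a = self.r = None    # previous state, action, reward

    def __call__(self, percept):
        s1, r1 = percept                   # current state s' and reward signal r'
        if s1 not in self.U:               # if s' is new then U[s'] <- r'
            self.U[s1] = r1
        if self.s is not None:
            self.Ns[self.s] += 1
            # TD update: nudge U[s] toward the sampled target r + γ U[s']
            self.U[self.s] += self.alpha(self.Ns[self.s]) * (
                self.r + self.gamma * self.U[s1] - self.U[self.s])
        if s1 in self.terminals:
            self.s = self.a = self.r = None
        else:
            self.s, self.a, self.r = s1, self.pi[s1], r1
        return self.a

# Toy usage (hypothetical): a chain A -> B -> T with a -0.04 step cost.
pi = {'A': 'Right', 'B': 'Right'}
agent = PassiveTDAgent(pi, terminals={'T'}, gamma=1.0)
for _ in range(100):                       # repeat trials under the fixed policy
    for percept in [('A', -0.04), ('B', -0.04), ('T', 1.0)]:
        agent(percept)
print(agent.U)                             # estimates approach U(A) ≈ 0.92, U(B) ≈ 0.96

Running the agent over repeated trials generated by the fixed policy drives U toward the true utilities under π; the two-state chain above is a toy illustration only.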