Reinforcement Learning Course By David Silver

Question reference

Monte-Carlo Learning

Optimal value function V_* with Monte-Carlo Agent running 100,000 episodes

Value Function Update V(S_t) ← V(S_t) + α (G_t - V(S_t)), where G_t is the return observed from time t to the end of the episode
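The incremental Monte-Carlo update can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's reference code: the function name `mc_update`, the episode layout (a list of `(state, reward)` pairs), the every-visit variant, and the default `alpha` and `gamma` values are all hypothetical choices made for this example. The sampled return is written `G` in the code.

```python
from collections import defaultdict


def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Apply V(s) <- V(s) + alpha * (G - V(s)) for every visit in one episode.

    `episode` is assumed to be a list of (state, reward) pairs, where
    `reward` is the reward received after leaving `state`.
    """
    G = 0.0
    # Walk backwards so the return G can be accumulated in a single pass.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])
    return V


# Hypothetical two-step episode that ends with a reward of 1.
V = defaultdict(float)
mc_update(V, [("s0", 0.0), ("s1", 1.0)], alpha=0.5)
```

With a constant step size α, recent episodes are weighted more heavily than old ones; using 1/N(s) instead would give the running mean of the returns.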

SARSA

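Value Function Update V(S_t) ← V(S_t) + α (R_{t+1} + γ V(S_{t+1}) - V(S_t)); Sarsa applies the same update to the action values Q(S_t, A_t), bootstrapping from Q(S_{t+1}, A_{t+1})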
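For comparison, here is a minimal sketch of a one-step Sarsa update on action values, which bootstraps from the next state-action pair instead of waiting for the full return. The function name `sarsa_update`, the `terminal` flag, the state and action labels, and the default parameters are assumptions made for illustration and do not come from the original material.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, terminal=False):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)).

    When `terminal` is True the next state has no value, so the TD target
    is just the reward r.
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical transition: the labels "s0", "s1", "hit", "stick" are placeholders.
Q = defaultdict(float)
Q[("s1", "stick")] = 1.0
new_q = sarsa_update(Q, "s0", "hit", 0.0, "s1", "stick", alpha=0.5)
```

Because the target uses the action actually chosen in the next state, the update is on-policy: it evaluates the policy that is generating the data.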

Figures: MSE per lambda, MSE per episode

Linear Function Approximation

Figures: MSE per lambda, MSE per episode