Reinforcement Learning Course By David Silver

Question reference


Monte-Carlo Learning

Optimal value function V* with Monte-Carlo Agent running 100,000 episodes

Q Function Update: Q(S_t, A_t) ← Q(S_t, A_t) + α (G_t − Q(S_t, A_t)), where G_t is the return observed from time t to the end of the episode.
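
The update moves each estimate a fraction α of the way toward the observed return. Below is a minimal sketch in plain Python. It assumes a tabular estimate stored in a dict and a step size α = 1/N, where N is the visit count of the entry. The README does not state either choice, and the names `mc_update`, `values` and `visits` are hypothetical.

```python
from collections import defaultdict


def mc_update(values, visits, episode):
    """Every-visit Monte-Carlo update for one finished episode.

    episode: list of (key, episode_return) pairs, where key identifies the
    table entry (a state or a (state, action) pair) and episode_return is
    the return observed from that step onward.
    """
    for key, episode_return in episode:
        visits[key] += 1
        alpha = 1.0 / visits[key]  # assumed step-size schedule: running mean
        values[key] += alpha * (episode_return - values[key])


# Toy usage with made-up states, actions and returns.
values = defaultdict(float)
visits = defaultdict(int)
mc_update(values, visits, [(("s0", "hit"), 1.0), (("s1", "stick"), 1.0)])
mc_update(values, visits, [(("s0", "hit"), -1.0)])
```

With α = 1/N each entry is simply the running average of the returns seen for it, so a constant α can be swapped in if recent episodes should carry more weight.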

Tri-Surface Plot


SARSA

Q Function Update: Q(S_t, A_t) ← Q(S_t, A_t) + α (R_{t+1} + γ Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t))
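
The temporal-difference update above can be sketched as a single function. This is a one-step illustration only, assuming a tabular estimate in a dict keyed by (state, action). The function name, the default α and γ, and the `terminal` flag are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=1.0, terminal=False):
    """Apply one SARSA step to the table q and return the TD error."""
    # At a terminal transition there is no next action value to bootstrap from.
    if terminal:
        target = reward
    else:
        target = reward + gamma * q[(next_state, next_action)]
    td_error = target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Toy usage with made-up values.
q = defaultdict(float)
q[("s1", "a1")] = 2.0
first_error = sarsa_update(q, "s0", "a0", 1.0, "s1", "a1", alpha=0.5, gamma=0.5)
second_error = sarsa_update(q, "s0", "a0", -1.0, None, None, alpha=0.5, terminal=True)
```

Unlike the Monte-Carlo update, the target here uses the current estimate of the next state-action pair, so learning can proceed before the episode ends.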

| MSE Per Lambda | MSE Per Episode |
| --- | --- |
| Point Plot | FacetGrid |
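
The README does not say how the plotted MSE is computed. One plausible reading, stated here as an assumption, is the mean squared difference between the learned action values and a reference table such as the Monte-Carlo estimate. A minimal sketch with hypothetical names:

```python
def mean_squared_error(q_estimate, q_reference):
    """Mean squared difference over every entry of the reference table.

    Entries missing from q_estimate are treated as 0.0 (assumed default).
    """
    if not q_reference:
        raise ValueError("q_reference must not be empty")
    total = sum((q_estimate.get(key, 0.0) - ref) ** 2
                for key, ref in q_reference.items())
    return total / len(q_reference)


# Toy usage with made-up tables.
reference = {("s0", "hit"): 1.0, ("s0", "stick"): -1.0}
estimate = {("s0", "hit"): 0.0}
mse = mean_squared_error(estimate, reference)
```

Evaluating this once per λ after training, or once per episode during training, would give the two kinds of curve named in the table.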

Linear Function Approximation

| MSE Per Lambda | MSE Per Episode |
| --- | --- |
| Point Plot | FacetGrid |