Reinforcement learning applied to Easy21, an assignment from David Silver's Reinforcement Learning course at UCL. The assignment can be found here.
python3 monteCarlo.py
The agent played 1 million games (episodes) to obtain the following value function:
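For orientation, here is a minimal sketch of Monte-Carlo control on Easy21. It assumes the rules and schedules given in the assignment (black cards with probability 2/3, dealer sticks at 17 or more, step size 1/N(s,a), epsilon = N0 / (N0 + N(s)) with N0 = 100). The names `draw`, `step`, and `mc_control` and the array layout are illustrative and not necessarily those used in monteCarlo.py.

```python
import random
import numpy as np

HIT, STICK = 0, 1  # assumed action encoding

def draw(rng, black_only=False):
    """Draw a card: value 1-10; black (added) with prob 2/3, red (subtracted) with prob 1/3."""
    value = rng.randint(1, 10)
    if black_only or rng.random() < 2 / 3:
        return value
    return -value

def step(state, action, rng):
    """Apply one action. Returns (next_state, reward); next_state is None once the game ends."""
    dealer, player = state
    if action == HIT:
        player += draw(rng)
        if player < 1 or player > 21:
            return None, -1              # player goes bust
        return (dealer, player), 0
    # Player sticks: the dealer hits until reaching 17 or more, or going bust.
    while 1 <= dealer < 17:
        dealer += draw(rng)
    if dealer < 1 or dealer > 21 or player > dealer:
        return None, 1
    return None, (0 if player == dealer else -1)

def mc_control(episodes, n0=100, seed=0):
    rng = random.Random(seed)
    Q = np.zeros((10, 21, 2))            # indexed [dealer - 1, player - 1, action]
    N = np.zeros((10, 21, 2))            # visit counts per state-action pair
    for _ in range(episodes):
        state = (draw(rng, True), draw(rng, True))   # both start with a black card
        visited, reward = [], 0
        while state is not None:
            d, p = state[0] - 1, state[1] - 1
            eps = n0 / (n0 + N[d, p].sum())          # epsilon decays with visits to the state
            if rng.random() < eps:
                action = rng.choice((HIT, STICK))
            else:
                action = int(np.argmax(Q[d, p]))
            visited.append((d, p, action))
            state, reward = step(state, action, rng)
        # Undiscounted, and only the final reward is non-zero,
        # so the return is the same for every pair visited in the episode.
        for d, p, a in visited:
            N[d, p, a] += 1
            Q[d, p, a] += (reward - Q[d, p, a]) / N[d, p, a]
    return Q
```

With this layout, the state-value function is the maximum over the action axis, for example `mc_control(100_000).max(axis=2)`.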
The optimal policy, obtained by selecting the action with the highest value in each state:
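Reading a greedy policy off a learned table is a single argmax per state. The sketch below assumes Q is stored as a NumPy array of shape (10, 21, 2), indexed by dealer card, player sum, and action; this layout and the name `greedy_policy` are assumptions for illustration, not taken from the repository.

```python
import numpy as np

def greedy_policy(Q):
    """Return (V, policy): V[s] = max_a Q[s, a] and policy[s] = argmax_a Q[s, a]."""
    # Ties are resolved in favour of the lowest action index.
    return Q.max(axis=-1), Q.argmax(axis=-1)

# Tiny usage example on a hand-filled table.
Q = np.zeros((10, 21, 2))
Q[0, 20, 1] = 0.8          # dealer shows 1, player holds 21: action 1 looks best
V, policy = greedy_policy(Q)
```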
python3 temporalDifference.py
The mean squared error (MSE) of Q, the state-action value function, over the course of episodic learning. For each lambda, 10,000 episodes were measured against the state-action value function from the 1 million episode Monte-Carlo run, saved in Q.dill:
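The error measure can be sketched as the mean of the squared differences over all state-action pairs. The example assumes both tables are NumPy arrays of the same shape; how Q.dill is structured on disk is not shown here, so the reference table is passed in as an argument, and the name `mse` is illustrative.

```python
import numpy as np

def mse(Q, Q_star):
    """Mean squared error between a learned table Q and a reference table Q_star."""
    diff = np.asarray(Q, dtype=float) - np.asarray(Q_star, dtype=float)
    return float(np.mean(diff ** 2))

# Usage: an all-zero table against a reference that is 0.5 everywhere.
Q_star = np.full((10, 21, 2), 0.5)
error = mse(np.zeros((10, 21, 2)), Q_star)
```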
Mean Squared Error after 1,000 episodes for different lambdas:
The optimal policy as derived from 10,000 episodes of TD(lambda = 0.3):
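As a rough illustration of the update behind these results, the following sketch shows one backward-view Sarsa(lambda) step with accumulating eligibility traces, which is the TD control method the assignment asks for. The function name, the index-tuple convention, and passing the step size `alpha` as an argument are assumptions; temporalDifference.py may organise this differently.

```python
import numpy as np

def sarsa_lambda_step(Q, E, sa, reward, next_sa, alpha, lam, gamma=1.0):
    """One backward-view Sarsa(lambda) update, applied in place to Q and E.

    sa and next_sa are index tuples (dealer_idx, player_idx, action);
    next_sa is None when the episode has ended.
    """
    q_next = 0.0 if next_sa is None else Q[next_sa]
    delta = reward + gamma * q_next - Q[sa]   # TD error
    E[sa] += 1.0                              # accumulating trace for the visited pair
    Q += alpha * delta * E                    # credit every pair in proportion to its trace
    E *= gamma * lam                          # decay all traces
    return float(delta)

# Usage: a single terminal transition with reward +1 and lambda = 0.3.
Q = np.zeros((10, 21, 2))
E = np.zeros_like(Q)
delta = sarsa_lambda_step(Q, E, (0, 0, 0), 1.0, None, alpha=0.5, lam=0.3)
```

The traces in `E` would be reset to zero at the start of each episode.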
python3 lfa.py
The lookup-table approach of the previous models is replaced by a coarse-coding function approximator. This reduces the 420 state-action combinations to 36 features.
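The 36 features come from 3 dealer intervals x 6 player intervals x 2 actions, against 10 x 21 x 2 = 420 table entries. A minimal sketch of such a feature vector follows, assuming the overlapping intervals given in the assignment; the names `features` and `q_hat` and the action encoding are illustrative and may not match lfa.py.

```python
import numpy as np

# Overlapping (inclusive) intervals, as specified in the assignment.
DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
N_ACTIONS = 2  # assumed encoding: 0 = hit, 1 = stick

def features(dealer, player, action):
    """Binary feature vector of length 3 * 6 * 2 = 36 for a state-action pair."""
    phi = np.zeros((len(DEALER_INTERVALS), len(PLAYER_INTERVALS), N_ACTIONS))
    for i, (d_lo, d_hi) in enumerate(DEALER_INTERVALS):
        for j, (p_lo, p_hi) in enumerate(PLAYER_INTERVALS):
            if d_lo <= dealer <= d_hi and p_lo <= player <= p_hi:
                phi[i, j, action] = 1.0
    return phi.ravel()

def q_hat(theta, dealer, player, action):
    """Linear action-value estimate: Q(s, a) = phi(s, a) . theta."""
    return float(features(dealer, player, action) @ theta)
```

Because the intervals overlap, a single state can switch on several features at once: dealer 4 with player sum 4 activates four of them, which is what lets nearby states share what they learn.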