Reinforcement Learning on Easy21

Reinforcement learning methods applied to Easy21, a simplified variant of Blackjack. This project is an assignment from David Silver's Reinforcement Learning course at UCL. The assignment can be found here.
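For readers unfamiliar with the game, the sketch below shows the Easy21 rules as stated in the assignment. It is a minimal, hypothetical sketch and not the repository's `enviro.py`. The names `draw_card`, `initial_state` and `step` are illustrative.

The rules it encodes are:

- cards are valued 1 to 10 and drawn with replacement;
- black cards (probability 2/3) add to the sum and red cards (1/3) subtract from it;
- both players start with one black card;
- a hand is bust outside 1..21;
- the dealer sticks at 17 or more.

```python
import random

# Hypothetical sketch of the Easy21 rules as stated in the assignment.
# The repository's own environment is in enviro.py and may be organised differently.
HIT, STICK = 0, 1


def draw_card(rng, force_black=False):
    """Draw a card valued 1-10: black (p = 2/3) adds, red (p = 1/3) subtracts."""
    value = rng.randint(1, 10)
    if force_black or rng.random() < 2 / 3:
        return value
    return -value


def is_bust(total):
    """A hand is bust if its sum leaves the range 1..21."""
    return total < 1 or total > 21


def initial_state(rng):
    """State = (dealer's first card, player's sum); both start with one black card."""
    return (draw_card(rng, force_black=True), draw_card(rng, force_black=True))


def step(state, action, rng):
    """Apply an action and return (next_state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if is_bust(player):
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # Player sticks: the dealer draws until the sum is 17 or more, or goes bust.
    dealer_sum = dealer
    while 1 <= dealer_sum < 17:
        dealer_sum += draw_card(rng)
    if is_bust(dealer_sum):
        return (dealer, player), 1, True
    # Higher sum wins (+1), lower sum loses (-1), equal sums draw (0).
    reward = (player > dealer_sum) - (player < dealer_sum)
    return (dealer, player), reward, True
```

A game is played by calling `initial_state(rng)` once and then `step(state, action, rng)` until `done` is true. The only non-zero reward is the final one.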

Monte-Carlo Control

python3 monteCarlo.py

The agent played 1 million games (episodes) to obtain the following value function:
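The following is a minimal sketch of GLIE Monte-Carlo control on Easy21. It is not the code in `monteCarlo.py`. It assumes the schedules suggested in the assignment:

- exploration epsilon = N0 / (N0 + N(s)) with N0 = 100;
- step size 1 / N(s, a).

Q is assumed to be a dict keyed by `((dealer_card, player_sum), action)`. The every-visit update and the helper names are illustrative choices.

```python
import random
from collections import defaultdict

# Minimal sketch of GLIE Monte-Carlo control for Easy21, assuming the
# assignment's schedules: epsilon = N0 / (N0 + N(s)) and step size 1 / N(s, a).
# All names are hypothetical; this is not monteCarlo.py itself.
N0 = 100
ACTIONS = (0, 1)  # 0 = hit, 1 = stick


def draw(rng):
    value = rng.randint(1, 10)
    return value if rng.random() < 2 / 3 else -value


def step(state, action, rng):
    """Compact Easy21 step: returns (next_state or None if terminal, reward)."""
    dealer, player = state
    if action == 0:
        player += draw(rng)
        if not 1 <= player <= 21:
            return None, -1
        return (dealer, player), 0
    total = dealer
    while 1 <= total < 17:
        total += draw(rng)
    if not 1 <= total <= 21:
        return None, 1
    return None, (player > total) - (player < total)


def epsilon_greedy(Q, N_s, state, rng):
    epsilon = N0 / (N0 + N_s[state])
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def monte_carlo_control(num_episodes, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)]
    N_s = defaultdict(int)   # visit counts per state
    N_sa = defaultdict(int)  # visit counts per state-action pair
    for _ in range(num_episodes):
        state = (rng.randint(1, 10), rng.randint(1, 10))  # both first cards are black
        episode = []
        while state is not None:
            action = epsilon_greedy(Q, N_s, state, rng)
            episode.append((state, action))
            state, reward = step(state, action, rng)
        # Intermediate rewards are 0 and gamma = 1, so the return from every
        # visited step equals the final reward. Counts are updated per episode.
        for s, a in episode:
            N_s[s] += 1
            N_sa[(s, a)] += 1
            Q[(s, a)] += (reward - Q[(s, a)]) / N_sa[(s, a)]
    return Q
```

Calling `monte_carlo_control(1_000_000)` corresponds to the run size described above. The test below uses far fewer episodes.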

Visualized as a heatmap:

The optimal policy, obtained by selecting the action with the highest value in each state:
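The value function and policy can both be read off the state-action values:

- V(s) = max_a Q(s, a);
- the policy picks the maximizing action in each state.

The helper below is a hypothetical sketch. It assumes Q is a dict keyed by `((dealer_card, player_sum), action)` with 0 = hit and 1 = stick, and it breaks ties towards sticking as an arbitrary choice.

```python
# Hypothetical helper: assumes Q is a dict keyed by ((dealer_card, player_sum), action)
# with action 0 = hit and 1 = stick; unvisited pairs default to 0.0.
HIT, STICK = 0, 1


def value_and_policy(Q):
    """Return V(s) = max_a Q(s, a) and the greedy policy over all 210 states."""
    V, policy = {}, {}
    for dealer in range(1, 11):
        for player in range(1, 22):
            state = (dealer, player)
            q_hit = Q.get((state, HIT), 0.0)
            q_stick = Q.get((state, STICK), 0.0)
            V[state] = max(q_hit, q_stick)
            # Ties are broken towards sticking.
            policy[state] = HIT if q_hit > q_stick else STICK
    return V, policy


def policy_grid(policy):
    """Text rendering: one row per player sum (21 at the top), one column per dealer card."""
    rows = []
    for player in range(21, 0, -1):
        rows.append("".join("H" if policy[(d, player)] == HIT else "S" for d in range(1, 11)))
    return "\n".join(rows)
```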

TD Learning

python3 temporalDifference.py

The mean squared error (MSE) of Q, the state-action value function, over the course of learning. For each lambda, 10,000 episodes were run. After each episode, Q was compared against the state-action value function learned from 1 million episodes of Monte-Carlo control, which is saved in Q.dill:
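A minimal sketch of the error measure is shown below. It assumes both tables are dicts keyed by `((dealer_card, player_sum), action)`.

- The structure of the object stored in `Q.dill` is not documented here. If it is such a dict, it could be read with `dill.load`.
- Averaging over the 420 pairs is an assumption. A plain sum over the pairs only rescales the curve.
- Appending `mse(Q, Q_ref)` to a list after every episode yields a learning curve like the one described above.

```python
# Minimal sketch, assuming both tables are dicts keyed by ((dealer_card, player_sum), action)
# with action 0 = hit and 1 = stick. Missing entries are treated as 0.0.
ALL_PAIRS = [((d, p), a) for d in range(1, 11) for p in range(1, 22) for a in (0, 1)]


def mse(Q, Q_ref):
    """Mean squared error between Q and a reference table over all 420 state-action pairs."""
    return sum((Q.get(k, 0.0) - Q_ref.get(k, 0.0)) ** 2 for k in ALL_PAIRS) / len(ALL_PAIRS)
```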

Mean squared error after 1,000 episodes for different values of lambda:

The greedy policy derived from 10,000 episodes of TD(lambda = 0.3):
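TD(lambda) control with a lookup table is commonly implemented as Sarsa(lambda) with eligibility traces. The sketch below is a minimal version under stated assumptions, not `temporalDifference.py`. It assumes:

- accumulating traces;
- gamma = 1;
- exploration epsilon = N0 / (N0 + N(s)) with N0 = 100;
- step size 1 / N(s, a) applied to every traced pair.

The helper names are hypothetical.

```python
import random
from collections import defaultdict

# Minimal tabular Sarsa(lambda) sketch with accumulating eligibility traces.
# Assumed schedules: epsilon = N0 / (N0 + N(s)), step size 1 / N(s, a).
# All names are hypothetical; temporalDifference.py may differ.
N0 = 100
ACTIONS = (0, 1)  # 0 = hit, 1 = stick


def draw(rng):
    value = rng.randint(1, 10)
    return value if rng.random() < 2 / 3 else -value


def step(state, action, rng):
    """Compact Easy21 step: returns (next_state or None if terminal, reward)."""
    dealer, player = state
    if action == 0:
        player += draw(rng)
        if not 1 <= player <= 21:
            return None, -1
        return (dealer, player), 0
    total = dealer
    while 1 <= total < 17:
        total += draw(rng)
    if not 1 <= total <= 21:
        return None, 1
    return None, (player > total) - (player < total)


def epsilon_greedy(Q, N_s, state, rng):
    epsilon = N0 / (N0 + N_s[state])
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def sarsa_lambda(lam, num_episodes, seed=0):
    rng = random.Random(seed)
    Q, N_s, N_sa = defaultdict(float), defaultdict(int), defaultdict(int)
    for _ in range(num_episodes):
        E = defaultdict(float)  # eligibility traces, reset every episode
        state = (rng.randint(1, 10), rng.randint(1, 10))
        action = epsilon_greedy(Q, N_s, state, rng)
        while state is not None:
            N_s[state] += 1
            N_sa[(state, action)] += 1
            next_state, reward = step(state, action, rng)
            if next_state is None:
                next_action, target = None, reward  # terminal: no bootstrapping
            else:
                next_action = epsilon_greedy(Q, N_s, next_state, rng)
                target = reward + Q[(next_state, next_action)]  # gamma = 1
            delta = target - Q[(state, action)]
            E[(state, action)] += 1.0  # accumulating trace
            # Every traced pair moves towards the target in proportion to its trace.
            for key in E:
                Q[key] += delta * E[key] / N_sa[key]
                E[key] *= lam
            state, action = next_state, next_action
    return Q
```

With lambda = 0 this reduces to one-step Sarsa. With lambda = 1 the traces never decay within an episode, which makes the updates behave much like Monte-Carlo.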

Linear Function Approximation

python3 lfa.py

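The lookup table used by the previous methods is replaced by a coarse-coding linear function approximator. This reduces the number of learned parameters from 420, one per state-action pair (21 player sums × 10 dealer cards × 2 actions), to 36 (3 dealer intervals × 6 player intervals × 2 actions).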
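The sketch below shows how coarse coding can map a state-action pair to 36 binary features and how Q becomes a dot product with a weight vector. It is a minimal sketch, not `lfa.py`, and rests on these assumptions:

- Features use the overlapping intervals suggested in the assignment: dealer card in [1,4], [4,7], [7,10] and player sum in [1,6], [4,9], [7,12], [10,15], [13,18], [16,21].
- `td_update` is a hypothetical helper for one semi-gradient step.
- The default step size of 0.01 is assumed.

```python
# Sketch of coarse-coded features for linear function approximation, assuming the
# overlapping intervals suggested in the assignment (lfa.py may use different ones).
DEALER_INTERVALS = ((1, 4), (4, 7), (7, 10))
PLAYER_INTERVALS = ((1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21))
ACTIONS = (0, 1)  # 0 = hit, 1 = stick


def features(state, action):
    """Binary feature vector of length 3 * 6 * 2 = 36 for a (state, action) pair."""
    dealer, player = state
    return [
        1.0 if (d_lo <= dealer <= d_hi and p_lo <= player <= p_hi and a == action) else 0.0
        for d_lo, d_hi in DEALER_INTERVALS
        for p_lo, p_hi in PLAYER_INTERVALS
        for a in ACTIONS
    ]


def q_value(theta, state, action):
    """Linear approximation: Q(s, a) = phi(s, a) . theta."""
    return sum(w * f for w, f in zip(theta, features(state, action)))


def td_update(theta, state, action, target, alpha=0.01):
    """One semi-gradient step of Q(s, a) towards `target` (e.g. r + Q(s', a')); returns the TD error."""
    phi = features(state, action)
    delta = target - sum(w * f for w, f in zip(theta, phi))
    for i, f in enumerate(phi):
        theta[i] += alpha * delta * f  # only active features are changed
    return delta
```

Because the intervals overlap, a single state usually activates several features. A state with dealer card 4 and player sum 5, for example, lies in two dealer intervals and two player intervals, so it activates 4 features for a given action. Each update therefore generalizes to neighbouring states.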

Repository: tybens/rl-easy21
