Solutions to the Stanford CS234 Reinforcement Learning 2022 course assignments.
Course website: https://web.stanford.edu/class/cs234/
Frozen Lake Markov Decision Process using Value Iteration and Policy Iteration
Policy Iteration | Value Iteration |
---|---|
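
For readers who want a quick reminder of what value iteration does, here is a minimal sketch. It is not the assignment solution. The function name `value_iteration`, the three-state chain `P`, and the default `gamma` and `tol` are all assumptions made for illustration. The transition format, `P[s][a]` as a list of `(prob, next_state, reward, done)` tuples, is chosen to resemble the table that Gym's FrozenLake environment exposes.

```python
def q_values(P, V, s, n_actions, gamma):
    """One-step lookahead: expected return of each action in state s."""
    return [
        sum(p * (r + (0.0 if done else gamma * V[s2]))
            for p, s2, r, done in P[s][a])
        for a in range(n_actions)
    ]


def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the largest change is below tol."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_values(P, V, s, n_actions, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        q = q_values(P, V, s, n_actions, gamma)
        policy.append(q.index(max(q)))
    return V, policy


# Hypothetical 3-state chain: action 0 moves left, action 1 moves right,
# and entering state 2 ends the episode with reward 1.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
V, policy = value_iteration(P, n_states=3, n_actions=2)
```

Policy iteration differs in that it alternates full policy evaluation with greedy improvement, rather than folding the `max` into every backup.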
Tabular Q Learning and Deep Q Learning
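
As a rough illustration of the tabular case, the core of Q-learning is a single temporal-difference update. The sketch below assumes a Q-table stored as a list of lists. The function name `q_learning_update` and the `alpha` and `gamma` values are illustrative choices, not taken from the assignment code.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Bootstrap from the greedy value of the next state unless the episode ended.
    target = r if done else r + gamma * max(Q[s_next])
    td_error = target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Hypothetical table with 3 states and 2 actions, initialised to zero.
Q = [[0.0, 0.0] for _ in range(3)]
q_learning_update(Q, s=1, a=1, r=1.0, s_next=2, done=True, alpha=0.5)
q_learning_update(Q, s=0, a=1, r=0.0, s_next=1, done=False, alpha=0.5, gamma=0.9)
```

Deep Q-learning keeps the same target but replaces the table with a neural network trained by regression towards that target.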
Learning Curve on the test environment:
Policy Gradient Methods and REINFORCE
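
REINFORCE weights the log-probability gradient of each action by the discounted return that followed it. The sketch below covers only that return computation, with a hypothetical helper name `discounted_returns`. The policy network and optimiser step are left out.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Example episode with a reward of 1 per step (as in CartPole) and gamma = 0.5.
episode_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

In practice a baseline is often subtracted from these returns to reduce the variance of the gradient estimate.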
Learning Curve of the REINFORCE algorithm on CartPole-v0:
Application of Bandit Algorithms in a Medical Setting
Comparison of different Bandit Algorithms:
Application of Upper Confidence Bound Bandits in Personalized Recommendation Systems
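
To give a sense of how an upper-confidence-bound rule picks arms, here is a minimal non-contextual UCB1 sketch on simulated Bernoulli arms. It is an assumption-laden stand-in and not the assignment's recommender. The names `ucb1_select` and `run_ucb1`, the arm means, and the seed are all made up for illustration.

```python
import math
import random


def ucb1_select(counts, values, t):
    """Return the arm with the highest UCB1 index at round t (1-based)."""
    # Pull every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(true_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms and return pull counts and mean estimates."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    values = [0.0] * len(true_means)
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, values, t)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Hypothetical arm means, used only for this demonstration.
counts, values = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
```

The exploration bonus shrinks as an arm is pulled more often, so rarely tried arms keep getting revisited until their estimates are reliable.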
Comparison of different arm update strategies: