By the end of this lab you are expected to:
- Have a quick review of Lecture 01.
- Understand the fundamental concepts of reinforcement learning and learn to test your algorithms with OpenAI Gym.
- Train your first reinforcement learning model using stable-baselines3; evaluate and test it, use callbacks, and learn how to save and load RL models (see the sketch below).
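A minimal sketch of that workflow, assuming the CartPole-v1 environment and the PPO algorithm (any other stable-baselines3 algorithm follows the same pattern; depending on your stable-baselines3 version you may need `gymnasium` instead of `gym`):

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.callbacks import EvalCallback

# Create the environment (CartPole-v1 is only an example choice)
env = gym.make("CartPole-v1")

# Train a PPO agent; the callback periodically evaluates it and keeps the best model
eval_callback = EvalCallback(gym.make("CartPole-v1"),
                             best_model_save_path="./logs/",
                             eval_freq=5_000)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000, callback=eval_callback)

# Evaluate the trained policy
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")

# Save and reload the model
model.save("ppo_cartpole")
model = PPO.load("ppo_cartpole", env=env)
```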
In this lab you will implement several exploration strategies for the simplest bandit problem: the Bernoulli bandit.
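As an illustration of one such strategy, here is a minimal epsilon-greedy sketch for a Bernoulli bandit; the arm probabilities and hyperparameters are hypothetical, not the lab's exact interface:

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])   # hypothetical success probability of each arm
n_actions = len(true_probs)
counts = np.zeros(n_actions)             # number of pulls per arm
values = np.zeros(n_actions)             # running mean reward per arm
epsilon = 0.1

for step in range(10_000):
    # Explore with probability epsilon, otherwise exploit the current best estimate
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(values))
    reward = float(rng.random() < true_probs[a])   # Bernoulli reward
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]  # incremental mean update

print("estimated arm values:", values)
```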
By the end of this lab you will understand how a Markov Reward Process (MRP) and a Markov Decision Process (MDP) work. You will also apply the direct solution to find the optimal policy.
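For the value-function part, a minimal sketch of the direct solution for a small MRP, assuming a hypothetical 3-state transition matrix and reward vector: the Bellman expectation equation V = R + γPV is solved in closed form as V = (I − γP)⁻¹R.

```python
import numpy as np

# Hypothetical 3-state MRP: transition matrix P, reward vector R, discount gamma
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 2.0, 0.0])
gamma = 0.9

# Solve V = R + gamma * P @ V directly, i.e. V = (I - gamma * P)^{-1} R
V = np.linalg.solve(np.eye(3) - gamma * P, R)
print("state values:", V)
```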
Policy Iteration and Value Iteration algorithms.
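A minimal Value Iteration sketch on a hypothetical random tabular MDP (the transition tensor, rewards, and thresholds are assumptions for illustration only); Policy Iteration alternates the same evaluation and greedy-improvement ideas.

```python
import numpy as np

# Hypothetical tabular MDP: P[s, a, s'] transition probabilities, R[s, a] rewards
n_states, n_actions, gamma, theta = 4, 2, 0.9, 1e-8
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))

# Value Iteration: repeatedly apply the Bellman optimality backup until convergence
V = np.zeros(n_states)
while True:
    Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < theta:
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
print("optimal values:", V_new, "greedy policy:", policy)
```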
Monte Carlo for Prediction.
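A minimal every-visit Monte Carlo prediction sketch, assuming an environment with the classic gym `reset()`/`step()` 4-tuple interface and a `policy(state)` function (both are assumptions about the lab's setup):

```python
from collections import defaultdict

def mc_prediction(env, policy, n_episodes=5_000, gamma=0.99):
    """Every-visit Monte Carlo estimate of V_pi from complete episodes."""
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    V = defaultdict(float)
    for _ in range(n_episodes):
        episode, state, done = [], env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            episode.append((state, reward))
            state = next_state
        G = 0.0
        for state, reward in reversed(episode):   # accumulate returns backwards
            G = reward + gamma * G
            returns_sum[state] += G
            returns_cnt[state] += 1
            V[state] = returns_sum[state] / returns_cnt[state]
    return V
```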
Temporal Difference Prediction.
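A minimal TD(0) prediction sketch under the same assumed gym-style interface; unlike Monte Carlo, it bootstraps from the current estimate of the next state's value after every step.

```python
from collections import defaultdict

def td0_prediction(env, policy, n_episodes=5_000, alpha=0.1, gamma=0.99):
    """TD(0) estimate of V_pi, updated online after every transition."""
    V = defaultdict(float)
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
            target = reward + gamma * V[next_state] * (not done)
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```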
By the end of this lab you will understand the difference between Model-Free Prediction and Model-Free Control, become familiar with On-Policy vs. Off-Policy Learning, and implement the SARSA and Q-Learning algorithms from scratch.
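A minimal tabular Q-Learning sketch, again assuming the classic gym 4-tuple `step()` interface and a discrete action space; the comment marks the single line where on-policy SARSA would differ.

```python
import numpy as np
from collections import defaultdict

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_learning(env, n_actions, n_episodes=5_000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Off-policy control: the target uses max_a Q(s', a), not the action actually taken."""
    rng = np.random.default_rng(0)
    Q = defaultdict(lambda: np.zeros(n_actions))
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, n_actions, epsilon, rng)
            next_state, reward, done, _ = env.step(action)
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            # SARSA (on-policy) would instead use Q[next_state][next_action], where
            # next_action is sampled from the same epsilon-greedy behaviour policy.
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```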
- Reinforcement Learning: An Introduction, Sutton & Barto, The MIT Press (1998)
- Artificial Intelligence: A Modern Approach, Russell & Norvig, 4th US ed.
- https://hadovanhasselt.com/2016/01/12/ucl-course/
- https://github.com/huggingface/deep-rl-class
- https://github.com/nicknochnack/ReinforcementLearningCourse
- https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning
- https://ydata.yandex.com/course/reinforcement-learning
- https://stable-baselines3.readthedocs.io/en/master/
- https://github.com/CityAplons/rl-skoltech-course
- https://mpatacchiola.github.io/blog/