60 Days of Reinforcement Learning

English version

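I designed this challenge for you and me: learn deep reinforcement learning in depth over these 60 days.

You have surely heard of the amazing results achieved by DeepMind with AlphaGo Zero and by OpenAI in Dota 2! Don't you want to know how they work? Now is the time for you and me to finally learn deep reinforcement learning and apply it to our own projects.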

The ultimate aim is to use these general-purpose technologies and apply them to all sorts of important real-world problems. - Demis Hassabis

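This project guides you through deep reinforcement learning algorithms, from the most basic ones up to the advanced AlphaGo Zero. The topics are organized by week, each with suggested learning resources. Every week I will also provide worked examples implemented in Python to help you digest the theory.

This is the original author's first project of this kind. If you have any ideas, suggestions, or improvements, contact the author at andrea.lonza@gmail.com.

The author will keep updating this project throughout the challenge, so stay tuned.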

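MLEveryday note: wherever possible, the resources below have been replaced with sites that are accessible from mainland China, and they are tagged Chinese, English subtitles, or English to tell them apart. If you find a Chinese version of a resource, please report it through an issue.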

Prerequisites

Basic knowledge of Python and PyTorch
Machine learning
Deep learning (MLP, CNN, and RNN)

Projects (to be determined)

Q-learning
DQN
A2C
ES
AlphaGo Zero

Week 1 - Introduction to Reinforcement Learning

Chinese | bilibili | An introduction to Reinforcement Learning by Arxiv Insights
English subtitles | bilibili | CS294 Reinforcement Learning course: Introduction and course overview by Levine
Chinese | Deep Reinforcement Learning: Pong from Pixels by Karpathy
Chinese | Youku | Introduction to Reinforcement Learning - RL by David Silver

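Week 2 - RL Basics: Markov Decision Processes, Dynamic Programming, and Model-Free Control

Those who cannot remember the past are condemned to repeat it. - George Santayana

This week we will learn the basics of reinforcement learning: how to define the problem, and how to evaluate and optimize the functions that describe the value of a policy or a state.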

Theory

Chinese | Youku | Markov Decision Process - RL by David Silver

Markov decision processes formally define the reinforcement learning problem
- Markov processes
- Markov decision processes
Chinese | Youku | Planning by Dynamic Programming - RL by David Silver

How to solve a Markov decision process
- Policy iteration
- Value iteration
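
The two planning methods listed above can be illustrated on a toy problem. The following is a minimal sketch of value iteration, assuming a hypothetical three-state chain MDP: the states, the action names `stay` and `go`, the rewards, and `GAMMA` are made up for illustration and do not come from the lecture.

```python
# Hypothetical 3-state chain MDP (illustrative only): state 2 is absorbing.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off the greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

On this toy chain the values settle after a few sweeps, and the greedy action in states 0 and 1 is `go`. Policy iteration would instead alternate a full policy evaluation with the same greedy improvement step.
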
Chinese | Youku | Model-Free Prediction - RL by David Silver

Estimating the value function of an MDP without a model
- Monte Carlo learning
- Temporal difference learning
- TD(λ)
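
As a rough illustration of temporal difference learning, here is a minimal sketch of tabular TD(0) prediction. It assumes a hypothetical five-state random walk (states 1 to 5, terminals 0 and 6, reward 1 only when exiting on the right); the environment, the step size, and the function names are assumptions chosen for illustration.

```python
import random


def td0_update(V, s, reward, s_next, done, alpha=0.01, gamma=1.0):
    """One TD(0) step: move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    target = reward + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])


def random_walk_td0(episodes=5000, seed=0):
    """Evaluate the uniform random policy on a hypothetical 5-state random walk."""
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(1, 6)}  # non-terminal states 1..5
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next in (0, 6)
            reward = 1.0 if s_next == 6 else 0.0
            td0_update(V, s, reward, s_next, done)
            if done:
                break
            s = s_next
    return V


V = random_walk_td0()
print({s: round(v, 2) for s, v in V.items()})
```

For this walk the true values are 1/6 through 5/6, so the estimates should end up close to those numbers. Monte Carlo learning would instead wait until the end of each episode and update every visited state toward the full observed return.
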
Chinese | Youku | Model-Free Control - RL by David Silver

Optimizing the value function of an MDP without a model
- ε-greedy policy iteration
- GLIE Monte Carlo search
- SARSA
- Importance sampling
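
Two of the topics above, ε-greedy action selection and the SARSA update, fit in a few lines. This is a minimal sketch under assumed conventions: `Q` is a plain dictionary keyed by `(state, action)`, and the function names and default hyperparameters are illustrative choices.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon explore uniformly, otherwise act greedily on Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, reward, s_next, a_next, done, alpha=0.1, gamma=0.99):
    """On-policy TD control: bootstrap from the action actually chosen in s_next."""
    target = reward + (0.0 if done else gamma * Q[(s_next, a_next)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Tiny usage example with made-up states and values.
actions = ["left", "right"]
Q = {("s0", "left"): 0.0, ("s0", "right"): 1.0,
     ("s1", "left"): 2.0, ("s1", "right"): 0.0}
a = epsilon_greedy(Q, "s0", actions, epsilon=0.0)       # purely greedy choice
a_next = epsilon_greedy(Q, "s1", actions, epsilon=0.0)
sarsa_update(Q, "s0", a, 1.0, "s1", a_next, False, alpha=0.5, gamma=0.5)
print(a, a_next, Q[("s0", a)])
```

SARSA is on-policy because the target uses `a_next`, the action the ε-greedy policy really takes. Replacing that term with the maximum over next actions gives Q-learning.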

Project of the Week

Q-learning applied to Frozen Lake. In this exercise you will learn to use SARSA or Q-learning.
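
To give a feel for the exercise, here is a minimal sketch of tabular Q-learning. It does not use the project's code or a Gym environment: it assumes a hand-written, deterministic (non-slippery) 4x4 grid in the style of Frozen Lake, and the map, the hyperparameters, and the function names are all illustrative assumptions.

```python
import random

# Hypothetical 4x4 map in the style of Frozen Lake: S start, F frozen, H hole, G goal.
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N - 1)  # bumping into a wall keeps the agent in place
    col = min(max(col + d_col, 0), N - 1)
    tile = LAKE[row][col]
    return row * N + col, float(tile == "G"), tile in "HG"


def train(episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * 4 for _ in range(N * N)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:  # greedy action, ties broken at random
                best = max(Q[state])
                action = rng.choice([a for a in range(4) if Q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning bootstraps from the best action in the next state.
            target = reward + (0.0 if done else gamma * max(Q[nxt]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
            if done:
                break
    return Q


def greedy_rollout(Q, max_steps=50):
    """Follow the greedy policy from the start state and record the visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: Q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


Q = train()
print(greedy_rollout(Q))
```

The printed list is the sequence of cell indices visited by the learned greedy policy. The slippery version of the game makes transitions stochastic, which typically calls for a smaller step size and more episodes.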

To Know More

Read chapters 3, 4, 5, 6, and 7 of Reinforcement Learning: An Introduction - Sutton, Barto

Week 3 - Value Function Approximation and DQN (Deep Q-Learning)

This week we learn more advanced concepts and apply deep neural networks to the Q-learning algorithm.

Theory

Lectures

Chinese | Youku | Value function approximation - RL by David Silver
- Differentiable function approximators
- Incremental methods
- Batch methods
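
As an illustration of an incremental method with a differentiable approximator, the sketch below applies one semi-gradient TD(0) step to a linear value function. The feature vectors, the step size, and the function names are assumptions made for this example.

```python
def linear_value(weights, features):
    """Linear approximator: v(s) = w . x(s)."""
    return sum(w * x for w, x in zip(weights, features))


def semi_gradient_td0(weights, features, reward, next_features, done,
                      alpha=0.1, gamma=0.99):
    """Return new weights after one incremental update toward the TD(0) target.

    For a linear approximator the gradient of v(s) with respect to the
    weights is simply the feature vector x(s).
    """
    target = reward + (0.0 if done else gamma * linear_value(weights, next_features))
    error = target - linear_value(weights, features)
    return [w + alpha * error * x for w, x in zip(weights, features)]


# One update with made-up one-hot features for two states.
w = semi_gradient_td0([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], False,
                      alpha=0.5, gamma=0.9)
print(w)
```

Batch methods would instead store the observed transitions and fit the weights to the whole dataset, which is the idea that experience replay in DQN builds on.
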
English subtitles | bilibili | Advanced Q-learning algorithms - DRL UC Berkeley by Sergey Levine
- Replay buffer
- Double Q-learning
- Continuous actions (NAF, DDPG)
- Practical tips
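
Two of the lecture topics, the replay buffer and Double Q-learning, can be sketched without a deep learning library. The class and function below are illustrative assumptions: a real DQN would store tensors and compute the target over a batch with a network, whereas here the Q-values of the next state are passed in as plain lists.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest ones are dropped first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def double_q_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double Q-learning: the online estimates pick the action, the target estimates score it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(t, 0, 0.0, t + 1, False)
print(len(buffer), buffer.sample(2))
print(double_q_target(1.0, False, [0.1, 0.9], [5.0, 2.0], gamma=0.5))
```

In the last call the online values select action 1, so the target uses 2.0 rather than the larger 5.0 that a plain max over the target values would pick. Decoupling selection from evaluation in this way is what reduces overestimation.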

Papers

Must read

Playing Atari with Deep Reinforcement Learning - 2013
Human-level control through deep reinforcement learning - 2015
Rainbow: Combining Improvements in Deep Reinforcement Learning - 2017

DQN extensions

Deep Reinforcement Learning with Double Q-learning - 2015
Prioritized Experience Replay - 2015
Dueling Network Architectures for Deep Reinforcement Learning - 2016
Noisy networks for exploration - 2017
Distributional Reinforcement Learning with Quantile Regression - 2017

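Project of the Week

DQN and some variants applied to Pong

This week's goal is to develop a DQN algorithm that plays an Atari game. To make the project more interesting, I developed several DQN variants: Double Q-learning, Multi-step learning, Dueling networks, and Noisy Nets. Play with them, and if you feel confident, you can implement Prioritized replay, Dueling networks, or Distributional RL. Read the papers to learn more about these improvements.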
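
Two of the variants named above reduce to short formulas. The sketch below shows the multi-step return and the dueling aggregation in plain Python. It is not the project's implementation: the function names and default discount are assumptions, and in a real DQN both would operate on batched network outputs.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Multi-step target: r_0 + gamma*r_1 + ... + gamma^n * bootstrap_value."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# Three rewards of 1 followed by a bootstrapped value of 10, with gamma = 0.5.
print(n_step_return([1.0, 1.0, 1.0], 10.0, gamma=0.5))
# A state value of 2 and two advantages whose mean is 2.
print(dueling_q_values(2.0, [1.0, 3.0]))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, since otherwise a constant could be moved freely between them.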

Suggested

Deep Reinforcement Learning in the Enterprise: Bridging the Gap from Games to Industry

Week 4 - A2C and A3C

Week 5 - RL in continuous space - TRPO/PPO

Week 6 - Evolution Strategies and Genetic Algorithms

Week 7 - I2A

Week 8 - AlphaGoZero + Bonus

Last 4 days - Review + sharing

Reinforcement Learning Papers

Reinforcement Learning Resources

📺 English | YouTube | Deep Reinforcement Learning - UC Berkeley class by Levine, check their site here.

📺 English | YouTube | Reinforcement Learning course - by David Silver, DeepMind. Great introductory lectures by Silver, a lead researcher on AlphaGo. They follow the book Reinforcement Learning by Sutton & Barto.

📓 Reinforcement Learning: An Introduction - by Sutton & Barto. The "Bible" of reinforcement learning. Here you can find the PDF draft of the second edition.

Additional Resources

📚 Awesome Reinforcement Learning. A curated list of resources dedicated to reinforcement learning.
