This repository covers Reinforcement Learning (RL) concepts, use cases, points of view and learning approaches. The material is based on my own learning, reading and experience with practical, real-life contexts and scenarios.
- 01_Introduction covers Key Terms used in RL, Basic elements, Concepts/Topics around RL etc.
- 02_MAB covers the Multi-Armed Bandit Problem (MABP) area
- 03_Monte Carlo Methods covers Monte Carlo Methods
- 04_Temporal Difference Learning covers TD Methods
- 05_Dynamic Programming covers Dynamic Programming
- 06_Approximation covers Online Prediction with Approximation
- 07_PolicyGradientMethods covers Policy Gradient Methods
- 08_MDP covers Finite Markov Decision Processes
- Multi-Armed Bandit Problems (MABP)
- Finite Markov Decision Processes (MDP)
- Dynamic Programming Methods
- Monte Carlo Methods
- Temporal Difference (TD) Learning
- Tabular Solution Methods and Approximate Solution Methods
- Policy Gradient Methods
- Gauge the adoption factor and secure the "right" stakeholder buy-in upfront
- Identify an "appropriate" business use case within the context of the industry / sub-industry / sub-segment: relevance is a must
- Identify compute costs upfront and put together a "short term" and "long term" ROI plan that tracks tasks and their benefits. We also need to watch the pattern of our outcomes so that we can adjust the strategy along the way and stay effective
- Focus on the simulation method and strategize for multiple, related use cases rather than just one or two
This is what separates the LEADERS from the LAGGARDS in this space!
Use Case Theme | Description | Industry Relevancy | Category |
---|---|---|---|
Pricing and Promotion Analytics | Ability to apply advanced pricing and promotion strategies to improve product margins | Agriculture | Next Best Actions for Customer |
Waste and Cost reduction | Optimize warehouse logistics and network to reduce waste and maintenance costs | Agriculture | Optimize Complex Operations |
Production Operations Management | Solving scheduling and production allocation challenges to optimize and improve yield | Agriculture | Optimize Complex Operations |
Optimization of Product Design Process | Ability to optimize product design processes to shorten the development cycle for new vehicles and features and to improve quality | Automotive | Optimize Product Development Cycle / Design |
Load Balancing | Ability to balance the load of electricity grids in a situation of varying demand cycles | Energy and Utilities | Optimize Complex Operations |
Yield Optimization | Ability to enable real-time well monitoring and precision drilling for improved yield in Oil operations | Energy and Utilities | Optimize Complex Operations |
Trading Strategy Optimization | Ability to optimize the trading strategy for an options-trading portfolio | Financial Services | Optimize Complex Operations |
Customer HyperPersonalization | Delivering advanced personalization abilities that adapt promotions, next best offers and recommendations for increased customer satisfaction and increased sales | Financial Services | Next Best Actions for Customer |
Clinical Trials | The well being of patients during clinical trials is extremely important along with the actual results of the study. In this scenario, the exploration is equivalent to identifying the best treatment, and exploitation is treating patients as effectively as possible during the trial process. | Life Sciences | Optimize Complex Operations |
Effective Inventory Management with Robotics | Stock and pick inventory using Robots | Retail and CPG | Optimize Product Development Cycle / Design |
Network Routing | Routing is the process of selecting a path for traffic in a network, such as telephone networks or computer networks (internet) etc. Allocation of channels to the right users, such that the overall throughput is maximised, can be formulated as a MABP. | Generic / Common | Optimize Product Development Cycle / Design |
Online Advertising | The goal of an advertising campaign is to maximise revenue from displaying ads. The advertiser makes revenue every time an offer is clicked by a web user. Similar to MABP, there is a trade-off between exploration, where the goal is to collect information on an ad’s performance using click-through rates, and exploitation, where we stick with the ad that has performed the best so far. | Generic / Common | Next Best Actions for Customer |
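
The exploration/exploitation trade-off described in the Online Advertising and Clinical Trials rows can be illustrated with a minimal epsilon-greedy sketch. This is a toy simulation under stated assumptions, not a production approach: the function name `epsilon_greedy_ads`, the click-through rates, `epsilon` and the number of rounds are all made up for illustration.

```python
import random


def epsilon_greedy_ads(true_ctrs, epsilon=0.1, rounds=10_000, seed=0):
    """Simulate epsilon-greedy ad selection on a Bernoulli bandit.

    true_ctrs holds hypothetical click-through rates, one per ad. They are
    hidden from the agent and used only to simulate clicks.
    """
    rng = random.Random(seed)
    n_ads = len(true_ctrs)
    shows = [0] * n_ads        # how often each ad was displayed
    estimates = [0.0] * n_ads  # running estimate of each ad's CTR
    total_clicks = 0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: show a randomly chosen ad to gather information.
            ad = rng.randrange(n_ads)
        else:
            # Exploit: show the ad with the best estimated CTR so far.
            ad = max(range(n_ads), key=lambda a: estimates[a])

        click = 1 if rng.random() < true_ctrs[ad] else 0
        shows[ad] += 1
        # Incremental sample-average update of the estimate.
        estimates[ad] += (click - estimates[ad]) / shows[ad]
        total_clicks += click

    return estimates, shows, total_clicks


# Hypothetical CTRs for three ads.
estimates, shows, total_clicks = epsilon_greedy_ads([0.02, 0.05, 0.10])
print("estimated CTRs:", [round(e, 3) for e in estimates])
print("displays per ad:", shows)
print("total clicks:", total_clicks)
```

A larger `epsilon` spends more rounds collecting information on every ad, while a smaller one commits sooner to the ad that currently looks best.
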
Other References:
- 10 real life problems
- Applications in real world
- RL Cheatsheets
- bsuite - Behavior Suite for RL from DeepMind team
- Actor-Critic Reinforcement Learning for Energy Optimization in Hybrid Production Environments: management and optimisation of energy flows. Many control problems in the world of power grids could be optimised with RL, from the operation and maintenance of microgrids to emergency control procedures. Heating, ventilation and air conditioning (HVAC) systems are another candidate for optimisation, as energy consumption is a major cost factor for most industrial sites.
- There are 3 key aspects that are pertinent to greater control over RL algorithms and their solving power:
- The design approach: how rewards can be maximized as the agent learns
- The importance and relevance of the learning environment
- Compute power, which becomes significant when we rely on linear or non-linear function approximation
- Soft Actor-Critic (SAC) algorithms are significantly increasing training efficiency and decreasing compute costs
- Some of the Key cloud computing work that can be looked at:
- Book1: Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
- Book2: Neuro-Dynamic Programming by Dimitri Bertsekas and John Tsitsiklis, Book link from Amazon
- Book3: Deep Learning by Ian Goodfellow et al.
- RL from Stanford: CS234
- References from Denny Britz
- RL Winter 2021 Stanford: Modules and Videos
- UCL Course on RL
- Common RL Examples on SageMaker
- Initial Part MABPs: Epsilon, epsilon-Greedy methods
- Advanced MABPs: UCB Bandits, Gradient Bandits, Nonstationary Bandits
- Intro RL
- Top 10 Deep RL Papers in 2019 by Robert T Lange
- Papers, Reports, Slides, and Other Material by Dimitri Bertsekas
- Awesome RL GitHub repo
- RL resource references
- Deep RL at UC Berkeley: CS 285
- To set up and experiment on a cloud platform such as AWS:
- Set up an AWS account with access to Amazon SageMaker
- Make sure an IAM user and role are set up appropriately for authentication and access control
- Create an Amazon SageMaker notebook instance
- Create an S3 bucket
- The same setup can be explored on IBM Watson / IBM Cloud, GCP or Azure
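
The AWS steps above could also be scripted. The following is a minimal sketch, assuming the `boto3` SDK is installed, AWS credentials are already configured, and an IAM role for SageMaker already exists. The helper names, the bucket and notebook names, and the `ml.t3.medium` instance type are hypothetical choices for illustration, not something this repository prescribes.

```python
def s3_bucket_request(bucket_name, region):
    """Build keyword arguments for the S3 create_bucket call."""
    request = {"Bucket": bucket_name}
    # S3 expects no LocationConstraint for us-east-1; other regions need one.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def notebook_request(name, role_arn, instance_type="ml.t3.medium"):
    """Build keyword arguments for the SageMaker create_notebook_instance call."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,  # assumed small instance type
        "RoleArn": role_arn,            # IAM role that SageMaker will assume
    }


def provision(bucket_name, notebook_name, role_arn, region):
    """Create the S3 bucket and the notebook instance (calls AWS, may incur cost)."""
    import boto3  # third-party AWS SDK, imported lazily so the helpers work without it

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**s3_bucket_request(bucket_name, region))

    sagemaker = boto3.client("sagemaker", region_name=region)
    sagemaker.create_notebook_instance(**notebook_request(notebook_name, role_arn))
```

Calling `provision(...)` with your own bucket name, notebook name, role ARN and region performs both steps. The two request builders are kept separate so the parameters can be inspected before anything is created.
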