Reinforcement Learning

This repository contains code written for a reinforcement learning course.

Multi-armed Bandits

This repository implements a set of algorithms to solve the multi-armed bandit problem:

  1. Epsilon Greedy (epsilon_greedy.py)
  2. Optimistic Initial Value (optimistic_initial_value.py)
  3. Upper Confidence Bound (ucb.py)
  4. Thompson Sampling (thompson.py)
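The sketch below shows how each of the four algorithms picks an arm. It is a hypothetical illustration, not code taken from the files listed above. It assumes Bernoulli (0/1) rewards and sample-mean estimates. The class `BanditAgent`, its method names, and the parameters `eps`, `c`, and `initial_value` are all illustrative. The optimistic variant counts the initial value as one pseudo-pull, and UCB uses the UCB1 bonus.

```python
import math
import random


class BanditAgent:
    """Hypothetical agent exposing the four selection rules listed above.

    Assumes Bernoulli (0/1) rewards. Estimates are sample means; if an
    optimistic initial_value is given, it is counted as one pseudo-pull.
    """

    def __init__(self, n_arms, initial_value=None):
        self.prior = 0 if initial_value is None else 1  # pseudo-pull weight
        self.values = [initial_value or 0.0] * n_arms   # estimated mean reward
        self.counts = [0] * n_arms                      # real pulls per arm
        self.successes = [0] * n_arms                   # rewards of 1 per arm
        self.t = 0                                      # total real pulls

    def update(self, arm, reward):
        # Incremental mean: Q <- Q + (r - Q) / n, where n includes the pseudo-pull.
        self.t += 1
        self.counts[arm] += 1
        self.successes[arm] += reward
        n = self.counts[arm] + self.prior
        self.values[arm] += (reward - self.values[arm]) / n

    @staticmethod
    def _argmax(scores):
        # Index of the largest score (ties go to the lowest index).
        return max(range(len(scores)), key=scores.__getitem__)

    def epsilon_greedy(self, eps=0.1):
        # With probability eps pick a random arm, otherwise the best estimate.
        if random.random() < eps:
            return random.randrange(len(self.values))
        return self._argmax(self.values)

    def optimistic(self):
        # Purely greedy: exploration comes from the inflated initial_value,
        # which keeps rarely tried arms attractive until their estimates drop.
        return self._argmax(self.values)

    def ucb1(self, c=2.0):
        # Try each arm once, then add the exploration bonus sqrt(c * ln t / n_i).
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return self._argmax([q + math.sqrt(c * math.log(self.t) / n)
                             for q, n in zip(self.values, self.counts)])

    def thompson(self):
        # Draw from each arm's Beta(1 + successes, 1 + failures) posterior.
        return self._argmax([random.betavariate(1 + s, 1 + n - s)
                             for s, n in zip(self.successes, self.counts)])
```

To run one strategy, for example optimistic initial values, create `BanditAgent(n_arms, initial_value=5.0)`. Then call `arm = agent.optimistic()` and `agent.update(arm, reward)` in a loop. Epsilon greedy and UCB1 explore through the selection rule itself. The optimistic variant explores only through its starting estimates, so it stays purely greedy.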

We also implemented two sample bandit interfaces as examples of how an algorithm (the agent) interacts with a bandit (the environment).
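The sketch below shows one possible agent-environment loop. The names `BernoulliBandit`, `pull`, `EpsilonGreedyAgent`, `select_arm`, and `run` are hypothetical, and the repository's two interfaces may look different. The agent is plain epsilon greedy so that the block runs on its own. The environment returns a reward for the chosen arm, and the agent updates its estimate from that reward.

```python
import random


class BernoulliBandit:
    """Hypothetical environment: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, arm):
        # Return a 0/1 reward drawn from the chosen arm's distribution.
        return 1 if self.rng.random() < self.probs[arm] else 0


class EpsilonGreedyAgent:
    """Hypothetical agent with sample-mean estimates and epsilon-greedy selection."""

    def __init__(self, n_arms, eps=0.1, seed=None):
        self.eps = eps
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability eps, otherwise exploit the best estimate.
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental sample mean for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run(agent, bandit, steps):
    """Agent picks an arm, the environment returns a reward, the agent learns."""
    total = 0
    for _ in range(steps):
        arm = agent.select_arm()
        reward = bandit.pull(arm)
        agent.update(arm, reward)
        total += reward
    return total


if __name__ == "__main__":
    bandit = BernoulliBandit([0.2, 0.5, 0.8], seed=1)
    agent = EpsilonGreedyAgent(bandit.n_arms, eps=0.1, seed=2)
    print(run(agent, bandit, 5000) / 5000)  # average reward per step
```

The agent and the environment communicate only through `select_arm`, `pull`, and `update`. Because of this, the same `run` loop works for any agent that provides those two agent methods. With `eps=0.1` and these arm probabilities, the average reward should settle near 0.9 × 0.8 + 0.1 × 0.5 ≈ 0.77. That figure is the best arm's payoff, reduced by the cost of uniform exploration.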