This repository provides simple code and tutorials for bandit algorithms.
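The multi-armed bandit problem and its extension, the contextual bandit problem, are sequential decision-making problems studied in reinforcement learning. At each step, an agent chooses one action (an "arm") from a set of actions and receives a reward for the chosen action only. The goal is to maximize the total reward over time, which means balancing exploration of rarely tried actions against exploitation of the action that looks best so far. In the contextual variant, the agent also observes some side information (the context) before choosing an action.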
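To make the choose-an-action, observe-a-reward loop concrete, here is a minimal sketch of an epsilon-greedy agent on a simulated Bernoulli bandit. It is an illustration only and is not taken from this repository's code: the function name `run_epsilon_greedy`, its parameters, and the Bernoulli reward model are all assumptions made for this example.

```python
import random


def run_epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (hypothetical helper).

    true_means: success probability of each arm (unknown to the agent).
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward of each arm
    total_reward = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed environment: reward is 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("total reward:", total)
```

With a small `epsilon`, the agent spends most rounds on the arm it currently believes is best while still sampling the others occasionally, which is the exploration and exploitation trade-off described above.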
This repository uses Poetry as a dependency manager. To install the dependencies, run:
$ poetry install
To activate the virtual environment, run:
$ poetry shell
- Add contextual bandit algorithms such as LinUCB, LinThompsonSampling, etc.