- Contains implementations of epsilon-greedy, UCB, and softmax action selection for solving multi-armed bandit problems on the 10-armed testbed
- Contains implementations of SARSA, SARSA(λ), Monte Carlo policy gradient, and linear function approximators
- gym_pdw contains a custom puddle-world environment built with OpenAI Gym
- Contains an implementation of DQN to solve the CartPole problem and SMDP Q-learning on a 4-room grid environment
- grid-worlds contains a custom 4-room grid environment built with OpenAI Gym
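
For orientation, here is a minimal sketch of epsilon-greedy on a 10-armed testbed. It is not the code in this repository. It assumes the usual testbed setup (true action values drawn from a unit normal, rewards drawn from a unit normal around the chosen arm's value) and sample-average value estimates. The function name `run_epsilon_greedy` and its parameters are hypothetical, chosen only for this illustration.

```python
import random


def run_epsilon_greedy(n_arms=10, epsilon=0.1, steps=1000, seed=0):
    """Run one epsilon-greedy agent on a randomly drawn n-armed testbed.

    Hypothetical helper for illustration; not part of this repository.
    """
    rng = random.Random(seed)
    # Assumed testbed: true values q*(a) ~ N(0, 1), rewards ~ N(q*(a), 1).
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    estimates = [0.0] * n_arms  # Q(a): sample-average value estimates
    counts = [0] * n_arms       # N(a): number of pulls per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick a greedy arm, breaking ties at random.
            best = max(estimates)
            arm = rng.choice([a for a, q in enumerate(estimates) if q == best])

        reward = rng.gauss(true_values[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return true_values, estimates, counts, total_reward / steps


if __name__ == "__main__":
    true_values, estimates, counts, avg_reward = run_epsilon_greedy()
    print("average reward per step:", round(avg_reward, 3))
    print("most-pulled arm:", counts.index(max(counts)))
    print("truly best arm:", true_values.index(max(true_values)))
```

Setting `epsilon=0.0` gives a purely greedy agent, which makes it easy to compare how much exploration helps on a given seed.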