Code for our AAMAS 2020 paper:
"A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry"
by Baihan Lin (Columbia)*, Guillermo Cecchi (IBM Research), Djallel Bouneffouf (IBM Research), Jenna Reinen (IBM Research) and Irina Rish (Mila, UdeM).
*Corresponding author
For the latest full paper: https://arxiv.org/abs/1906.11286
For my oral talk at AAMAS 2020: https://youtu.be/CQBdQz1bmls
All the experimental results can be reproduced using the code in this repository. Feel free to contact me at doerlbh@gmail.com if you have any questions about our work.
Abstract
Drawing inspiration from behavioral studies of human decision making, we propose a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model processing positive and negative rewards separately. The framework can incorporate a wide range of reward-processing biases -- an important component of human decision making that can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA, as well as the recently proposed Double Q-Learning, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.
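For intuition, here is a minimal sketch of a two-stream (split) Q-learning update, assuming the paper's four-parameter bias scheme (lambda+, w+, lambda-, w-). It is an illustration of the idea, not the implementation in this repository; in particular, the bootstrapping choice (each stream maxes over its own table) is one plausible reading.

```python
# Minimal sketch of split (two-stream) Q-learning -- illustrative only.
# Parameter names (lambda_p, w_p, lambda_n, w_n) follow the paper's
# four-parameter bias scheme; this is NOT the repository's implementation.
import numpy as np

class SplitQLearner:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 lambda_p=1.0, w_p=1.0, lambda_n=1.0, w_n=1.0, epsilon=0.05):
        # Two separate value tables: one for positive, one for negative rewards.
        self.Qp = np.zeros((n_states, n_actions))
        self.Qn = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.lambda_p, self.w_p = lambda_p, w_p  # positive-stream memory / gain
        self.lambda_n, self.w_n = lambda_n, w_n  # negative-stream memory / gain

    def act(self, s):
        # Epsilon-greedy on the combined streams.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.Qp.shape[1])
        return int(np.argmax(self.Qp[s] + self.Qn[s]))

    def update(self, s, a, r, s_next):
        # Split the reward into its positive and negative parts.
        r_p, r_n = max(r, 0.0), min(r, 0.0)
        # Each stream is updated with its own memory (lambda) and gain (w) bias;
        # here each stream bootstraps from its own table (an assumption).
        self.Qp[s, a] = self.lambda_p * self.Qp[s, a] + self.alpha * (
            self.w_p * r_p + self.gamma * self.Qp[s_next].max() - self.Qp[s, a])
        self.Qn[s, a] = self.lambda_n * self.Qn[s, a] + self.alpha * (
            self.w_n * r_n + self.gamma * self.Qn[s_next].max() - self.Qn[s, a])
```

With lambda+ = lambda- = w+ = w- = 1, the sum of the two streams behaves roughly like a standard Q-learning estimate; the clinically inspired variants perturb these four parameters.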
Language: Python 3, Python 2, bash
Platform: MacOS, Linux, Windows
by Baihan Lin, Sep 2018
If you find this work helpful, please try the models out and cite our work. Thanks!
Reinforcement Learning case (main paper):
@inproceedings{lin2020astory,
title={A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry},
author={Lin, Baihan and Cecchi, Guillermo and Bouneffouf, Djallel and Reinen, Jenna and Rish, Irina},
booktitle = {Proceedings of the Nineteenth International Conference on Autonomous Agents and Multi-Agent Systems, {AAMAS-20}},
publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
pages = {744--752},
year = {2020},
month = {5},
}
@inproceedings{lin2019split,
title = {Split Q Learning: Reinforcement Learning with Two-Stream Rewards},
author = {Lin, Baihan and Bouneffouf, Djallel and Cecchi, Guillermo},
booktitle = {Proceedings of the Twenty-Eighth International Joint Conference on
Artificial Intelligence, {IJCAI-19}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {6448--6449},
year = {2019},
month = {7},
}
Contextual Bandit case:
@article{lin2020unified,
title={Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits, and RL},
author={Lin, Baihan and Cecchi, Guillermo and Bouneffouf, Djallel and Reinen, Jenna and Rish, Irina},
journal={arXiv preprint arXiv:2005.04544},
year={2020}
}
Tasks
- Markov Decision Process (MDP) example with multi-modal reward distributions
- Multi-Armed Bandits (MAB) example with multi-modal reward distributions (see the reward sampler sketch after this list)
- Iowa Gambling Task (IGT) example, schemes 1 and 2
- PacMan RL game with different reward stationarities
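To make "multi-modal reward distributions" concrete, here is one way such rewards could be drawn, from a two-component Gaussian mixture. The mixture weights and component parameters below are made up for demonstration and are not the settings used in the experiments.

```python
# Illustrative only: a bimodal (Gaussian-mixture) reward sampler of the kind
# the MDP/MAB examples use; the numbers below are placeholders.
import numpy as np

def bimodal_reward(rng, p=0.5, mu=(-5.0, 10.0), sigma=(1.0, 1.0)):
    """Draw one reward from a two-component Gaussian mixture."""
    k = 0 if rng.random() < p else 1  # pick a mixture component
    return rng.normal(mu[k], sigma[k])

rng = np.random.default_rng(0)
samples = [bimodal_reward(rng) for _ in range(5)]
```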
Requirements
- Python 3 for the MDP and IGT tasks, and Python 2.7 for the PacMan task
- PyTorch
- numpy and scikit-learn
- AD ("Alzheimer's Disease")
- ADD ("addition")
- ADHD ("ADHD")
- bvFTD (the behavioral variant of Frontotemporal dementia)
- CP ("Chronic Pain")
- PD ("Parkinson's Disease")
- M ("moderate")
- SQL ("Split Q-Learning")
- PQL ("Positive Q-Learning")
- NQL ("Negative Q-Learning")
- QL ("Q-Learning")
- DQL ("Double Q-Learning")
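As a hypothetical illustration of how these variants relate, the split agents can be viewed as settings of the same four bias parameters from the sketch above. The numeric values below are placeholders, not the calibrated values used in the experiments.

```python
# Hypothetical parameterizations -- values are illustrative placeholders only,
# not the calibrated settings from the paper.
AGENT_BIASES = {
    "SQL": dict(lambda_p=1.0, w_p=1.0, lambda_n=1.0, w_n=1.0),  # unbiased split agent
    "PQL": dict(lambda_p=1.0, w_p=1.0, lambda_n=0.0, w_n=0.0),  # positive stream only
    "NQL": dict(lambda_p=0.0, w_p=0.0, lambda_n=1.0, w_n=1.0),  # negative stream only
    # The disorder-inspired variants (AD, ADD, ADHD, bvFTD, CP, PD, M) perturb
    # the same four parameters, e.g. a dampened memory for one condition or an
    # amplified negative stream for another.
}
```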
Acknowledgements
The PacMan game was built upon the Berkeley AI Pac-Man project (http://ai.berkeley.edu/project_overview.html). We modified many of the original files and included our comparisons.