Offline MARL holds great promise for real-world applications by utilising static datasets to build decentralised controllers for complex multi-agent systems. However, offline MARL currently lacks a standardised benchmark for measuring meaningful research progress. Off-the-Grid MARL (OG-MARL) fills this gap by providing a diverse suite of datasets with baselines on popular MARL benchmark environments in one place, with a unified API and an easy-to-use set of tools.
OG-MARL forms part of the InstaDeep MARL ecosystem, developed jointly with the open-source community. To join us in these efforts, reach out, raise issues, or simply 🌟 the repository to stay up to date with the latest developments! 📢 You can contribute to the conversation around OG-MARL in the Discussions tab. Please don't hesitate to leave a comment; we will be happy to reply.
📢 We recently moved our datasets to Hugging Face. This means that previous download links for the datasets may no longer work. Datasets can now be downloaded directly from Hugging Face.
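For example, a dataset snapshot can be fetched programmatically with the `huggingface_hub` client. Note that the repository id and file pattern below are illustrative assumptions; check our Hugging Face organisation page for the exact dataset repository names.

```python
# A minimal sketch of fetching a dataset snapshot from Hugging Face.
# NOTE: the repo_id and allow_patterns values are assumptions; check the
# OG-MARL page on Hugging Face for the exact dataset repository names.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="InstaDeepAI/og-marl",    # assumed dataset repository id
    repo_type="dataset",
    allow_patterns=["*smac_v1/3m*"],  # only fetch files for the 3m scenario
    local_dir="vaults",               # download into a local vaults folder
)
```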
Clone this repository.

```bash
git clone https://github.com/instadeepai/og-marl.git
```
Install `og-marl` and its dependencies. We tested `og-marl` with Python 3.9. Consider using a `conda` virtual environment.

```bash
pip install -e .
pip install flashbax==0.1.2
```
Download environment dependencies. We will use SMACv1 in this example.

```bash
bash install_environments/smacv1.sh
```
Download a dataset.

```bash
python examples/download_dataset.py --env=smac_v1 --scenario=3m
```
Run a baseline. In this example we will run MAICQ.

```bash
python baselines/main.py --env=smac_v1 --scenario=3m --dataset=Good --system=maicq
```
We provide a simple demonstrative notebook showing how to use OG-MARL's dataset API here.
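For a quick taste of the API, the sketch below loads the `3m` dataset downloaded above as a Flashbax Vault and inspects the stored experience. The `vaults` directory layout, the vault name, and the assumption that the experience is a flat dictionary of arrays are illustrative; the notebook covers the exact usage.

```python
# A minimal sketch, assuming the downloaded dataset lives at
# ./vaults/smac_v1/3m.vlt with one uid per quality tier (e.g. "Good").
from flashbax.vault import Vault

vault = Vault(
    rel_dir="vaults",             # assumed root directory for datasets
    vault_name="smac_v1/3m.vlt",  # assumed vault name for the 3m scenario
    vault_uid="Good",             # dataset quality tier to load
)

# Read the entire vault into memory as a trajectory buffer state.
experience = vault.read().experience

# Assuming the experience pytree is a flat dict of (batch, time, ...) arrays.
for name, array in experience.items():
    print(name, array.shape)
```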
⚠️ If you are having issues with downloading our datasets, it may be because you are downloading from a region far from where we host the datasets. As an alternative, please try this Google Drive link instead.
We have generated datasets on a diverse set of popular MARL environments. A list of currently supported environments is included in the table below. It is well known from the single-agent offline RL literature that the quality of experience in offline datasets can play a large role in the final performance of offline RL algorithms. Therefore, in OG-MARL, for each environment and scenario, we include a range of dataset distributions, including `Good`, `Medium`, `Poor` and `Replay` datasets, in order to benchmark offline MARL algorithms on a range of different dataset qualities (see the sketch after the table below). For more information on why we chose to include each environment and its task properties, please read our accompanying paper.
| Environment | Scenario | Agents | Act | Obs | Reward | Types | Repo |
|---|---|---|---|---|---|---|---|
| 🔫SMAC v1 | 3m<br/>8m<br/>2s3z<br/>5m_vs_6m<br/>27m_vs_30m<br/>3s5z_vs_3s6z<br/>2c_vs_64zg | 3<br/>8<br/>5<br/>5<br/>27<br/>8<br/>2 | Discrete | Vector | Dense | Homog<br/>Homog<br/>Heterog<br/>Homog<br/>Homog<br/>Heterog<br/>Homog | source |
| 💣SMAC v2 | terran_5_vs_5<br/>zerg_5_vs_5<br/>terran_10_vs_10 | 5<br/>5<br/>10 | Discrete | Vector | Dense | Heterog | source |
| 🚅Flatland | 3 Trains<br/>5 Trains | 3<br/>5 | Discrete | Vector | Sparse | Homog | source |
| 🐜MAMuJoCo | 2x3 HalfCheetah<br/>2x4 Ant<br/>4x2 Ant | 2<br/>2<br/>4 | Cont. | Vector | Dense | Heterog<br/>Homog<br/>Homog | source |
| 🐻PettingZoo | Pursuit<br/>Co-op Pong | 8<br/>2 | Discrete<br/>Discrete | Pixels<br/>Pixels | Dense | Homog<br/>Heterog | source |
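As referenced above, a quick way to sanity-check the quality tiers of a downloaded dataset is to compare their average per-timestep rewards. This is a rough sketch under the same assumptions as the Vault example above (the `rewards` field name and the vault layout); it is not the library's official evaluation pipeline.

```python
# A rough sketch comparing dataset quality tiers by mean per-timestep
# reward. The "rewards" field name and vault layout are assumptions.
import numpy as np
from flashbax.vault import Vault

for uid in ["Good", "Medium", "Poor"]:
    experience = Vault(
        rel_dir="vaults", vault_name="smac_v1/3m.vlt", vault_uid=uid
    ).read().experience
    rewards = np.asarray(experience["rewards"])
    print(f"{uid}: mean reward per timestep = {rewards.mean():.3f}")
```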
Our datasets are now hosted on Hugging Face to further improve accessibility for the community. A few datasets have yet to be uploaded but will be available very soon.
We recently converted several datasets from prior works to Vaults and benchmarked our baseline algorithms on them. For more information, see our technical report on ArXiv. All of the code for re-running the experiments is available on the following branch of this repository:
https://github.com/instadeepai/og-marl/tree/baselines-code.
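To reproduce those experiments, check out that branch after cloning the repository (the branch name is taken from the URL above):

```bash
# Check out the branch containing the benchmarking code.
git checkout baselines-code
```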
We include the following datasets from prior works.
| Paper | Environment | Scenario | Source |
|---|---|---|---|
| Pan et al. (2022) | 🐜MAMuJoCo | 2x3 HalfCheetah | source |
| Pan et al. (2022) | 🔴MPE | simple_spread | source |
| Shao et al. (2023) | 🔫SMAC v1 | 5m_vs_6m<br/>2s3z<br/>3s_vs_5z<br/>6h_vs_8z | source |
| Wang et al. (2023) | 🔫SMAC v1 | 5m_vs_6m<br/>6h_vs_8z<br/>2c_vs_64zg<br/>corridor | source |
| Wang et al. (2023) | 🐜MAMuJoCo | 6x1 HalfCheetah<br/>3x1 Hopper<br/>2x4 Ant | source |
OG-MARL is part of InstaDeep's MARL ecosystem in JAX. In particular, we suggest users check out the following sister repositories:
- 🦁 Mava: a research-friendly codebase for distributed MARL in JAX.
- 🌴 Jumanji: a diverse suite of scalable reinforcement learning environments in JAX.
- 😎 Matrax: a collection of matrix games in JAX.
- 🔦 Flashbax: accelerated replay buffers in JAX.
- 📈 MARL-eval: standardised experiment data aggregation and visualisation for MARL.
Related. Other libraries related to accelerated MARL in JAX:
- 🦊 JaxMARL: accelerated MARL environments with baselines in JAX.
- ♟️ Pgx: JAX implementations of classic board games, such as Chess, Go and Shogi.
- 🔼 Minimax: JAX implementations of autocurricula baselines for RL.
If you use OG-MARL in your work, please cite the library using:
```bibtex
@inproceedings{formanek2023ogmarl,
    author = {Formanek, Claude and Jeewa, Asad and Shock, Jonathan and Pretorius, Arnu},
    title = {Off-the-Grid MARL: Datasets and Baselines for Offline Multi-Agent Reinforcement Learning},
    year = {2023},
    publisher = {AAMAS},
    booktitle = {Extended Abstract at the 2023 International Conference on Autonomous Agents and Multiagent Systems},
}
```
The development of this library was supported with Cloud TPUs from Google's TPU Research Cloud (TRC) 🌤.