- PSRO: Lanctot, Marc, et al. "A unified game-theoretic approach to multiagent reinforcement learning." Advances in Neural Information Processing Systems 30 (2017). [arXiv] | [official code]
- P2SRO: McAleer, Stephen, et al. "Pipeline PSRO: A scalable approach for finding approximate Nash equilibria in large games." Advances in Neural Information Processing Systems 33 (2020): 20238-20248. [arXiv] | [official code]
- EPSRO: Zhou, Ming, et al. "Efficient Policy Space Response Oracles." arXiv preprint arXiv:2202.00633 (2022). [arXiv] | [official code]
- ODO: Dinh, Le Cong, et al. "Online Double Oracle." arXiv preprint arXiv:2103.07780 (2021). [arXiv] | [official code]
- XDO: McAleer, Stephen, et al. "XDO: A double oracle algorithm for extensive-form games." Advances in Neural Information Processing Systems 34 (2021): 23128-23139. [arXiv] | [official code]
- NeuPL: Liu, Siqi, et al. "NeuPL: Neural Population Learning." International Conference on Learning Representations. 2022. [arXiv] | [official code]
- MADDPG: Lowe, Ryan, et al. "Multi-agent actor-critic for mixed cooperative-competitive environments." Advances in Neural Information Processing Systems 30 (2017). [arXiv]
- MAPPO: Yu, Chao, et al. "The surprising effectiveness of PPO in cooperative, multi-agent games." arXiv preprint arXiv:2103.01955 (2021). [arXiv]
- QMIX: Rashid, Tabish, et al. "QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning." International Conference on Machine Learning. PMLR, 2018. [arXiv]
- A3C: Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International Conference on Machine Learning. PMLR, 2016. [arXiv]
- DDPG: Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). [arXiv]
- SAC: Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." International Conference on Machine Learning. PMLR, 2018. [arXiv]
- DQN: Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533. [arXiv]
- PG: Sutton, Richard S., et al. "Policy gradient methods for reinforcement learning with function approximation." Advances in Neural Information Processing Systems 12 (1999). [arXiv]
- PPO: Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). [arXiv]