# spieeltjie

This is a single-file package for running simple experiments with multi-agent reinforcement learning on symmetric zero-sum games. For more information, see "Open-ended Learning in Symmetric Zero-sum Games" and "A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning". The name "spieeltjie" comes from the Afrikaans word for "tournament".

## Gallery

Explanation of the animations: the top row is the disk game, the bottom row is Rock Paper Scissors (with agents being mixed strategies). The columns correspond to the following algorithms (a rough sketch of their update rules follows the list):

  1. fixed play: iteratively update one agent against the initial population
  2. PSRO uniform: update against all previous agents with equal weight
  3. PSRO Nash: update against the empirical Nash
  4. PSRO rectified Nash: update every agent with support in the Nash against the Nash, ignoring stronger opponents
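
A minimal sketch of these four update rules on Rock Paper Scissors, assuming agents are parameterised as logits of a mixed strategy and "update" means a single gradient-ascent step on expected payoff; none of the names below are the package's actual API:

```python
# Illustrative only -- not the package's API.
import numpy as np

M = np.array([[ 0., -1.,  1.],   # Rock Paper Scissors payoff (antisymmetric)
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def strategy(logits):
    """Softmax: agent parameters -> mixed strategy."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def gradient_step(logits, pop, weights, lr=0.1):
    """One ascent step on expected payoff against a weighted opponent mixture."""
    opponent = sum(w * strategy(o) for w, o in zip(weights, pop))
    p = strategy(logits)
    grad_p = M @ opponent                            # d payoff / d strategy
    return logits + lr * p * (grad_p - p @ grad_p)   # chain rule through the softmax

def update(pop, variant, initial_pop, nash=None):
    """Return the updated agent(s) for one iteration of the given variant.

    `nash` is an approximate Nash of the empirical meta-game, e.g. from
    fictitious play (sketched further down)."""
    if variant == "fixed play":
        w = np.full(len(initial_pop), 1.0 / len(initial_pop))
        return [gradient_step(pop[-1], initial_pop, w)]
    if variant == "PSRO uniform":
        w = np.full(len(pop), 1.0 / len(pop))
        return [gradient_step(pop[-1], pop, w)]
    if variant == "PSRO Nash":
        return [gradient_step(pop[-1], pop, nash)]
    if variant == "PSRO rectified Nash":
        # Empirical meta-game between the current agents.
        A = np.array([[strategy(p) @ M @ strategy(q) for q in pop] for p in pop])
        out = []
        for i in np.flatnonzero(nash > 1e-6):        # agents with support in the Nash
            w = nash * (A[i] >= 0)                   # ignore stronger opponents
            out.append(gradient_step(pop[i], pop, w / w.sum()))
        return out
    raise ValueError(variant)
```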

### Three random agents

This first set of images shows trajectories when starting from a set of random initial agents (orange points). The purple polygon and cross show the Nash equilibrium for the algorithms that use it (this Nash is approximated via fictitious play, so it can jump around a bit). Note that some algorithms make no progress from some initial conditions.
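
For reference, a plain textbook fictitious-play loop that approximates the Nash of the empirical meta-game (a sketch, not the package's actual implementation) could look like:

```python
# Hypothetical textbook fictitious play on the empirical payoff matrix A
# (antisymmetric, A[i, j] = payoff of agent i against agent j); returns an
# approximate Nash mixture over the current population.
import numpy as np

def fictitious_play(A, iters=2000):
    counts = np.ones(len(A))                 # start with one play of each agent
    for _ in range(iters):
        mix = counts / counts.sum()          # empirical opponent mixture
        counts[np.argmax(A @ mix)] += 1      # best respond and record it
    return counts / counts.sum()
```

The running average of best responses converges only slowly, which is why the plotted Nash moves around between frames.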

### One random agent + one Nash

This second set of images shows trajectories when starting from two agents, one of which is already the Nash equilibrium of the functional game.

### One random agent

This third set of images shows trajectories when starting from a single random agent.

### Well-supported population

This last set of images shows trajectories when starting from a set of agents that give good coverage of the policy space. The agents are slightly perturbed at random to prevent the cancellation of gradients that would otherwise affect particular algorithms.
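
Purely as an illustration (not the repository's actual initialisation code), such a well-supported, slightly perturbed starting population for Rock Paper Scissors could be built like this:

```python
# Illustrative only: one agent concentrated on each pure strategy, plus a
# small random perturbation to break the symmetry that would otherwise let
# the gradients of different agents cancel for some algorithms.
import numpy as np

rng = np.random.default_rng(0)
scale, noise = 3.0, 0.05                 # strategy concentration, perturbation size
initial_logits = [scale * np.eye(3)[k] + noise * rng.normal(size=3)
                  for k in range(3)]     # one agent per pure strategy
```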