Learning Complex Group Behaviors in a Multi-Agent Competitive Environment

Some of the Learned Strategies

DPG 1v2

DPG 2v1

Introduction

The aim of this project is to use reinforcement learning algorithms to explore predator-prey dynamics, such as those seen when predator fish and prey fish interact in nature. To do so, we adapted OpenAI's multiagent-particle-envs (Lowe et al.) to simulate two competing groups of fish: a predator group (a single predator) and a prey group (2 or 10 prey). We trained the predator and prey agents with two Q-learning algorithms (Q-Network with Experience Replay, and Q-Network with Experience Replay and Fixed Q-Target) and a Monte-Carlo Policy Gradient algorithm. The predator is rewarded for each collision with a prey (each time it "eats" one). The prey are penalized for each collision with the predator (each time they are "eaten"). With these reward functions, the prey group is expected to learn defense strategies and the predator is expected to learn attack strategies.
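The collision-based rewards described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual reward code: the function names, the agent radii, and the reward magnitude of 1.0 per collision are all assumptions made for the example.

```python
import math


def is_collision(pos_a, pos_b, radius_a, radius_b):
    """Treat agents as circles: they collide when their centres are
    closer than the sum of their radii."""
    return math.dist(pos_a, pos_b) < radius_a + radius_b


def step_rewards(predator_pos, prey_positions,
                 predator_radius=0.075, prey_radius=0.05, bonus=1.0):
    """Return (predator_reward, prey_rewards) for one time step.

    The radii and `bonus` are hypothetical values chosen for illustration.
    """
    hits = [
        is_collision(predator_pos, prey_pos, predator_radius, prey_radius)
        for prey_pos in prey_positions
    ]
    # The predator gains `bonus` for every prey it touches in this step.
    predator_reward = bonus * sum(hits)
    # Each prey loses `bonus` only if it was the one touched.
    prey_rewards = [-bonus if hit else 0.0 for hit in hits]
    return predator_reward, prey_rewards


predator_reward, prey_rewards = step_rewards(
    (0.0, 0.0), [(0.05, 0.0), (1.0, 1.0)]
)
```

In this example the first prey overlaps the predator and the second is far away, so only the first prey is penalized.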

Methods

Reinforcement learning is a branch of machine learning, alongside supervised and unsupervised learning. In reinforcement learning, an agent learns by interacting with its environment: through this interaction, it tries to learn which actions to take in which states so as to maximize a cumulative reward signal.
In the multi-agent setting, more than one agent is present; here, each agent learns independently of all other agents.
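The interaction loop with independent learners might look like the sketch below. `ToyEnv` and `RandomAgent` are hypothetical stand-ins written for this example (they are not classes from multiagent-particle-envs or from this repository), and the `learn` method is only a placeholder for a real update rule.

```python
import random


class ToyEnv:
    """Hypothetical environment: episodes last 5 steps and an agent
    earns 1.0 whenever it picks action 1."""

    def __init__(self, n_agents):
        self.n_agents = n_agents
        self.t = 0

    def reset(self):
        self.t = 0
        return [0] * self.n_agents  # one observation per agent

    def step(self, actions):
        self.t += 1
        obs = [self.t] * self.n_agents
        rewards = [1.0 if a == 1 else 0.0 for a in actions]
        done = self.t >= 5
        return obs, rewards, done


class RandomAgent:
    """Hypothetical agent that acts randomly and only records its reward."""

    def __init__(self, n_actions, seed):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        self.total_reward = 0.0

    def act(self, obs):
        return self.rng.randrange(self.n_actions)

    def learn(self, obs, action, reward, next_obs, done):
        # Placeholder: a real agent would update its Q-network or policy here.
        self.total_reward += reward


def run_episode(env, agents):
    """Run one episode; every agent sees and learns from its own data only."""
    obs = env.reset()
    done = False
    steps = 0
    while not done:
        actions = [agent.act(o) for agent, o in zip(agents, obs)]
        next_obs, rewards, done = env.step(actions)
        for agent, o, a, r, o2 in zip(agents, obs, actions, rewards, next_obs):
            agent.learn(o, a, r, o2, done)
        obs = next_obs
        steps += 1
    return steps


env = ToyEnv(n_agents=3)
agents = [RandomAgent(n_actions=2, seed=i) for i in range(3)]
n_steps = run_episode(env, agents)
```

The point of the sketch is that no agent reads another agent's observations, rewards, or parameters; the other agents are simply part of the environment it experiences.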

Figure 1: A multi-agent environment (Chincoli et al.).

Three reinforcement learning (RL) algorithms were used in this project for the agents to learn policies that maximize their expected cumulative future reward.

Details and derivations can be found in the paper.

1. Q-learning with experience replay
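As a rough illustration of experience replay (Mnih et al.), transitions are stored in a bounded buffer and training batches are sampled from it at random, which breaks up the correlation between consecutive steps. The `ReplayBuffer` class below is a generic sketch with assumed names and an assumed capacity; it is not taken from this repository.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(batch_size=2)
```

With a capacity of 3, only the three most recent transitions (states 2, 3, and 4) remain available for sampling.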

2. Q-learning with experience replay and a fixed Q-target network
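The idea of a fixed Q-target can be sketched with a small table in place of a neural network. This is a simplification for illustration only: the project uses Q-networks, and the class name, learning rate, and `sync_every` interval below are assumptions. The bootstrapped target is computed from a frozen copy of the Q-values, and that copy is refreshed only every few updates.

```python
import copy


def td_target(reward, next_q_values, done, gamma=0.99):
    """r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    return reward if done else reward + gamma * max(next_q_values)


class FixedTargetQ:
    """Tabular stand-in for an online Q-network plus a frozen target copy."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99, sync_every=100):
        self.online = [[0.0] * n_actions for _ in range(n_states)]
        self.target = copy.deepcopy(self.online)
        self.lr = lr
        self.gamma = gamma
        self.sync_every = sync_every
        self.updates = 0

    def update(self, s, a, r, s2, done):
        # The target comes from the frozen copy, not from the values being trained.
        y = td_target(r, self.target[s2], done, self.gamma)
        self.online[s][a] += self.lr * (y - self.online[s][a])
        self.updates += 1
        if self.updates % self.sync_every == 0:
            # Periodically copy the online values into the target.
            self.target = copy.deepcopy(self.online)


q = FixedTargetQ(n_states=2, n_actions=2, lr=0.5, gamma=0.9, sync_every=2)
q.update(s=0, a=0, r=1.0, s2=1, done=False)
online_after_first = q.online[0][0]
target_after_first = q.target[0][0]
q.update(s=1, a=1, r=1.0, s2=0, done=False)
```

After the first update the online value has moved while the target copy has not; the second update triggers a sync, so the target then matches the online values.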

3. Monte-Carlo Policy Gradient
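In a Monte-Carlo policy gradient method (REINFORCE, see Sutton and Barto), the policy is updated only after a full episode, using the discounted return observed from each step onward. The sketch below shows the return computation and the scalar loss that would be minimized; the function names are assumptions, and the log-probabilities are plain numbers here rather than outputs of a policy network.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = []
    g = 0.0
    # Work backwards so each return reuses the one that follows it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Negative sum of log pi(a_t | s_t) * G_t; minimizing it raises the
    probability of actions that were followed by high returns."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
loss = reinforce_loss([math.log(0.5)] * 3, returns)
```

In a real implementation the log-probabilities would come from the policy network and the loss would be differentiated by an automatic differentiation library.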

Performance

Figure 2: Algorithm 1. Epsilon, collisions, and reward versus episode, plotted for each agent. For this model, learning rate = 0.0001 and gamma = 0.99.

Figure 3: Algorithm 2. Epsilon, collisions, and reward versus episode, plotted for each agent. For this model, learning rate = 0.001 and gamma = 0.99.

Figure 4: Algorithm 3. Collisions, reward, and loss versus episode, plotted for each agent. Agent 0 is the predator; agents 1 and 2 are prey.

Citation

If you use this work, please cite it as below:

@misc{kazim_sanlav_2018_2528551,
author = {kazim sanlav and
afshan nabi},
title = {{Learning Complex Group Behaviours in a Multi Agent
Competitive Environment}},
month = dec,
year = 2018,
doi = {10.5281/zenodo.2528551},
url = {https://doi.org/10.5281/zenodo.2528551}
}

References

Chincoli, Michele, and Antonio Liotta. "Self-learning power control in wireless sensor networks." Sensors 18.2 (2018): 375.

Lowe, Ryan, et al. "Multi-agent actor-critic for mixed cooperative-competitive environments." Advances in Neural Information Processing Systems. 2017.

Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).

Matiisen, Tambet. "Demystifying Deep Reinforcement Learning." 2015.

Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.

Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. Vol. 2. 2016.

