Update README.md
anthony0727 authored Nov 14, 2022
1 parent ed22350 commit 5304eec
Showing 1 changed file with 142 additions and 0 deletions.
142 changes: 142 additions & 0 deletions README.md
@@ -1 +1,143 @@
# Marlenv

Marlenv is a multi-agent environment for reinforcement learning, based on the OpenAI [gym](https://github.com/openai/gym) convention.

Function names such as `reset()` and `step()` follow gym, but the return format differs. Unlike single-agent environments, the multi-agent environments in this repo return every value as a list in which each element corresponds to one agent in the environment. The same rule applies to the input: the action must be a list of actions whose length equals the number of agents.
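
The list convention can be illustrated without installing anything. The sketch below uses a hypothetical stand-in class, `ToyMultiAgentEnv` (not part of marlenv), that only mimics the shape of the returns: one list element per agent for observations, rewards, and done flags. It assumes the classic gym four-tuple return from `step()`; the real environments may differ in detail.

```python
import random


class ToyMultiAgentEnv:
    """Hypothetical stand-in that mimics the per-agent list convention."""

    def __init__(self, num_agents=2, episode_length=3):
        self.num_agents = num_agents
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        # One observation per agent (a dummy integer here).
        return [0 for _ in range(self.num_agents)]

    def step(self, actions):
        # The action must be a list with one entry per agent.
        assert len(actions) == self.num_agents
        self.t += 1
        obs = [self.t for _ in range(self.num_agents)]
        rewards = [float(a) for a in actions]
        dones = [self.t >= self.episode_length] * self.num_agents
        return obs, rewards, dones, {}


env = ToyMultiAgentEnv(num_agents=2)
obs = env.reset()
dones = [False] * env.num_agents
returns = [0.0] * env.num_agents

while not all(dones):
    # Build one action per agent, then step all agents at once.
    actions = [random.choice([0, 1]) for _ in range(env.num_agents)]
    obs, rewards, dones, info = env.step(actions)
    returns = [g + r for g, r in zip(returns, rewards)]
```

Every quantity in the loop stays a list of length `num_agents`, so per-agent bookkeeping such as `returns` is a plain element-wise update.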

Marlenv is an ongoing project and modifications and new environments are expected in the future.


## Installation

Clone the marlenv repository and install it with pip:

```bash
git clone https://github.com/kc-ml2/marlenv.git
cd marlenv
pip install -e .
```

## Rules


### Snake Game

Multiple snakes battle on a fixed-size grid map.

Each snake is spawned at a random location on the map, with a random pose and direction, at reset().

The map may be initialized with different walls when the environment is instantiated.

A snake dies when its head hits a wall or the body of another snake. In the latter case, the other snake receives a reward for the kill, and the dead snake receives a reward for its death ('lose').

When multiple snakes collide head to head, all of them die without receiving the kill reward.

When there is only one snake remaining, it receives a win reward for every unit time of survival.

A snake grows by one pixel when it has eaten a fruit.
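
The collision rules above can be sketched as a small pure-Python function. This illustrates the stated rules and is not marlenv's implementation: `resolve_collisions`, its arguments, and the coordinate representation are hypothetical, and cases the rules do not mention (such as a snake hitting its own body) are left out.

```python
from collections import Counter


def resolve_collisions(heads, bodies, walls):
    """Return (dead, kills) for one time step.

    heads:  dict snake_id -> (row, col) of the head after moving
    bodies: dict snake_id -> set of (row, col) body cells, head excluded
    walls:  set of (row, col) wall cells
    """
    dead = set()
    kills = Counter()
    head_counts = Counter(heads.values())

    for sid, head in heads.items():
        # Head-to-head: every snake involved dies and nobody scores a kill.
        if head_counts[head] > 1:
            dead.add(sid)
            continue
        # Hitting a wall kills the snake.
        if head in walls:
            dead.add(sid)
            continue
        # Hitting another snake's body kills the snake and credits the other.
        for other, cells in bodies.items():
            if other != sid and head in cells:
                dead.add(sid)
                kills[other] += 1
                break
    return dead, dict(kills)


# Snake 0 runs into snake 3's body, snakes 1 and 2 meet head to head,
# snake 4 hits the top wall, and snake 3 survives.
heads = {0: (1, 1), 1: (5, 5), 2: (5, 5), 3: (2, 1), 4: (0, 3)}
bodies = {0: {(1, 2)}, 1: {(5, 6)}, 2: {(5, 4)}, 3: {(1, 1)}, 4: {(1, 3)}}
walls = {(0, c) for c in range(10)}
dead, kills = resolve_collisions(heads, bodies, walls)
# dead == {0, 1, 2, 4}; kills == {3: 1}
```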

**Observation Types**

Image grid : The axis order is **'NHWC'** (batch, height, width, channels)

## Example Input Arguments

### Snake Game

Creating an environment

```python
import gym
import marlenv

env = gym.make(
    'Snake-v1',
    height=20,       # Height of the grid map
    width=20,        # Width of the grid map
    num_snakes=4,    # Number of snakes to spawn on the grid
    snake_length=3,  # Initial length of each snake at spawn time
    vision_range=5,  # Vision range (both width and height); map returned if None
    frame_stack=1,   # Number of observations to stack on return
)
```

Single-agent wrapper

```python
env = gym.make('Snake-v1', num_snakes=1)
env = marlenv.wrappers.SingleAgent(env)
```

This unwraps the returned observation, reward, etc. from their lists.
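
To make the effect of unwrapping concrete, here is a sketch with a hypothetical helper, `unwrap_single`, applied to a hand-written step result. It is not the code of `marlenv.wrappers.SingleAgent`; it only illustrates the idea of taking the sole element out of each per-agent list, assuming a four-tuple step return.

```python
def unwrap_single(step_result):
    """Take the only element out of each per-agent list (hypothetical helper)."""
    obs, rewards, dones, info = step_result
    # With a single snake, every per-agent list has exactly one element.
    assert len(obs) == len(rewards) == len(dones) == 1
    return obs[0], rewards[0], dones[0], info


# A made-up step result for an environment with one snake.
multi_agent_result = ([[0, 1, 0]], [1.0], [False], {})
obs, reward, done, info = unwrap_single(multi_agent_result)
```

After unwrapping, `reward` is a float and `done` is a bool, which is the shape that single-agent training code written for plain gym environments usually expects.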

Using the make_snake() function

```python
# Automatically chooses wrappers to handle single-agent, multi-agent, vector_env, etc.
env, observation_space, action_space, properties = marlenv.wrappers.make_snake(
    num_envs=1,    # Number of environments. Used to decide whether to use a vector env
    num_snakes=1,  # Number of players. Used to determine single/multi-agent
    **kwargs       # Other input parameters to the environment
)
```

The returned values are:

- env : The environment object
- observation_space : The processed observation space (according to env type)
- action_space : The processed action space
- properties : A dict that includes
  - high: highest value that the observation can have
  - low: lowest value that the observation can have
  - num_envs: number of environments
  - num_snakes: number of snakes to be spawned
  - discrete: True if the action space is discrete (categorical)
  - action_info
    - {action_high, action_low} if the action is continuous, or {action_n} if discrete
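
As a sketch of how `properties` might be consumed, the snippet below samples a random action from a hand-written dict. The nesting of `action_info` and the exact key names (`action_n`, `action_low`, `action_high`) follow the list above but are assumptions about the real structure; `sample_action` is a hypothetical helper.

```python
import random


def sample_action(properties, rng=random):
    """Sample one random action using only the `properties` dict."""
    info = properties["action_info"]
    if properties["discrete"]:
        # Discrete: pick an integer in [0, action_n).
        return rng.randrange(info["action_n"])
    # Continuous: pick a float between action_low and action_high.
    return rng.uniform(info["action_low"], info["action_high"])


# Hand-written example values, not output from marlenv.
properties = {
    "high": 255,
    "low": 0,
    "num_envs": 1,
    "num_snakes": 1,
    "discrete": True,
    "action_info": {"action_n": 3},
}
action = sample_action(properties)
```

Branching on `discrete` keeps agent code independent of the concrete space object, at the cost of relying on the key names assumed here.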

**Custom reward function**

The user can change the reward structure of the snake game when instantiating the environment.

The reward function is defined as a Python dictionary, as follows:

```python
custom_reward_func = {
    'fruit': 1.0,
    'kill': 0.0,
    'lose': 0.0,
    'time': 0.0,
    'win': 0.0,
}
env = gym.make('Snake-v1', reward_func=custom_reward_func)
```

Each key represents:

- fruit : reward received when the snake eats a fruit
- kill : reward received when the snake kills another snake
- lose : reward (or penalty) received when the snake dies
- time : reward received for each unit of time of survival
- win : reward received during the snake's time of survival as the last one standing

Each reward can be a positive or negative float.
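
One way to read the reward dictionary is as a table of weights that are summed over the events occurring in a step. The sketch below assumes that interpretation; `step_reward` and the `events` set are hypothetical and not part of marlenv, which may combine the terms differently. The values are made up to show mixed signs.

```python
def step_reward(reward_func, events):
    """Sum the configured reward for every event that occurred this step."""
    return sum(reward_func[name] for name in events)


custom_reward_func = {
    'fruit': 1.0,
    'kill': 2.0,
    'lose': -1.0,  # Negative value acts as a penalty for dying
    'time': 0.01,
    'win': 0.0,
}

# A snake that survived the step and ate a fruit.
r_eat = step_reward(custom_reward_func, {'time', 'fruit'})
# A snake that died this step.
r_die = step_reward(custom_reward_func, {'lose'})
```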

## Testing

```bash
pytest
```

## Citation

```bibtex
@MISC{marlenv2021,
    author = {ML2},
    title = {Marlenv, Multi-agent Reinforcement Learning Environment},
    howpublished = {\url{http://github.com/kc-ml2/marlenv}},
    year = {2021}
}
```

## Updates

Currently, the only environment is the multi-agent snake game.
