# Marlenv

Marlenv is a multi-agent environment for reinforcement learning, based on the OpenAI [gym](https://github.com/openai/gym) convention.

The function names such as `reset()` and `step()` are consistent with gym, but the return format differs. Unlike single-agent environments, the multi-agent environments in this repo return every value as a list, where each element corresponds to one agent. The same rule applies to the input action: the action must be a list of per-agent actions whose length equals the number of agents.
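As a sketch of this per-agent list convention, here is a toy stand-in environment (illustrative only, not marlenv's actual implementation):

```python
class ToyMultiAgentEnv:
    """Toy environment mimicking the list-per-agent return convention."""

    def __init__(self, num_agents):
        self.num_agents = num_agents

    def reset(self):
        # One observation per agent.
        return [0 for _ in range(self.num_agents)]

    def step(self, actions):
        # The input must contain one action per agent.
        assert len(actions) == self.num_agents
        obs = [a for a in actions]                      # per-agent observations
        rewards = [0.0 for _ in range(self.num_agents)]
        dones = [False for _ in range(self.num_agents)]
        infos = [{} for _ in range(self.num_agents)]
        return obs, rewards, dones, infos


env = ToyMultiAgentEnv(num_agents=4)
obs = env.reset()
obs, rewards, dones, infos = env.step([1, 2, 3, 0])
print(len(obs), len(rewards))  # 4 4
```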

Marlenv is an ongoing project; modifications and new environments are expected in the future.

## Installation

Clone the marlenv repo and install it with pip:

```bash
git clone https://github.com/kc-ml2/marlenv.git
cd marlenv
pip install -e .
```

## Rules

### Snake Game

Multiple snakes battle on a fixed-size grid map.

Each snake is spawned at a random location on the map, with a random pose and direction, at `reset()`.

The map may be initialized with different walls upon instantiation of the environment.

A snake dies when its head hits a wall or the body of another snake. The killing snake receives a kill reward, and the dead snake receives a death reward ('lose').

When multiple snakes collide head to head, all of them die and none receives the kill reward.

When only one snake remains, it receives a win reward for every unit of time it survives.

A snake grows by one pixel when it eats a fruit.
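The collision rules above can be sketched as a pure function (an illustrative sketch, not marlenv's internal code; `head_cells` and `blocked_cells` are hypothetical names):

```python
from collections import Counter

def resolve_collisions(head_cells, blocked_cells):
    """Apply the death rules: head-to-head kills all, body/wall kills one.

    head_cells: dict mapping snake id -> cell its head moved into
    blocked_cells: dict mapping cell -> owner snake id (None for a wall)
    Returns (died, kills): per-snake death flags and kill credits.
    """
    died = {}
    kills = {sid: 0 for sid in head_cells}
    counts = Counter(head_cells.values())
    for sid, cell in head_cells.items():
        if counts[cell] > 1:
            # Head-to-head collision: all involved snakes die, no kill credit.
            died[sid] = True
        elif cell in blocked_cells:
            # Hit a wall or another snake's body.
            died[sid] = True
            owner = blocked_cells[cell]
            if owner is not None and owner != sid:
                kills[owner] += 1  # the body's owner gets the kill reward
        else:
            died[sid] = False
    return died, kills

# Snakes 0 and 1 collide head to head; snake 2 hits a wall.
died, kills = resolve_collisions({0: (3, 3), 1: (3, 3), 2: (0, 5)}, {(0, 5): None})
print(died)   # {0: True, 1: True, 2: True}
print(kills)  # {0: 0, 1: 0, 2: 0}
```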

**Observation Types**

Image grid: the order is **'NHWC'**.
## Examples: Input Arguments

### Snake Game

Creating an environment:

```python
import gym
import marlenv

env = gym.make(
    'Snake-v1',
    height=20,        # Height of the grid map
    width=20,         # Width of the grid map
    num_snakes=4,     # Number of snakes to spawn on the grid
    snake_length=3,   # Initial length of each snake at spawn time
    vision_range=5,   # Vision range (both width and height); the full map is returned if None
    frame_stack=1,    # Number of observations to stack per return
)
```

Single-agent wrapper:

```python
env = gym.make('Snake-v1', num_snakes=1)
env = marlenv.wrappers.SingleAgent(env)
```

This unwraps the returned observation, reward, etc. from their single-element lists.
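Conceptually, the wrapper strips the length-1 lists down to bare values. A minimal sketch of that behavior (illustrative only, not the library's implementation):

```python
def unwrap_single_agent(step_result):
    """Convert per-agent lists of length 1 into bare values."""
    obs_list, reward_list, done_list, info_list = step_result
    assert len(reward_list) == 1, "only valid for a single-agent environment"
    return obs_list[0], reward_list[0], done_list[0], info_list[0]

# A fake single-agent step result in the multi-agent list format:
wrapped = ([[0, 1, 2]], [0.5], [False], [{}])
obs, reward, done, info = unwrap_single_agent(wrapped)
print(obs, reward, done)  # [0, 1, 2] 0.5 False
```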
Using the `make_snake()` function:

```python
# Automatically chooses wrappers to handle single-agent, multi-agent, vector env, etc.
env, observation_space, action_space, properties = marlenv.wrappers.make_snake(
    num_envs=1,     # Number of environments; used to decide whether to create a vector env
    num_snakes=1,   # Number of players; used to determine single- or multi-agent
    **kwargs        # Other input parameters to the environment
)
```

The returned values are:

- env: the environment object
- observation_space: the processed observation space (according to the env type)
- action_space: the processed action space
- properties: a dict that includes
    - high: the highest value an observation can take
    - low: the lowest value an observation can take
    - num_envs: the number of environments
    - num_snakes: the number of snakes to be spawned
    - discrete: True if the action space is discrete (categorical)
    - action_info: {action_high, action_low} for a continuous action space, or {action_n} for a discrete one
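For illustration, a training loop might branch on the `properties` dict like this (the values below are assumed for the sketch, not necessarily what `make_snake()` returns):

```python
# Hypothetical properties dict with the keys described above (values assumed).
properties = {
    'high': 255,
    'low': 0,
    'num_envs': 1,
    'num_snakes': 1,
    'discrete': True,
    'action_info': {'action_n': 4},
}

if properties['discrete']:
    # Categorical action space: action_info carries the number of actions.
    num_actions = properties['action_info']['action_n']
else:
    # Continuous action space: action_info carries the bounds.
    num_actions = None
    act_low = properties['action_info']['action_low']
    act_high = properties['action_info']['action_high']

print(num_actions)  # 4
```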
**Custom reward function**

The user can change the reward structure of the snake game upon instantiation.

The reward function can be defined as a Python dictionary, as follows:

```python
custom_reward_func = {
    'fruit': 1.0,
    'kill': 0.0,
    'lose': 0.0,
    'time': 0.0,
    'win': 0.0
}
env = gym.make('Snake-v1', reward_func=custom_reward_func)
```

Each key represents:

- fruit : reward received when the snake eats a fruit
- kill : reward received when the snake kills another snake
- lose : reward (or penalty) received when the snake dies
- time : reward received for each unit of time of survival
- win : reward received per unit of time while the snake survives as the last one standing

Each reward can be any positive or negative float.
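As a sketch of how such a dict maps events to scalar rewards (illustrative only; `step_reward` is a hypothetical helper, not a marlenv function, and the values are made up):

```python
# Assumed example values, not marlenv defaults.
reward_func = {'fruit': 1.0, 'kill': 0.5, 'lose': -1.0, 'time': 0.1, 'win': 0.0}

def step_reward(events, reward_func):
    """Sum the configured rewards for the events a snake triggered this step."""
    return sum(reward_func[e] for e in events)

# A snake that survived one time step and also ate a fruit:
print(step_reward(['time', 'fruit'], reward_func))  # 1.1
```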
## Testing

```bash
pytest
```

## Citation

```bibtex
@MISC{marlenv2021,
  author = {ML2},
  title = {Marlenv, Multi-agent Reinforcement Learning Environment},
  howpublished = {\url{http://github.com/kc-ml2/marlenv}},
  year = {2021}
}
```

## Updates

Currently, the multi-agent snake game is the only available environment.