# Marlenv

Marlenv is a multi-agent environment for reinforcement learning, based on the OpenAI [gym](https://github.com/openai/gym) convention.

The function names such as `reset()` and `step()` are consistent with gym, but the return format differs. Unlike single-agent environments, the multi-agent environments in this repo return every value as a list, where each element corresponds to one agent. The same rule applies to the input action: the action must be a list of per-agent actions whose length equals the number of agents.
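As a sketch of this per-agent list convention, here is a toy stand-in environment (illustrative only, not marlenv's actual implementation):

```python
class ToyMultiAgentEnv:
    """Toy environment mimicking the list-per-agent return convention."""

    def __init__(self, num_agents):
        self.num_agents = num_agents

    def reset(self):
        # One observation per agent.
        return [0 for _ in range(self.num_agents)]

    def step(self, actions):
        # The input must contain one action per agent.
        assert len(actions) == self.num_agents
        obs = [a for a in actions]                      # per-agent observations
        rewards = [0.0 for _ in range(self.num_agents)]
        dones = [False for _ in range(self.num_agents)]
        infos = [{} for _ in range(self.num_agents)]
        return obs, rewards, dones, infos


env = ToyMultiAgentEnv(num_agents=4)
obs = env.reset()
obs, rewards, dones, infos = env.step([1, 2, 3, 0])
print(len(obs), len(rewards))  # 4 4
```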

Marlenv is an ongoing project; modifications and new environments are expected in the future.

## Installation

Clone the marlenv repo and install it with pip:

```bash
git clone https://github.com/kc-ml2/marlenv.git
cd marlenv
pip install -e .
```

## Rules

### Snake Game

Multiple snakes battle on a fixed-size grid map.

Each snake is spawned at a random location on the map, with a random pose and direction, at `reset()`.

The map may be initialized with different walls upon instantiation of the environment.

A snake dies when its head hits a wall or the body of another snake. The killing snake receives a kill reward, and the dead snake receives a death reward ('lose').

When multiple snakes collide head to head, all of them die and none receives the kill reward.

When only one snake remains, it receives a win reward for every unit of time it survives.

A snake grows by one pixel when it eats a fruit.
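The collision rules above can be sketched as a pure function (an illustrative sketch, not marlenv's internal code; `head_cells` and `blocked_cells` are hypothetical names):

```python
from collections import Counter

def resolve_collisions(head_cells, blocked_cells):
    """Apply the death rules: head-to-head kills all, body/wall kills one.

    head_cells: dict mapping snake id -> cell its head moved into
    blocked_cells: dict mapping cell -> owner snake id (None for a wall)
    Returns (died, kills): per-snake death flags and kill credits.
    """
    died = {}
    kills = {sid: 0 for sid in head_cells}
    counts = Counter(head_cells.values())
    for sid, cell in head_cells.items():
        if counts[cell] > 1:
            # Head-to-head collision: all involved snakes die, no kill credit.
            died[sid] = True
        elif cell in blocked_cells:
            # Hit a wall or another snake's body.
            died[sid] = True
            owner = blocked_cells[cell]
            if owner is not None and owner != sid:
                kills[owner] += 1  # the body's owner gets the kill reward
        else:
            died[sid] = False
    return died, kills

# Snakes 0 and 1 collide head to head; snake 2 hits a wall.
died, kills = resolve_collisions({0: (3, 3), 1: (3, 3), 2: (0, 5)}, {(0, 5): None})
print(died)   # {0: True, 1: True, 2: True}
print(kills)  # {0: 0, 1: 0, 2: 0}
```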

**Observation Types**

Image grid: the order is **'NHWC'**.
## Examples: Input Arguments

### Snake Game

Creating an environment:

```python
import gym
import marlenv

env = gym.make(
    'Snake-v1',
    height=20,        # Height of the grid map
    width=20,         # Width of the grid map
    num_snakes=4,     # Number of snakes to spawn on the grid
    snake_length=3,   # Initial length of each snake at spawn time
    vision_range=5,   # Vision range (both width and height); the full map is returned if None
    frame_stack=1,    # Number of observations to stack per return
)
```

Single-agent wrapper:

```python
env = gym.make('Snake-v1', num_snakes=1)
env = marlenv.wrappers.SingleAgent(env)
```

This unwraps the returned observation, reward, etc. from their single-element lists.
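Conceptually, the wrapper strips the length-1 lists down to bare values. A minimal sketch of that behavior (illustrative only, not the library's implementation):

```python
def unwrap_single_agent(step_result):
    """Convert per-agent lists of length 1 into bare values."""
    obs_list, reward_list, done_list, info_list = step_result
    assert len(reward_list) == 1, "only valid for a single-agent environment"
    return obs_list[0], reward_list[0], done_list[0], info_list[0]

# A fake single-agent step result in the multi-agent list format:
wrapped = ([[0, 1, 2]], [0.5], [False], [{}])
obs, reward, done, info = unwrap_single_agent(wrapped)
print(obs, reward, done)  # [0, 1, 2] 0.5 False
```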
Using the `make_snake()` function:

```python
# Automatically chooses wrappers to handle single-agent, multi-agent, vector env, etc.
env, observation_space, action_space, properties = marlenv.wrappers.make_snake(
    num_envs=1,     # Number of environments; used to decide whether to create a vector env
    num_snakes=1,   # Number of players; used to determine single- or multi-agent
    **kwargs        # Other input parameters to the environment
)
```

The returned values are:

- env: the environment object
- observation_space: the processed observation space (according to the env type)
- action_space: the processed action space
- properties: a dict that includes
    - high: the highest value an observation can take
    - low: the lowest value an observation can take
    - num_envs: the number of environments
    - num_snakes: the number of snakes to be spawned
    - discrete: True if the action space is discrete (categorical)
    - action_info: {action_high, action_low} for a continuous action space, or {action_n} for a discrete one
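For illustration, a training loop might branch on the `properties` dict like this (the values below are assumed for the sketch, not necessarily what `make_snake()` returns):

```python
# Hypothetical properties dict with the keys described above (values assumed).
properties = {
    'high': 255,
    'low': 0,
    'num_envs': 1,
    'num_snakes': 1,
    'discrete': True,
    'action_info': {'action_n': 4},
}

if properties['discrete']:
    # Categorical action space: action_info carries the number of actions.
    num_actions = properties['action_info']['action_n']
else:
    # Continuous action space: action_info carries the bounds.
    num_actions = None
    act_low = properties['action_info']['action_low']
    act_high = properties['action_info']['action_high']

print(num_actions)  # 4
```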
**Custom reward function**

The user can change the reward structure of the snake game upon instantiation.

The reward function can be defined as a Python dictionary, as follows:

```python
custom_reward_func = {
    'fruit': 1.0,
    'kill': 0.0,
    'lose': 0.0,
    'time': 0.0,
    'win': 0.0
}
env = gym.make('Snake-v1', reward_func=custom_reward_func)
```

Each key represents:

- fruit : reward received when the snake eats a fruit
- kill : reward received when the snake kills another snake
- lose : reward (or penalty) received when the snake dies
- time : reward received for each unit of time of survival
- win : reward received per unit of time while the snake survives as the last one standing

Each reward can be any positive or negative float.
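As a sketch of how such a dict maps events to scalar rewards (illustrative only; `step_reward` is a hypothetical helper, not a marlenv function, and the values are made up):

```python
# Assumed example values, not marlenv defaults.
reward_func = {'fruit': 1.0, 'kill': 0.5, 'lose': -1.0, 'time': 0.1, 'win': 0.0}

def step_reward(events, reward_func):
    """Sum the configured rewards for the events a snake triggered this step."""
    return sum(reward_func[e] for e in events)

# A snake that survived one time step and also ate a fruit:
print(step_reward(['time', 'fruit'], reward_func))  # 1.1
```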
## Testing

```bash
pytest
```

## Citation

```bibtex
@MISC{marlenv2021,
  author = {ML2},
  title = {Marlenv, Multi-agent Reinforcement Learning Environment},
  howpublished = {\url{http://github.com/kc-ml2/marlenv}},
  year = {2021}
}
```

## Updates

Currently, the multi-agent snake game is the only available environment.