Do you know Minecraft?
The official wiki describes it as follows:
Minecraft is a sandbox construction game created by Mojang AB founder Markus "Notch" Persson, inspired by Infiniminer, Dwarf Fortress, Dungeon Keeper, and Notch's past games Legend of the Chambered and RubyDung. Gameplay involves players interacting with the game world by placing and breaking various types of blocks in a three-dimensional environment. In this environment, players can build creative structures, creations, and artwork on multiplayer servers and singleplayer worlds across multiple game modes.
This game has several characteristics that are relevant to today's reinforcement learning:
- players can freely build structures and other creations
- many different games can be played inside it
- it supports multiplayer
These characteristics make Minecraft a good subject for studying reinforcement learning.
In this hands-on, you will do deep reinforcement learning with Minecraft as the simulator environment. We use a Minecraft environment called marLo, which is used in MARLO, a contest that utilizes Minecraft for deep reinforcement learning. Because marLo is compatible with OpenAI Gym, you can easily use a reinforcement learning framework such as ChainerRL (although the compatibility is not complete; for example, the wrapper used for saving movies cannot be used).
MarLo provides several environments. For example, you can train an AI that walks along a narrow path over lava with deep reinforcement learning. In this hands-on, you first train an agent in the MarLo-FindTheGoal-v0 environment and then try the assignments. A minimal sketch of the Gym-style loop you will use is shown below.
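Because marLo follows the Gym API, the interaction loop looks like standard Gym code. The following is only a small sketch of that idea, previewing the test_malmo.py script used later; it will not run until the environment is set up below and a Minecraft client is running:

import marlo

# Create a join token for the environment and initialize a Gym-style env.
join_tokens = marlo.make(
    "MarLo-FindTheGoal-v0",
    params=dict(videoResolution=[336, 336]))
env = marlo.init(join_tokens[0])

# The usual Gym loop: reset, sample an action, step.
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())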
This hands-on is based on marlo-handson.
As of December 14, 2018, you need a Python 3.5+ environment with:
- Chainer v5.0.0
- CuPy v5.0.0
- ChainerRL v0.4.0
- marlo v0.0.1.dev23
An Azure subscription is required to follow the procedure below.
Install the Azure CLI in one of the following ways, depending on your environment.
- On Windows: Install Azure CLI on Windows
- With Homebrew (macOS):
$ brew update && brew install azure-cli
- With pip (Python):
$ pip install azure-cli
$ az login
With the following command, you can list the subscriptions you have.
$ az account list --all
Set the subscription you want to use for your account. Of course, replace [A SUBSCRIPTION ID] with your own subscription ID.
$ az account set --subscription [A SUBSCRIPTION ID]
Now create a Data Science VM. The --generate-ssh-keys option automatically creates the keys used to connect to the VM and saves them in ~/.ssh/ as the private key id_rsa and the public key id_rsa.pub.
$ AZ_USER=[Any username you like e.g. kumezawa]
$ AZ_LOCATION=[location where resource-group was created e.g. eastus]
$ AZ_RESOURCE_GROUP=[the resource group you created e.g. marlo]
$ az vm create \
--location ${AZ_LOCATION} \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_USER}-vm \
--admin-username ${AZ_USER} \
--public-ip-address-dns-name ${AZ_USER} \
--image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
--size Standard_NC6 \
--generate-ssh-keys
If it works, you should see a message like below.
{
  "fqdns": "[YOUR USERNAME].eastus.cloudapp.azure.com",
  "id": "/subscriptions/[YOUR SUBSCRIPTION ID]/resourceGroups/marLo-handson/providers/Microsoft.Compute/virtualMachines/vm",
  "location": "eastus",
  "macAddress": "AA-BB-CC-DD-EE-FF",
  "powerState": "VM running",
  "privateIpAddress": "10.0.0.4",
  "publicIpAddress": "123.456.78.910",
  "resourceGroup": "marLo-handson",
  "zones": ""
}
Remember the publicIpAddress for the next step.
If the private key id_rsa and the public key id_rsa.pub are already in ~/.ssh/, the above command will cause an error.
In that case, you can create your own key and specify it explicitly:
$ az vm create \
--location ${AZ_LOCATION} \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_USER}-vm \
--admin-username ${AZ_USER} \
--public-ip-address-dns-name ${AZ_USER} \
--image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
--size Standard_NC6 \
--ssh-key-value [Public key path e.g. ~/.ssh/id_rsa.pub]
If you want to start a CPU instance instead of a GPU one, try --size Standard_D2s_v3, for example. If you want to check the other available VM sizes, you can list them like this:
$ az vm list-sizes --location eastus --output table
$ az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 8000 --priority 1010 \
&& az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 8001 --priority 1020 \
&& az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 6080 --priority 1030
$ AZ_IP=[your VM IP e.g. "40.121.36.99"]
$ ssh ${AZ_USER}@${AZ_IP} -i ~/.ssh/id_rsa
Execute the following commands in the VM environment.
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
&& sudo apt-get update \
&& sudo apt-get install -y libopenal-dev
$ conda config --set always_yes yes \
&& conda create python=3.6 --name marlo \
&& conda config --add channels conda-forge \
&& conda activate marlo \
&& conda install -c crowdai malmo matplotlib ipython numpy scipy opencv \
&& pip install git+https://github.com/crowdAI/marLo.git \
&& pip install chainer==5.1.0 cupy-cuda92==5.1.0 chainerrl==0.5.0
Please install the CuPy and Chainer packages that match the CUDA version currently installed (e.g. CUDA 9.0 -> cupy-cuda90). You can check the CUDA version and reinstall as follows:
$ nvcc --version
$ pip install chainer==5.1.0 cupy-cuda90==5.1.0 chainerrl==0.5.0
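As a quick sanity check (my own addition, not part of the original hands-on), you can confirm that Chainer and CuPy are installed and that Chainer can see the GPU:

import chainer
import cupy

print(chainer.__version__)               # e.g. 5.1.0
print(cupy.__version__)                  # e.g. 5.1.0
print(chainer.backends.cuda.available)   # True if the CUDA/CuPy setup is correct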
$ sudo docker pull ikeyasu/marlo:latest
$ VNC_PW=[Any password you like]
$ sudo docker run -it --net host --rm --name robo6082 -d -p 6080:6080 -p 10000:10000 -e VNC_PW=${VNC_PW} ikeyasu/marlo:latest
Please go to http://[your VM IP]:6080 and enter the password ${VNC_PW} you set.
You should be able to connect to the remote environment via VNC and confirm that Minecraft starts there!
$ conda info -e
# conda environments:
#
base /anaconda
marlo * /anaconda/envs/marlo
py35 /anaconda/envs/py35
py36 /anaconda/envs/py36
If the marlo environment created earlier is not activated, please execute the command below.
$ conda activate marlo
$ git clone https://github.com/keisuke-umezawa/marlo-handson.git
$ cd marlo-handson
$ python test_malmo.py
After waiting for a while, you should see a screen like this.
The above command executes the following Python script.
import marlo


def make_env(env_seed=0):
    join_tokens = marlo.make(
        "MarLo-FindTheGoal-v0",
        params=dict(
            allowContinuousMovement=["move", "turn"],
            videoResolution=[336, 336],
            kill_clients_after_num_rounds=500
        ))
    env = marlo.init(join_tokens[0])

    obs = env.reset()
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)

    env.seed(int(env_seed))
    return env


env = make_env()

obs = env.reset()

for i in range(10):
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)
    print(r, done, info)
- Changing "MarLo-FindTheGoal-v0" switches to a different environment.
- obs is image data as a NumPy array (see the sketch below for how to inspect it).
- r is the reward for the previous action.
- done is a boolean value indicating whether the episode is over.
- info contains other information.
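If you run the script above in an interactive session such as ipython, you can inspect an observation directly. This is a small optional sketch of my own: it reuses the env created above, assumes matplotlib (installed in the conda step earlier), and the file name obs.png is just an example.

import matplotlib
matplotlib.use("Agg")  # the VM has no display, so render to a file
import matplotlib.pyplot as plt

# obs is an RGB image as a NumPy array, matching videoResolution.
obs = env.reset()
print(obs.shape, obs.dtype)  # e.g. (336, 336, 3) uint8

plt.imshow(obs)
plt.savefig("obs.png")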
Execute the following command. It starts training a DQN reinforcement learning model from scratch with ChainerRL.
$ python train_DQN.py
Stop the script by pressing Ctrl+C whenever you like. You should then have the following directories; the trained model is stored in xxxx_except.
If you want to run on CPU only, add the following option:
$ python train_DQN.py --gpu -1
You can load the trained model and check its operation with the following command.
$ python train_DQN.py --load results/3765_except --demo
What does the result look like? The model has not been trained much, so it should not work well yet.
You can resume training from a previously saved model with the following command:
$ python train_DQN.py --load results/3765_except
You can also use a prepared model that is already trained to some extent. However, it was created by simply running train_DQN.py, so it may not work if you change the code or the environment.
$ wget https://github.com/keisuke-umezawa/marlo-handson/releases/download/v0.2/157850_except.tar.gz
$ tar -xvzf 157850_except.tar.gz
$ python train_DQN.py --load 157850_except
# ...ellipsis...

def main():
    parser = argparse.ArgumentParser()
    # ...ellipsis...

    # Set a random seed used in ChainerRL.
    misc.set_random_seed(args.seed, gpus=(args.gpu,))

    if not os.path.exists(args.out_dir):
        os.makedirs(args.out_dir)
    experiments.set_log_base_dir(args.out_dir)
    print('Output files are saved in {}'.format(args.out_dir))

    env = make_env(env_seed=args.seed)

    n_actions = env.action_space.n

    q_func = links.Sequence(
        links.NatureDQNHead(n_input_channels=3),
        L.Linear(512, n_actions),
        DiscreteActionValue
    )

    # Use the same hyper parameters as the Nature paper's
    opt = optimizers.RMSpropGraves(
        lr=args.lr, alpha=0.95, momentum=0.0, eps=1e-2)
    opt.setup(q_func)

    rbuf = replay_buffer.ReplayBuffer(10 ** 6)

    explorer = explorers.LinearDecayEpsilonGreedy(
        1.0, args.final_epsilon,
        args.final_exploration_frames,
        lambda: np.random.randint(n_actions))

    def phi(x):
        # Feature extractor
        x = x.transpose(2, 0, 1)
        return np.asarray(x, dtype=np.float32) / 255

    agent = agents.DQN(
        q_func,
        opt,
        rbuf,
        gpu=args.gpu,
        gamma=0.99,
        explorer=explorer,
        replay_start_size=args.replay_start_size,
        target_update_interval=args.target_update_interval,
        update_interval=args.update_interval,
        batch_accumulator='sum',
        phi=phi
    )

    if args.load:
        agent.load(args.load)

    if args.demo:
        eval_stats = experiments.eval_performance(
            env=env,
            agent=agent,
            n_runs=args.eval_n_runs)
        print('n_runs: {} mean: {} median: {} stdev {}'.format(
            args.eval_n_runs, eval_stats['mean'], eval_stats['median'],
            eval_stats['stdev']))
    else:
        experiments.train_agent_with_evaluation(
            agent=agent,
            env=env,
            steps=args.steps,
            eval_n_runs=args.eval_n_runs,
            eval_interval=args.eval_interval,
            outdir=args.out_dir,
            save_best_so_far_agent=False,
            max_episode_len=args.max_episode_len,
            eval_env=env,
        )


if __name__ == '__main__':
    main()
References
There are several ways to improve the performance of reinforcement learning. For example:
- changing the model
- changing the ReplayBuffer
- changing the hyperparameters
There is also a paper that evaluated the performance improvements obtained by combining various reinforcement learning methods. According to it, you may improve performance by the following (see the sketch after this list):
- using PrioritizedReplayBuffer
- using DoubleDQN (DDQN)
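With ChainerRL, both of these should work as drop-in replacements in train_DQN.py. The following is a minimal sketch of only the parts that would change; it assumes the same q_func, opt, explorer, phi, and args as in the script above, and the buffer capacity of 10 ** 6 simply mirrors the original:

from chainerrl import agents, replay_buffer

# Prioritized experience replay instead of the plain ReplayBuffer.
rbuf = replay_buffer.PrioritizedReplayBuffer(10 ** 6)

# Double DQN instead of DQN; it accepts the same constructor arguments.
agent = agents.DoubleDQN(
    q_func,
    opt,
    rbuf,
    gpu=args.gpu,
    gamma=0.99,
    explorer=explorer,
    replay_start_size=args.replay_start_size,
    target_update_interval=args.target_update_interval,
    update_interval=args.update_interval,
    batch_accumulator='sum',
    phi=phi,
)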
This tutorial is an English version of the original hands-on in Japanese (translated with the author's permission).