Deep-Reinforcement-Learning-Hands-on-by-Minecraft-with-ChainerRL

Minecraft

Do you know Minecraft?

Promotional movie

The official Wiki describes it as follows:

Minecraft is a sandbox construction game created by Mojang AB founder Markus "Notch" Persson, inspired by Infiniminer, Dwarf Fortress, Dungeon Keeper, and Notch's past games Legend of the Chambered and RubyDung. Gameplay involves players interacting with the game world by placing and breaking various types of blocks in a three-dimensional environment. In this environment, players can build creative structures, creations, and artwork on multiplayer servers and singleplayer worlds across multiple game modes.

This game has several characteristics relevant to today's reinforcement learning:

  • players can freely build structures and other objects
  • many different games can be played inside it
  • it supports multiplayer

These characteristics make Minecraft a good testbed for studying reinforcement learning.

Marlo Project

In this hands-on, you will do deep reinforcement learning with Minecraft as the simulation environment. You will use a Minecraft environment called marLo, which is used in MARLO, a contest that applies deep reinforcement learning to Minecraft. Because marLo is compatible with OpenAI Gym, you can easily use a reinforcement learning framework such as ChainerRL (the compatibility is not complete, however; for example, the wrapper used for saving movies cannot be used).

MarLo provides the environments listed below. For example, you can train an AI that walks along a narrow path over lava using deep reinforcement learning. In this hands-on, you will train in the MarLo-FindTheGoal-v0 environment and then try the assignments.

  • MarLo-MazeRunner-v0
  • MarLo-CliffWalking-v0
  • MarLo-CatchTheMob-v0
  • MarLo-FindTheGoal-v0
  • MarLo-Attic-v0
  • MarLo-DefaultFlatWorld-v0
  • MarLo-DefaultWorld-v0
  • MarLo-Eating-v0
  • MarLo-Obstacles-v0
  • MarLo-TrickyArena-v0

Hands-on content

This hands-on is based on marlo-handson.

Requirements

As of December 14, 2018, the following are required.

A Python 3.5+ environment with:

  • Chainer v5.0.0
  • CuPy v5.0.0
  • ChainerRL v0.4.0
  • marlo v0.0.1.dev23

Environment setup on Azure

To follow the procedure below, an Azure subscription is required.

1. Installing Azure CLI

Please choose from the following according to your environment.

  1. on Windows: Install Azure CLI on Windows

  2. by Homebrew(macOS)

    $ brew update && brew install azure-cli
  3. by Python

    $ pip install azure-cli

2. Log in to Azure

$ az login

3. Select subscription

With the following command, you can list the subscriptions you have.

$ az account list --all

Set the subscription you want to use for your account. Of course, replace [A SUBSCRIPTION ID] with your own subscription ID.

$ az account set --subscription [A SUBSCRIPTION ID]

4. Start up the GPU VM

Create a data science VM. --generate-ssh-keys automatically creates a key pair for connecting to the VM and saves it as the private key id_rsa and the public key id_rsa.pub in ~/.ssh/.

$ AZ_USER=[Any username you like e.g. kumezawa]
$ AZ_LOCATION=[location where resource-group was created e.g. eastus]
$ AZ_RESOURCE_GROUP=[the resource group you created e.g. marmo]
$ az vm create \
--location ${AZ_LOCATION} \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_USER}-vm \
--admin-username ${AZ_USER} \
--public-ip-address-dns-name ${AZ_USER} \
--image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
--size Standard_NC6 \
--generate-ssh-keys

If it works, you should see a message like the one below.

{
  "fqdns": "[YOUR USERNAME].eastus.cloudapp.azure.com",
  "id": "/subscriptions/[YOUR SUBSCRIPTION ID]/resourceGroups/marLo-handson/providers/Microsoft.Compute/virtualMachines/vm",
  "location": "eastus",
  "macAddress": "AA-BB-CC-DD-EE-FF",
  "powerState": "VM running",
  "privateIpAddress": "10.0.0.4",
  "publicIpAddress": "123.456.78.910",
  "resourceGroup": "marLo-handson",
  "zones": ""
}

Make a note of publicIpAddress for the next step.

Note

If the private key id_rsa and the public key id_rsa.pub already exist in ~/.ssh/, the above command will cause an error. In that case, you can create your own key and specify it explicitly.

$ az vm create \
--location ${AZ_LOCATION} \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_USER}-vm \
--admin-username ${AZ_USER} \
--public-ip-address-dns-name ${AZ_USER} \
--image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
--size Standard_NC6 \
--ssh-key-value [Public key path e.g. ~/.ssh/id_rsa.pub]

Note

If you want to start up a CPU instance instead of a GPU one, try --size Standard_D2s_v3, for example. To check other available VM sizes, you can list them like this.

$ az vm list-sizes --location eastus --output table

5. Open the ports required for access

$ az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 8000 --priority 1010 \
&& az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 8001 --priority 1020 \
&& az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 6080 --priority 1030

6. ssh connection to VM

$ AZ_IP=[your VM IP e.g. "40.121.36.99"]
$ ssh ${AZ_USER}@${AZ_IP} -i ~/.ssh/id_rsa

7. Create a Conda environment for MarLo

Execute the following commands in the VM environment.

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
&& sudo apt-get update \
&& sudo apt-get install -y libopenal-dev
$ conda config --set always_yes yes \
&& conda create python=3.6 --name marlo \
&& conda config --add channels conda-forge \
&& conda activate marlo \
&& conda install -c crowdai malmo matplotlib ipython numpy scipy opencv \
&& pip install git+https://github.com/crowdAI/marLo.git \
&& pip install chainer==5.1.0 cupy-cuda92==5.1.0 chainerrl==0.5.0

Install the cupy and chainer packages that match the CUDA version currently installed on the VM
(e.g. CUDA 9.0 -> cupy-cuda90).

$ nvcc --version
$ pip install chainer==5.1.0 cupy-cuda90==5.1.0 chainerrl==0.5.0
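
After installation, you can verify that the expected packages are present. The following is a minimal sketch using pkg_resources (adjust the cupy package name to the CUDA version you installed, e.g. cupy-cuda90):

# Minimal sketch: print the installed versions of the packages this hands-on relies on.
import pkg_resources

for pkg in ["chainer", "chainerrl", "marlo", "cupy-cuda92"]:
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")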

8. Starting Minecraft client with Docker

$ sudo docker pull ikeyasu/marlo:latest
$ VNC_PW=[Any password you like]
$ sudo docker run -it --net host --rm --name robo6082 -d -p 6080:6080 -p 10000:10000 -e VNC_PW=${VNC_PW} ikeyasu/marlo:latest

Go to http://[your VM IP]:6080 and enter the password ${VNC_PW} you set above.

You should be able to connect to the remote environment via VNC and confirm that Minecraft starts inside it!

[Screenshot: Minecraft running inside the VNC session]

Let's start the hands-on

0. Check if the Conda environment is activated

$ conda info -e
# conda environments:
#
base                     /anaconda
marlo                 *  /anaconda/envs/marlo
py35                     /anaconda/envs/py35
py36                     /anaconda/envs/py36

If the marlo environment created earlier is not activated, execute the command below.

$ conda activate marlo

1. Clone Hands-on repository

$ git clone https://github.com/keisuke-umezawa/marlo-handson.git
$ cd marlo-handson

2. Run the marlo test script

$ python test_malmo.py

After waiting for a while, you should see a screen like this.

[Screenshot: the MarLo-FindTheGoal-v0 game screen]

The above command runs the following Python script.

test_malmo.py

import marlo


def make_env(env_seed=0):
    # marlo.make returns join tokens for the requested mission
    join_tokens = marlo.make(
        "MarLo-FindTheGoal-v0",
        params=dict(
            allowContinuousMovement=["move", "turn"],
            videoResolution=[336, 336],
            kill_clients_after_num_rounds=500
        ))
    # marlo.init turns a join token into a Gym-compatible environment
    env = marlo.init(join_tokens[0])

    obs = env.reset()
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)
    env.seed(int(env_seed))
    return env


env = make_env()
obs = env.reset()


for i in range(10):
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)
    print(r, done, info)

  • Changing "MarLo-FindTheGoal-v0" switches the environment.
  • obs is image data as a NumPy array.
  • r is the reward for the previous action.
  • done is a boolean indicating whether the episode has finished.
  • info contains other information.
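
For example, you can inspect these values with a short script like the one below (a minimal sketch; it assumes env was created with make_env as above and that matplotlib is installed in the Conda environment from step 7):

# Minimal sketch: inspect one observation and save a frame to disk.
import matplotlib
matplotlib.use("Agg")  # render off-screen on the headless VM
import matplotlib.pyplot as plt

obs = env.reset()
print(obs.shape, obs.dtype)  # e.g. (336, 336, 3) uint8, matching videoResolution

action = env.action_space.sample()
obs, r, done, info = env.step(action)
print("reward:", r, "done:", done, "info:", info)

plt.imsave("frame.png", obs)  # save the current frame so you can inspect it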

Assignment 1: Check or change the above.

3. Run the DQN training script with ChainerRL

Execute the following command. It starts training a DQN reinforcement learning model from scratch with ChainerRL.

$ python train_DQN.py

Stop the script by pressing Ctrl+C whenever you like. You should then have output directories under results/; the trained model is stored in a directory named like xxxx_except.

Note

If you want to run on CPU only, add the following option.

$ python train_DQN.py --gpu -1

4. Check the behavior of the saved model

You can load the trained model and check its behavior with the following command.

$ python train_DQN.py --load results/3765_except --demo

What does the result look like? The model has not been trained much, so it probably will not behave well.

5. Resume training from the saved model

You can resume training from a previously saved model by the following command:

$ python train_DQN.py --load results/3765_except

You can also use a prepared model that has already been trained to some extent. However, it was created simply by running train_DQN.py, so it may not work if you change the code or the environment.

$ wget https://github.com/keisuke-umezawa/marlo-handson/releases/download/v0.2/157850_except.tar.gz
$ tar -xvzf 157850_except.tar.gz
$ python train_DQN.py --load 157850_except

Assignment 2: Read the source code of train_DQN.py

train_DQN.py

# ...ellipsis...

def main():
    parser = argparse.ArgumentParser()

    # ...ellipsis...

    # Set a random seed used in ChainerRL.
    misc.set_random_seed(args.seed, gpus=(args.gpu,))

    if not os.path.exists(args.out_dir):
        os.makedirs(args.out_dir)

    experiments.set_log_base_dir(args.out_dir)
    print('Output files are saved in {}'.format(args.out_dir))

    env = make_env(env_seed=args.seed)

    n_actions = env.action_space.n

    # Q-function: the Nature DQN CNN head followed by a linear output layer
    q_func = links.Sequence(
        links.NatureDQNHead(n_input_channels=3),
        L.Linear(512, n_actions),
        DiscreteActionValue
    )

    # Use the same hyper parameters as the Nature paper's
    opt = optimizers.RMSpropGraves(
        lr=args.lr, alpha=0.95, momentum=0.0, eps=1e-2)

    opt.setup(q_func)

    # Uniform replay buffer holding up to 10^6 transitions
    rbuf = replay_buffer.ReplayBuffer(10 ** 6)

    # Epsilon-greedy exploration, with epsilon decayed linearly to final_epsilon
    explorer = explorers.LinearDecayEpsilonGreedy(
        1.0, args.final_epsilon,
        args.final_exploration_frames,
        lambda: np.random.randint(n_actions))

    def phi(x):
        # Feature extractor
        x = x.transpose(2, 0, 1)
        return np.asarray(x, dtype=np.float32) / 255

    agent = agents.DQN(
        q_func,
        opt,
        rbuf,
        gpu=args.gpu,
        gamma=0.99,
        explorer=explorer,
        replay_start_size=args.replay_start_size,
        target_update_interval=args.target_update_interval,
        update_interval=args.update_interval,
        batch_accumulator='sum',
        phi=phi
    )

    if args.load:
        agent.load(args.load)

    if args.demo:
        eval_stats = experiments.eval_performance(
            env=env,
            agent=agent,
            n_runs=args.eval_n_runs)
        print('n_runs: {} mean: {} median: {} stdev {}'.format(
            args.eval_n_runs, eval_stats['mean'], eval_stats['median'],
            eval_stats['stdev']))
    else:
        experiments.train_agent_with_evaluation(
            agent=agent,
            env=env,
            steps=args.steps,
            eval_n_runs=args.eval_n_runs,
            eval_interval=args.eval_interval,
            outdir=args.out_dir,
            save_best_so_far_agent=False,
            max_episode_len=args.max_episode_len,
            eval_env=env,
        )


if __name__ == '__main__':
    main()
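
One detail worth looking at for Assignment 2 is phi: the raw observation arrives as an HWC image with values in 0-255, and phi converts it into the CHW float32 array in [0, 1] that the network expects. A minimal sketch with a dummy array (NumPy only, no environment needed):

import numpy as np

def phi(x):
    # Feature extractor: HWC uint8 image -> CHW float32 in [0, 1]
    x = x.transpose(2, 0, 1)
    return np.asarray(x, dtype=np.float32) / 255

dummy_obs = np.zeros((336, 336, 3), dtype=np.uint8)  # same resolution as videoResolution above
print(phi(dummy_obs).shape)  # (3, 336, 336)
print(phi(dummy_obs).dtype)  # float32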


Assignment 3: Improve Performance

There are several ways to improve the performance of reinforcement learning. For example:

  • changing the model
  • changing the ReplayBuffer
  • changing hyperparameters

There is also a paper that evaluated how much various reinforcement learning methods improve performance. According to it, you may improve performance by the following (see the sketch after this list):

  • using PrioritizedReplayBuffer
  • using DDQN
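
Below is a minimal sketch of how these two changes could be wired into train_DQN.py with ChainerRL, assuming the same q_func, opt, explorer, phi, and args as above. DoubleDQN and PrioritizedReplayBuffer ship with ChainerRL; the settings shown are illustrative, not tuned:

from chainerrl import agents, replay_buffer

# Prioritized replay instead of the uniform buffer
rbuf = replay_buffer.PrioritizedReplayBuffer(capacity=10 ** 6)

# Double DQN instead of plain DQN; the constructor takes the same arguments
agent = agents.DoubleDQN(
    q_func,
    opt,
    rbuf,
    gpu=args.gpu,
    gamma=0.99,
    explorer=explorer,
    replay_start_size=args.replay_start_size,
    target_update_interval=args.target_update_interval,
    update_interval=args.update_interval,
    batch_accumulator='sum',
    phi=phi
)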

This tutorial is an English version of the original hands-on in Japanese, translated with the author's permission.
