Do you know Minecraft?
The official wiki describes it as follows:
Minecraft is a sandbox construction game created by Mojang AB founder Markus "Notch" Persson, inspired by Infiniminer, Dwarf Fortress, Dungeon Keeper, and Notch's past games Legend of the Chambered and RubyDung. Gameplay involves players interacting with the game world by placing and breaking various types of blocks in a three-dimensional environment. In this environment, players can build creative structures, creations, and artwork on multiplayer servers and singleplayer worlds across multiple game modes.
This game has several characteristics that are relevant to today's reinforcement learning:
- players can freely build structures and other creations
- many different games can be played inside it
- it supports multiplayer
These characteristics make Minecraft a good subject for studying reinforcement learning.
In this hands-on, you will do deep reinforcement learning with Minecraft as the simulator environment. We use a Minecraft environment called marLo, which is used in MARLO, a contest that utilizes Minecraft for deep reinforcement learning. Because marLo is compatible with OpenAI Gym, you can easily use a reinforcement learning framework such as ChainerRL (although the compatibility is not complete; for example, the wrapper used for saving movies cannot be used).
MarLo provides several environments. For example, you can train an AI that walks along a narrow path over lava with deep reinforcement learning. In this hands-on, you first train an agent in the MarLo-FindTheGoal-v0 environment and then try the assignments. A minimal sketch of the Gym-style loop you will use is shown below.
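Because marLo follows the Gym API, the interaction loop looks like standard Gym code. The following is only a small sketch of that idea, previewing the test_malmo.py script used later; it will not run until the environment is set up below and a Minecraft client is running:

import marlo

# Create a join token for the environment and initialize a Gym-style env.
join_tokens = marlo.make(
    "MarLo-FindTheGoal-v0",
    params=dict(videoResolution=[336, 336]))
env = marlo.init(join_tokens[0])

# The usual Gym loop: reset, sample an action, step.
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())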
This hands-on is based on marlo-handson.
As of December 14, 2018, you need a Python 3.5+ environment with:
- Chainer v5.0.0
- CuPy v5.0.0
- ChainerRL v0.4.0
- marlo v0.0.1.dev23
An Azure subscription is required to follow the procedure below.
Install the Azure CLI in one of the following ways, depending on your environment.
- On Windows: Install Azure CLI on Windows
- With Homebrew (macOS):
$ brew update && brew install azure-cli
- With pip (Python):
$ pip install azure-cli
$ az login
With the following command, you can list the subscriptions you have.
$ az account list --all
Set the subscription you want to use for your account. Of course, replace [A SUBSCRIPTION ID] with your own subscription ID.
$ az account set --subscription [A SUBSCRIPTION ID]
Now create a Data Science VM. The --generate-ssh-keys option automatically creates the keys used to connect to the VM and saves them in ~/.ssh/ as the private key id_rsa and the public key id_rsa.pub.
$ AZ_USER=[Any username you like e.g. kumezawa]
$ AZ_LOCATION=[location where resource-group was created e.g. eastus]
$ AZ_RESOURCE_GROUP=[the resource group you created e.g. marlo]
$ az vm create \
--location ${AZ_LOCATION} \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_USER}-vm \
--admin-username ${AZ_USER} \
--public-ip-address-dns-name ${AZ_USER} \
--image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
--size Standard_NC6 \
--generate-ssh-keys
If it works, you should see a message like below.
{
  "fqdns": "[YOUR USERNAME].eastus.cloudapp.azure.com",
  "id": "/subscriptions/[YOUR SUBSCRIPTION ID]/resourceGroups/marLo-handson/providers/Microsoft.Compute/virtualMachines/vm",
  "location": "eastus",
  "macAddress": "AA-BB-CC-DD-EE-FF",
  "powerState": "VM running",
  "privateIpAddress": "10.0.0.4",
  "publicIpAddress": "123.456.78.910",
  "resourceGroup": "marLo-handson",
  "zones": ""
}
Remember the publicIpAddress for the next step.
If the private key id_rsa and the public key id_rsa.pub are already in ~/.ssh/, the above command will cause an error.
In that case, you can create your own key and specify it explicitly:
$ az vm create \
--location ${AZ_LOCATION} \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_USER}-vm \
--admin-username ${AZ_USER} \
--public-ip-address-dns-name ${AZ_USER} \
--image microsoft-ads:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
--size Standard_NC6 \
--ssh-key-value [Public key path e.g. ~/.ssh/id_rsa.pub]
If you want to start a CPU instance instead of a GPU one, try --size Standard_D2s_v3, for example. If you want to check the other available VM sizes, you can list them like this:
$ az vm list-sizes --location eastus --output table
$ az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 8000 --priority 1010 \
&& az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 8001 --priority 1020 \
&& az vm open-port --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_USER}-vm --port 6080 --priority 1030
$ AZ_IP=[your VM IP e.g. "40.121.36.99"]
$ ssh ${AZ_USER}@${AZ_IP} -i ~/.ssh/id_rsa
Execute the following commands in the VM environment.
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
&& sudo apt-get update \
&& sudo apt-get install -y libopenal-dev
$ conda config --set always_yes yes \
&& conda create python=3.6 --name marlo \
&& conda config --add channels conda-forge \
&& conda activate marlo \
&& conda install -c crowdai malmo matplotlib ipython numpy scipy opencv \
&& pip install git+https://github.com/crowdAI/marLo.git \
&& pip install chainer==5.1.0 cupy-cuda92==5.1.0 chainerrl==0.5.0
Please install the CuPy and Chainer packages that match the CUDA version currently installed (e.g. CUDA 9.0 -> cupy-cuda90). You can check the CUDA version and reinstall as follows:
$ nvcc --version
$ pip install chainer==5.1.0 cupy-cuda90==5.1.0 chainerrl==0.5.0
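As a quick sanity check (my own addition, not part of the original hands-on), you can confirm that Chainer and CuPy are installed and that Chainer can see the GPU:

import chainer
import cupy

print(chainer.__version__)               # e.g. 5.1.0
print(cupy.__version__)                  # e.g. 5.1.0
print(chainer.backends.cuda.available)   # True if the CUDA/CuPy setup is correct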
$ sudo docker pull ikeyasu/marlo:latest
$ VNC_PW=[Any password you like]
$ sudo docker run -it --net host --rm --name robo6082 -d -p 6080:6080 -p 10000:10000 -e VNC_PW=${VNC_PW} ikeyasu/marlo:latest
Please go to http://[your VM IP]:6080 and enter the password ${VNC_PW} you set.
You should be able to connect to the remote environment via VNC and confirm that Minecraft starts there!
$ conda info -e
# conda environments:
#
base /anaconda
marlo * /anaconda/envs/marlo
py35 /anaconda/envs/py35
py36 /anaconda/envs/py36
If the marlo environment created earlier is not activated, please execute the command below.
$ conda activate marlo
$ git clone https://github.com/keisuke-umezawa/marlo-handson.git
$ cd marlo-handson
$ python test_malmo.py
After waiting for a while, you should see a screen like this.
The above command executes the following Python script.
import marlo


def make_env(env_seed=0):
    join_tokens = marlo.make(
        "MarLo-FindTheGoal-v0",
        params=dict(
            allowContinuousMovement=["move", "turn"],
            videoResolution=[336, 336],
            kill_clients_after_num_rounds=500
        ))
    env = marlo.init(join_tokens[0])

    obs = env.reset()
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)

    env.seed(int(env_seed))
    return env


env = make_env()

obs = env.reset()

for i in range(10):
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)
    print(r, done, info)
- Changing "MarLo-FindTheGoal-v0" switches to a different environment.
- obs is image data as a NumPy array (see the sketch below for how to inspect it).
- r is the reward for the previous action.
- done is a boolean value indicating whether the episode is over.
- info contains other information.
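If you run the script above in an interactive session such as ipython, you can inspect an observation directly. This is a small optional sketch of my own: it reuses the env created above, assumes matplotlib (installed in the conda step earlier), and the file name obs.png is just an example.

import matplotlib
matplotlib.use("Agg")  # the VM has no display, so render to a file
import matplotlib.pyplot as plt

# obs is an RGB image as a NumPy array, matching videoResolution.
obs = env.reset()
print(obs.shape, obs.dtype)  # e.g. (336, 336, 3) uint8

plt.imshow(obs)
plt.savefig("obs.png")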
Execute the following command. It starts training a DQN reinforcement learning model from scratch with ChainerRL.
$ python train_DQN.py
Stop the script by pressing Ctrl+C whenever you like. You should then have the following directories; the trained model is stored in xxxx_except.
If you want to run on CPU only, add the following option:
$ python train_DQN.py --gpu -1
You can load the trained model and check its operation with the following command.
$ python train_DQN.py --load results/3765_except --demo
What does the result look like? The model has not been trained much, so it should not work well yet.
You can resume training from a previously saved model with the following command:
$ python train_DQN.py --load results/3765_except
You can also use a prepared model that is already trained to some extent. However, it was created by simply running train_DQN.py, so it may not work if you change the code or the environment.
$ wget https://github.com/keisuke-umezawa/marlo-handson/releases/download/v0.2/157850_except.tar.gz
$ tar -xvzf 157850_except.tar.gz
$ python train_DQN.py --load 157850_except
# ...ellipsis...

def main():
    parser = argparse.ArgumentParser()
    # ...ellipsis...

    # Set a random seed used in ChainerRL.
    misc.set_random_seed(args.seed, gpus=(args.gpu,))

    if not os.path.exists(args.out_dir):
        os.makedirs(args.out_dir)
    experiments.set_log_base_dir(args.out_dir)
    print('Output files are saved in {}'.format(args.out_dir))

    env = make_env(env_seed=args.seed)

    n_actions = env.action_space.n

    q_func = links.Sequence(
        links.NatureDQNHead(n_input_channels=3),
        L.Linear(512, n_actions),
        DiscreteActionValue
    )

    # Use the same hyper parameters as the Nature paper's
    opt = optimizers.RMSpropGraves(
        lr=args.lr, alpha=0.95, momentum=0.0, eps=1e-2)
    opt.setup(q_func)

    rbuf = replay_buffer.ReplayBuffer(10 ** 6)

    explorer = explorers.LinearDecayEpsilonGreedy(
        1.0, args.final_epsilon,
        args.final_exploration_frames,
        lambda: np.random.randint(n_actions))

    def phi(x):
        # Feature extractor
        x = x.transpose(2, 0, 1)
        return np.asarray(x, dtype=np.float32) / 255

    agent = agents.DQN(
        q_func,
        opt,
        rbuf,
        gpu=args.gpu,
        gamma=0.99,
        explorer=explorer,
        replay_start_size=args.replay_start_size,
        target_update_interval=args.target_update_interval,
        update_interval=args.update_interval,
        batch_accumulator='sum',
        phi=phi
    )

    if args.load:
        agent.load(args.load)

    if args.demo:
        eval_stats = experiments.eval_performance(
            env=env,
            agent=agent,
            n_runs=args.eval_n_runs)
        print('n_runs: {} mean: {} median: {} stdev {}'.format(
            args.eval_n_runs, eval_stats['mean'], eval_stats['median'],
            eval_stats['stdev']))
    else:
        experiments.train_agent_with_evaluation(
            agent=agent,
            env=env,
            steps=args.steps,
            eval_n_runs=args.eval_n_runs,
            eval_interval=args.eval_interval,
            outdir=args.out_dir,
            save_best_so_far_agent=False,
            max_episode_len=args.max_episode_len,
            eval_env=env,
        )


if __name__ == '__main__':
    main()
References
There are several ways to improve the performance of reinforcement learning. For example:
- changing the model
- changing the ReplayBuffer
- changing the hyperparameters
There is also a paper that evaluated the performance improvements obtained by combining various reinforcement learning methods. According to it, you may improve performance by the following (see the sketch after this list):
- using PrioritizedReplayBuffer
- using DoubleDQN (DDQN)
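With ChainerRL, both of these should work as drop-in replacements in train_DQN.py. The following is a minimal sketch of only the parts that would change; it assumes the same q_func, opt, explorer, phi, and args as in the script above, and the buffer capacity of 10 ** 6 simply mirrors the original:

from chainerrl import agents, replay_buffer

# Prioritized experience replay instead of the plain ReplayBuffer.
rbuf = replay_buffer.PrioritizedReplayBuffer(10 ** 6)

# Double DQN instead of DQN; it accepts the same constructor arguments.
agent = agents.DoubleDQN(
    q_func,
    opt,
    rbuf,
    gpu=args.gpu,
    gamma=0.99,
    explorer=explorer,
    replay_start_size=args.replay_start_size,
    target_update_interval=args.target_update_interval,
    update_interval=args.update_interval,
    batch_accumulator='sum',
    phi=phi,
)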
This tutorial is an English version of the original hands-on in Japanese (translated with the author's permission).