The first experiment

This is a guide for how to run the first experiment in the SLOG paper.

Start the servers

Replace the addresses in the following config with your own IP addresses, then save it to a file, for example slog.conf.

For simplicity, this guide deploys only 2 regions, but more can be added by appending replicas blocks to the configuration (see the example after the config below). The number of partitions can also be changed; make sure that the num_partitions field matches the number of addresses fields in each replicas block.

The client_addresses field lists the IP addresses of the benchmarking machines at each region and is used to distribute the benchmark over multiple machines. If you only run the benchmark from your local machine, this field can be omitted; however, a single machine might not generate a high enough workload.

# Deployment
protocol: "tcp"
num_partitions: 4
replicas {
  addresses: "10.0.0.1"
  addresses: "10.0.0.2"
  addresses: "10.0.0.3"
  addresses: "10.0.0.4"
  client_addresses: "10.0.0.5"
}
replicas {
  addresses: "10.0.0.6"
  addresses: "10.0.0.7"
  addresses: "10.0.0.8"
  addresses: "10.0.0.9"
  client_addresses: "10.0.0.10"
}

# Initialize the storage with 1 billion records, each 100 bytes in size
simple_partitioning {
  num_records: 1000000000
  record_size_bytes: 100
}
recv_retries: 5000
return_dummy_txn: true

# Broker
broker_ports: 2021

# Server
server_port: 2023

# Forwarder 
forwarder_port: 2024
forwarder_batch_duration: 1

# Sequencer
sequencer_port: 2025
sequencer_batch_duration: 5

# Interleaver
replication_factor: 1

# Scheduler
num_workers: 3

# Logging
enabled_events: ENTER_SERVER
enabled_events: EXIT_SERVER_TO_CLIENT
sample_rate: 10
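
For example, a third region could be added with another replicas block. The addresses below are placeholders; substitute the IP addresses of your own machines.

replicas {
  addresses: "10.0.0.11"
  addresses: "10.0.0.12"
  addresses: "10.0.0.13"
  addresses: "10.0.0.14"
  client_addresses: "10.0.0.15"
}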

You can start the servers either manually or using the admin tool.

Start the servers manually

For each machine:

  • SSH to the machine, clone and build this repo.
  • Copy the config file to the machine.
  • Run the following command, replacing <ip-address> with the address of the current machine:
build/slog -config slog.conf -address <ip-address>
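
As a sketch, these manual steps can be scripted from any machine that can SSH to all servers. The loop below assumes the repo has already been cloned and built under ~/slog on each machine and that <username> can SSH to them; adjust the host list and paths to your deployment.

for host in 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4; do
  # Copy the config and launch SLOG on each machine in the background
  scp slog.conf <username>@$host:~/slog/slog.conf
  ssh <username>@$host "cd ~/slog && nohup build/slog -config slog.conf -address $host > slog.log 2>&1 &"
done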

Start the servers using the admin tool

See this guide for how to set up the admin tool.

Run the following command to start the servers:

python3 tools/admin.py start slog.conf -u <username> --image <docker-image>

Here, <username> is the username used to SSH to the servers, and <docker-image> is the name of the SLOG Docker image. You can either build the image yourself or use the ctring/rma-master image, which is kept up to date with the master branch.
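
For example, with the prebuilt image and an SSH user of ubuntu (both are placeholders; substitute your own values):

python3 tools/admin.py start slog.conf -u ubuntu --image ctring/rma-master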

Wait for the servers to be ready

If the number of initial records to generate is large, it might take some time until the servers are ready. To check whether they are ready, look at the log of any server. They are ready when the log ends with:

...
I0710 14:03:56.517822    13 server.cpp:180] All machines are online

If you started the servers using the admin tool, you can use the following command to fetch the log from a server:

python3 tools/admin.py logs slog.conf -rp 0 0 -u <username> -f

Here, the two numbers following -rp are the replica and partition numbers of the server, and -f follows the log.
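
As a rough readiness check, the same command can be combined with grep. This is only a sketch; it assumes that the logs subcommand without -f prints the current log and exits.

python3 tools/admin.py logs slog.conf -rp 0 0 -u <username> 2>&1 | grep "All machines are online"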

Run the benchmark

Run a benchmark from local machine

Create a directory to store the results, for example data. For each run, create a descriptively named directory for that run inside data with the subdirectories client/0. These subdirectories make the output tree match what the result-analyzing Jupyter notebook expects. For example, if the current run uses HOT = 10000, 0% multi-home transactions, and 0% multi-partition transactions, create the directories data/hot10000mh0mp0/client/0.
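
For instance, the directory for that run can be created with:

mkdir -p data/hot10000mh0mp0/client/0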

Run the following command to start the benchmark:

build/benchmark -config slog.conf\
  -r 0\
  -clients 100\
  -generators 5\
  -txns 1000000\
  -duration 30\
  -out-dir data/hot10000mh0mp0/client/0\
  -wl "basic"\
  -params "hot=10000,mh=0,mp=0,writes=10,records=10,hot_records=2"

  • -r: the region that the client is located in.
  • -clients: the number of parallel clients. Depending on the size of the deployment, this number might be too low to saturate the system, so increase it until the throughput stops increasing.
  • -generators: the number of threads used to run the benchmark.
  • -txns: the number of transactions generated. The benchmark loops over these transactions until the end of the duration. If this is set to 0, the transactions are generated on the fly instead.
  • -duration: how long to run the benchmark.
  • -out-dir: where to write the results.
  • -wl: the name of the workload.
  • -params: parameters for the workload.

In the first experiment in the paper, the varying parameters are mh, mp, and hot, so change these parameters on each run to reproduce the experiment. Make sure that hot is always less than the number of generated records specified in the configuration file.
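
As an illustration, the runs can be scripted as a sweep over the varying parameters. This is only a sketch: the specific hot, mh, and mp values below are placeholders, not the exact values used in the paper.

for hot in 10000 100000; do
  for mh in 0 50 100; do
    # One output directory per run, in the layout expected by the notebook
    outdir=data/hot${hot}mh${mh}mp0/client/0
    mkdir -p "$outdir"
    build/benchmark -config slog.conf \
      -r 0 \
      -clients 100 \
      -generators 5 \
      -txns 1000000 \
      -duration 30 \
      -out-dir "$outdir" \
      -wl "basic" \
      -params "hot=$hot,mh=$mh,mp=0,writes=10,records=10,hot_records=2"
  done
done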

Run a distributed benchmark

The distributed benchmark runs on the machines specified in the client_addresses field of the config; these machines also need Docker installed. To start a benchmark, run the following command:

python3 tools/admin.py benchmark slog.conf\
  -u <username>\
  --image <docker-image>\
  --clients 100\
  --generators 5\
  --txns 1000000\
  --duration 30\
  --workload basic\
  --param "hot=10000,mh=0,mp=0,writes=10,records=10,hot_records=2"\
  --tag hot10000mh0mp0

The arguments here are similar to those from the benchmark tool. The argument --tag is used to name the output directory.

Run the following command to collect the output data:

python3 tools/admin.py collect_client slog.conf hot10000mh0mp0 --out-dir data -u <username>

The second positional argument in this command, hot10000mh0mp0, is the tag assigned to the run in the previous command. This command also puts the results in the directory tree expected by the Jupyter notebook.
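
As with the local benchmark, multiple distributed runs can be scripted by varying the tag and parameters together. This sketch assumes that each benchmark invocation blocks until its run finishes; the parameter values are again placeholders.

for mh in 0 50 100; do
  tag=hot10000mh${mh}mp0
  # Run the benchmark on the client machines, then pull the results into data/
  python3 tools/admin.py benchmark slog.conf \
    -u <username> \
    --image <docker-image> \
    --clients 100 \
    --generators 5 \
    --txns 1000000 \
    --duration 30 \
    --workload basic \
    --param "hot=10000,mh=$mh,mp=0,writes=10,records=10,hot_records=2" \
    --tag $tag
  python3 tools/admin.py collect_client slog.conf $tag --out-dir data -u <username>
done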

End-to-end script

A script is provided to run the whole experiment end-to-end, including starting the servers, running the benchmark, and collecting the data:

tools/microbenchmark.sh -a tools/admin.py -c slog.conf -i <docker-image> -o <output-dir> -u <username>

You might need to tweak the values in the BENCHMARK_ARGS variable at the beginning of this script.

Read the result

Download this Jupyter notebook and follow the instructions there to compute metrics from the output data.

Install the prerequisite packages before running this notebook:

pip3 install pyspark pandas numpy matplotlib