The first experiment
This guide describes how to run the first experiment in the SLOG paper.
Replace the addresses in the following config with your IP addresses, then save it to a file, for example slog.conf.
For simplicity, we only deploy with 2 regions in this guide, but this can easily be changed by adding more replicas blocks to the configuration. The number of partitions can also be changed; just make sure that the num_partitions field matches the number of addresses fields in each replicas block (for example, with num_partitions: 4, each replicas block must list exactly 4 addresses).
The client_addresses field lists the IP addresses of the benchmarking machines in each region. It is used to distribute the benchmark over multiple machines. If you only run the benchmark from your local machine, this field can be omitted; however, a single machine might not generate a high enough workload.
# Deployment
protocol: "tcp"
num_partitions: 4
replicas {
addresses: "10.0.0.1"
addresses: "10.0.0.2"
addresses: "10.0.0.3"
addresses: "10.0.0.4"
client_addresses: "10.0.0.5"
}
replicas {
addresses: "10.0.0.6"
addresses: "10.0.0.7"
addresses: "10.0.0.8"
addresses: "10.0.0.9"
client_addresses: "10.0.0.10"
}
# Initialize the storage with 1 billion records, each 100 bytes in size
simple_partitioning {
num_records: 1000000000
record_size_bytes: 100
}
recv_retries: 5000
return_dummy_txn: true
# Broker
broker_ports: 2021
# Server
server_port: 2023
# Forwarder
forwarder_port: 2024
forwarder_batch_duration: 1
# Sequencer
sequencer_port: 2025
sequencer_batch_duration: 5
# Interleaver
replication_factor: 1
# Scheduler
num_workers: 3
# Logging
enabled_events: ENTER_SERVER
enabled_events: EXIT_SERVER_TO_CLIENT
sample_rate: 10
You can start the servers either manually or using the admin tool.
To start the servers manually, for each machine:
- SSH to the machine, clone and build this repo.
- Copy the config file to the machine.
- Run the following command, replacing <ip-address> with the address of the current machine:
build/slog -config slog.conf -address <ip-address>
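If you have set up passwordless SSH from your workstation to every server, the per-machine steps above can be scripted. The following is only a sketch: the address list and the repository path (~/slog here) are assumptions that must be adjusted to your deployment.
# Sketch: start a SLOG server on every machine.
# Assumes the repo is already cloned and built at ~/slog on each machine,
# and that slog.conf has been copied into that directory.
SERVERS="10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.6 10.0.0.7 10.0.0.8 10.0.0.9"
for addr in $SERVERS; do
  ssh "$addr" "cd ~/slog && nohup build/slog -config slog.conf -address $addr > slog.log 2>&1 < /dev/null &"
done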
See this guide for how to set up the admin tool.
Run the following command to start the servers:
python3 tools/admin.py start slog.conf -u <username> --image <docker-image>
Here, <username> is the username used to SSH to the servers, and <docker-image> is the name of the SLOG Docker image. You can either build the image yourself or use the ctring/rma-master image, which is kept up to date with the master branch.
If the number of initial records to generate is large, it might take some time until the servers are ready. To check if they are ready, look at the log of any server. They are ready when the log ends with
...
I0710 14:03:56.517822 13 server.cpp:180] All machines are online
If you started the servers using the admin tool, you can use the following command to fetch the log from a server:
python3 tools/admin.py logs slog.conf -rp 0 0 -u <username> -f
Here, the two numbers following -rp are the replica and partition numbers of the server, and -f follows the log.
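If you want to check every server at once, you can loop over the replica and partition indices and search each log for the readiness message. This is just a sketch for the 2-region, 4-partition deployment above, and it assumes that admin.py logs without -f prints the current log and exits.
# Sketch: report which servers have finished initializing.
for r in 0 1; do
  for p in 0 1 2 3; do
    echo -n "replica $r partition $p: "
    python3 tools/admin.py logs slog.conf -rp $r $p -u <username> 2>&1 \
      | grep -q "All machines are online" && echo "ready" || echo "not ready"
  done
done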
Create a directory to store the results, for example data. For each run, create a descriptive directory inside data with the subdirectories client/0; these subdirectories make the output tree match what the result-analyzing Jupyter notebook expects. For example, if the current run uses HOT = 10000, 0% multi-home transactions, and 0% multi-partition transactions, create the directories data/hot10000mh0mp0/client/0.
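For the example run above, the whole tree can be created with a single command:
mkdir -p data/hot10000mh0mp0/client/0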
Run the following command to start the benchmark:
build/benchmark -config slog.conf\
-r 0\
-clients 100\
-generators 5\
-txns 1000000\
-duration 30\
-out-dir data/hot10000mh0mp0/client/0\
-wl "basic"\
-params "hot=10000,mh=0,mp=0,writes=10,records=10,hot_records=2"
- -r: the region that the client is located in.
- -clients: the number of parallel clients. Depending on the size of the deployment, this number might be too low to saturate the system, so try increasing it until the throughput stops increasing.
- -generators: the number of threads used to run the benchmark.
- -txns: the number of transactions generated. The benchmark loops over these transactions until the end of the duration. If this is set to 0, transactions are generated on the fly instead.
- -duration: how long to run the benchmark.
- -out-dir: where to output the results.
- -wl: name of the workload.
- -params: parameters for the workload.
In the first experiment in the paper, the varying parameters are mh, mp, and hot, so change these parameters on each run to reproduce the experiment. Make sure that hot is always less than the number of generated records specified in the configuration file.
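If you want to script the sweep, a sketch is shown below; the value lists are placeholders, not the exact combinations used in the paper, so replace them with the points you want to reproduce.
# Sketch: run the benchmark for several parameter combinations.
# The hot/mh/mp value lists below are placeholders.
for hot in 100 10000; do
  for mh in 0 50 100; do
    for mp in 0 50 100; do
      out=data/hot${hot}mh${mh}mp${mp}/client/0
      mkdir -p "$out"
      build/benchmark -config slog.conf -r 0 -clients 100 -generators 5 \
        -txns 1000000 -duration 30 -out-dir "$out" -wl "basic" \
        -params "hot=$hot,mh=$mh,mp=$mp,writes=10,records=10,hot_records=2"
    done
  done
done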
A distributed benchmark runs on the machines specified in the client_addresses field of the config; these machines also need Docker installed. To start a benchmark, run the following command:
python3 tools/admin.py benchmark slog.conf\
-u <username>\
--image <docker-image>\
--clients 100\
--generators 5\
--txns 1000000\
--duration 30\
--workload basic\
--param "hot=10000,mh=0,mp=0,writes=10,records=10,hot_records=2"\
--tag hot10000mh0mp0
The arguments here are similar to those of the benchmark tool. The --tag argument is used to name the output directory.
Run the following command to collect the output data:
python3 tools/admin.py collect_client slog.conf hot10000mh0mp0 --out-dir data -u <username>
The second positional argument in this command, hot10000mh0mp0, is the tag assigned to the run in the previous command. This command also puts the results in the directory tree expected by the Jupyter notebook.
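After collecting, you can quickly inspect the resulting directory tree, which should contain one directory per tag with client subdirectories underneath, for example:
find data -maxdepth 3 -type d | sort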
A script is provided to run the whole experiment end-to-end, including starting the servers, running the benchmark, and collecting the data:
tools/microbenchmark.sh -a tools/admin.py -c slog.conf -i <docker-image> -o <output-dir> -u <username>
You might need to tweak the values in the BENCHMARK_ARGS variable at the beginning of this script.
Download this Jupyter notebook and follow the instructions there to compute metrics from the output data.
Install the prerequisite packages before running the notebook:
pip3 install pyspark pandas numpy matplotlib