This repository contains middleware source code, aggregated data, final plots, bash scripts for running the system on Azure, Python scripts to extract aggregated data from raw data and Mathlab code for Queueing models.
author: Jovan Nikolic
ETH, Zurich, Switzerland
Final report is available here.
Project description is available here.
Size of raw data obtained from experiments is around 160GB, hence it is not uploaded to the repository. However, aggregated data from middleware logs over 1 second window, is uploaded and can be found in aggregated_data
folder. Data from dstat
tool, ping
results and logs from memtier client are not uploaded as well. All data is available on request.
In aggregated_data
folder, data is organized per experiments in the following way:
baseline_1midd
contains data from baseline experiment with 1 middlewarebaseline_2midd
contain data from baseline experiment with 2 middlewaresbaseline_2midd_2vms
contains data from repetition of experiment with 2 middlewares under write-only load with 2 client machinesexperiment_write-only
contains data from write-only experiments from Section 4experiment_gets
contain data from gets and multi-gets experiments from Section 5experiment_2k
contains data from experiments conducted for 2k analysisqueueing theory
contains data extracted from previous experiments for input values in queueing modelsbaseline_nomidd_1server
contains raw data from baseline without middleware with 1 server experimentbaseline_nomidd_2servers
contains raw data from baseline without middleware with 2 servers experiment
In baseline_nomidd_1server
the following naming convention applies:
-
json_output_file_cpt_
X_rep
Y_
ZZZZ_vm
N.json
represents output file of memtier instance, where X is number of virtual clients per thread at memtier, Y is repetition number = {1, 2, 3}, ZZZZ isS0-G1
for read-only load andS1-G0
for write-only load, N is virtual machine id = {1, 2, 3}. -
dstat_
machineN_cpt
X_rep
Y_
S.csv
represents output file ofdstat
tool, where machine = {"memtier", "memcached"}, N is id of machine instance, X is number of virtual clients per thread at memtier, Y is repetition number = {1, 2, 3} and S is "r" for read-only and "w" for write-only. Ping results are identified in similar way.
In baseline_nomidd_2servers
the following naming convention applies:
-
for
dstat
logs and ping results, convention is the same as above -
json_output_file_inst
A_cpt
X_rep
Y_
ZZZZ_vm
N.json
represents output file of memtier clients, where A is instance id = {1, 2}, X is number of virtual clients per thread at memtier, Y is repetition number = {1, 2, 3}, ZZZZ isS0-G1
for read-only load andS1-G0
for write-only load, N is virtual machine id = {1, 2, 3}.
In queueing_theory
folder, files noq_1midd.txt
and noq_2midd.txt
contain service times that are input for network of queues models with 1 and 2 middlewares, and mm_values.ods
is Libre Office (~ MC Excel) table with sign tables for M/M/1 and M/M/m models.
In all other folders, data is divided in timers
and counters
folders. Folder timers
contains aggregated time-related data like response time, server-service time, net-thread processing time and so on. The naming convention is quite clear: timer_aggregated_data_clientThreads_
X_workerThreads_
M_
ZZZZ.csv
where X is number of virtual clients per thread at memtier, M is number of workers per middleware and ZZZZ is S0-G1
for read-only load and S1-G0
for write-only load. Folder counters
contains throughput aggregates and files are named similarly: throughput_
X_workerThreads_
M_
ZZZZ.csv
, where X is number of virtual clients per thread at memtier, M is number of workers per middleware and ZZZZ is S0-G1
for read-only load and S1-G0
for write-only load. In cases with multiple middleware instances, ID of middleware is added to the filename with tag mw
. In experiment_gets
folder, filename includes (human-readable) information about number of keys and if sharding is enabled.
Plots are organized in separate folders based on experiments, and folders are named the same way as in aggregated_data
folder. Inside each of these folders, there is folder called timers
which includes all final tables and plots from which middleware-related-figures in the report are plotted (including throughput!). Outside of that folder, final tables and plots regarding memtier are located. Filenames are human-readable.
All plots are generated using gnuplot
, and all scripts are next to the files they use for plotting.
Global figures and diagrams like flow chart, architecture of the system and illustartions of network of queues models are located directly in plots
folder.
Bash shell scripts used for running experiments on Azure are located next to src
folder: baseline_1midd.sh
, baseline_2mid.sh
, baseline_2midd_repeating.sh
, baseline_nomidd_1server.sh
, baseline_nomidd_2servers.sh
, experiments_2k.sh
, experiment_gets.sh
, experiment_write-only.sh
. The scripts do not require any input parameters, all parameters are specified inside. For rerun, path to known_hosts
file must be updated, and scripts assume that virtual machines are already running.
Python
scripts for analyzing logs from middleware are located next to src
folder as well. They require raw data which is currently not uploaded to repository. Naming convention is simple: scripts starting with process_middleware_timers
extracts and aggregates time-related data from middlewares, like response time, server-service time and so on, scripts starting with process_middleware_counters
extracts throughput from middleware logs, scripts starting with process_memtier_data
extracts data from memtier logs.
Scripts starting with process_aggregated_data
use data from aggregated_data
folder to create final tables for plotting, and to aggregate data between repetitions.
Report folder contains *.tex file of the report.
Middleware code is located in src/ch/ethz
folder and inside: asltest
folder contains all main classes that build middleware, in instrumentation
are utilities used for logging and in utils
folder are located various utility classes.
All experiments have these parameters in common:
-
CT
is number of threads (set by--thread
or-t
),VC
is number of virtual clients per thread (set by--client
or-c
),CPM
=CT
*VC
is number of virtual clients per memtier instance, and final number of clients isNumClients
=NumInstances
*CPM
=NumInstances
*CT
*VC
-
choose at least 6 points in range [1, 32] for
VC
value -
--data-size=1024
-
--key-maximum=10000
-
--expiry-range=9999-10000
-
--random-data
-
repeated at least 3 times for statistical significance
-
should have stable phase of at least 60 secs
name type for what foraslvms1 A2 2vcpus, 3.5 GB memtier foraslvms2 A2 2vcpus, 3.5 GB memtier foraslvms3 A2 2vcpus, 3.5 GB memtier foraslvms4 A4 8vcpus, 14 GB middleware foraslvms5 A4 8vcpus, 14 GB middleware foraslvms6 A1 1vcpus, 1.75 GB memcached foraslvms7 A1 1vcpus, 1.75 GB memcached foraslvms8 A1 1vcpus, 1.75 GB memcached
-
1.1. Baseline without Middleware, 1 server can be found in
data\baseline_nomidd_1server
name value memtier VMs 3 memtier instances per VM 1 memtier threads per instance 2 memtier virtual clients per thread (1 5 9 14 19 23 28 32) memtier actual virtual clients per thread (1 5 9 14 19 23 28 32 37 47 52) memcached VMs 1 memcached instances per VM 1 load read-only and write-only -
1.2. Baseline withour Middleware, 2 servers can be found in
data\baseline_nomidd_2servers
name value memtier VMs 1 memtier instances per VM 2 memtier threads per instance 1 memtier virtual clients per thread (1 5 9 14 19 23 28 32) memtier actual virtual clients per thread (1 5 9 14 19 23 28 32 37 47 52) memcached VMs 2 memcached instances per VM 1 load read-only and write-only -
2.1. Baseline with 1 Middleware can be found in
data\baseline_1midd
name value memtier VMs 1 memtier instances per VM 1 memtier threads per instance 2 memtier virtual clients per thread (1 5 9 14 19 23 28 32) memtier actual virtual clients per thread (1 5 8 14 19 23 28 32 42 52 64) middleware VMs 1 middleware instances per VM 1 middleware threads per instance (8, 16, 32, 64) memcached VMs 1 memcached instances per VM 1 load read-only and write-only -
2.2. Baseline with 2 Middlewares can be found in
data\baseline_2midd
name value memtier VMs 1 memtier instances per VM 2 memtier threads per instance 1 memtier virtual clients per thread (1 5 9 14 19 23 28 32) memtier actual virtual clients per thread (1 5 8 14 19 23 28 32 42 52 64) middleware VMs 2 middleware instances per VM 1 middleware threads per instance (8, 16, 32, 64) memcached VMs 1 memcached instances per VM 1 load read-only and write-only -
3. Throughput for Writes can be found in
data\experiment_write-only
name value memtier VMs 3 memtier instances per VM 2 memtier threads per instance 1 memtier virtual clients per thread (1 5 9 14 19 23 28 32) memtier actual virtual clients per thread (1 5 8 14 19 23 28 32 42 52 64) middleware VMs 2 middleware instances per VM 1 middleware threads per instance (8, 16, 32, 64) memcached VMs 3 memcached instances per VM 1 load write-only -
4.1. GETs and multi-GETs, Sharded Case can be found in
data\experiment_gets
name value memtier VMs 3 memtier instances per VM 2 memtier threads per instance 1 memtier virtual clients per thread 2 middleware VMs 2 middleware instances per VM 1 middleware threads per instance 64 memcached VMs 3 memcached instances per VM 1 load read-only with (1 3 6 9) keys, sharded enabled -
4.2. GETs and multi-GETs, Non-Sharded Case can be found in
data\experiment_gets
name value memtier VMs 3 memtier instances per VM 2 memtier threads per instance 1 memtier virtual clients per thread 2 middleware VMs 2 middleware instances per VM 1 middleware threads per instance 64 (or any lower number of threads that gives max throughput) memcached VMs 3 memcached instances per VM 1 load read-only with (1 3 6 9) keys, sharded disabled -
5. 2k Analysis can be found in
data\2k_analysis
name value memtier VMs 3 memtier instances per VM 2 memtier threads per instance 1 memtier virtual clients per thread 32 middleware VMs (1 2) middleware instances per VM 1 middleware threads per instance (8 32) memcached VMs (2 3) memcached instances per VM 1 load write-only, read-only and 50-50-read-write, all single-keyed GET requests
Color Palette:
#D43849
#63CB9D
#0B547E
#AF71FC
#C4B205
Tried these as well: #63cb9d
, #1a4b4d
, #99c042
, #2f1b51
, #8f3040
, #000000
, #2a7d58
, #5b1e35
, #F24738
, #FF9600