DISTOD experiments

This folder contains the configuration, artefacts, and scripts used to perform the experiments in the paper.

Structure & Ansible

We use a combination of Ansible and bash scripts to automate the configuration and execution of the experiments on a twelve-node cluster.

The folder ansible contains the Ansible configuration files and playbooks. Each algorithm's experiment scripts are located in the corresponding folder. DISTOD and FASTOD-BID read the same data format, so their datasets should be located in the experiments/data folder. Since DIST-FASTOD-BID reads JSON, it reads the transformed datasets from experiments/fastod-spark/data. The original datasets can be downloaded from the HPI repeatability website and should be preprocessed with the to-json.py script, which substitutes all values with an integer representation and transforms the datasets into header-less CSV and JSON files.
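To illustrate the kind of transformation described above, here is a minimal Python sketch. It is not the to-json.py script itself. The function names, the per-column encoding, the choice to rank values by their sorted order, and the JSON layout (one object per row, keyed by column name) are all assumptions made for illustration. The actual script may behave differently.

```python
import csv
import io
import json


def encode_columns(rows):
    """Replace each value by its rank among the sorted distinct values of its column.

    Assumption: values are compared as strings, so "10" sorts before "9".
    The real preprocessing script may use a different mapping.
    """
    if not rows:
        return []
    encoded_columns = []
    for column in zip(*rows):  # transpose rows into columns
        ranks = {value: rank for rank, value in enumerate(sorted(set(column)))}
        encoded_columns.append([ranks[value] for value in column])
    return [list(row) for row in zip(*encoded_columns)]  # transpose back


def to_headerless_csv(encoded_rows):
    """Serialize the encoded rows as CSV text without a header line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(encoded_rows)
    return buffer.getvalue()


def to_json_lines(encoded_rows, header):
    """Serialize the encoded rows as one JSON object per line (hypothetical layout)."""
    return "\n".join(json.dumps(dict(zip(header, row))) for row in encoded_rows)


if __name__ == "__main__":
    raw = "name,age\nbob,30\nalice,25\nbob,25\n"
    header, *rows = list(csv.reader(io.StringIO(raw)))
    encoded = encode_columns(rows)
    print(to_headerless_csv(encoded), end="")
    print(to_json_lines(encoded, header))
```

Ranking by sorted order keeps the relative order of the original string values within each column, which seems a reasonable property for order dependency discovery. Whether the original script preserves order in this way is not stated here.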

How to run

Experiments are executed from the experiments folder using Ansible playbooks, for example:

cd experiments
ansible-playbook -i ansible/inventory.ini ansible/fastod.yml -e 'experiment=exp1-datasets'

Once the experiment has reached the "Wait until experiment finished" step, you can safely stop the Ansible driver process (on the local machine) using Ctrl-C. After the experiment has finished, you can obtain the results by changing the load-results.yml playbook to the executed experiment and running:

ansible-playbook -i ansible/inventory.ini ansible/load-results.yml
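
If you run several experiments in a row, the two commands above can be wrapped in a small helper. The following Python sketch only assembles the command lines shown above, using the same -i and -e options. The helper names, the dry_run switch, and the assumption that the script is started from the repository root are hypothetical and not part of this repository.

```python
import shlex
import subprocess


def playbook_command(playbook, experiment=None, inventory="ansible/inventory.ini"):
    """Build the ansible-playbook argument list for one playbook."""
    command = ["ansible-playbook", "-i", inventory, playbook]
    if experiment is not None:
        # Pass the experiment name as an extra variable, as in the example above.
        command += ["-e", f"experiment={experiment}"]
    return command


def run_playbook(playbook, experiment=None, dry_run=True):
    """Print the command and, unless dry_run is set, execute it.

    Assumption: the working directory is the repository root, so the
    command is run inside the experiments folder.
    """
    command = playbook_command(playbook, experiment)
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, cwd="experiments", check=True)
    return command


if __name__ == "__main__":
    # Dry run only: prints the commands without contacting the cluster.
    run_playbook("ansible/fastod.yml", "exp1-datasets")
    run_playbook("ansible/load-results.yml")
```

The dry run is the default so that nothing is started on the cluster by accident. Pass dry_run=False to actually execute a playbook.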

Experiments

Experiment DISTOD FASTOD-BID DIST-FASTOD-BID Description
exp1-datasets ✔️ ✔️ ✔️ Tests each algorithm in its most powerful configuration on all datasets.
exp2-nodes ✔️ (n/a) ✔️ Scales the number of nodes on the adult dataset.
exp3-cost ✔️ Scales the number of cores on the hepatitis and adult datasets.
exp4-rows ✔️ Scales the number of rows on the adult, flight, and ncvoter datasets.
exp5-columns ✔️ Scales the number of columns on the plista dataset.
exp6-memory Performed manually! ✔️ Performed manually! Compares the runtime of all algorithms with different heap memory limits.
exp7-caching ✔️ (n/a) (n/a) Compares the runtimes of DISTOD with partition caching turned off or on.
exp8-jvms ✔️ (n/a) (n/a) Runs DISTOD on different JVMs and using different GCs and settings.
exp9-dispatchers ✔️ (n/a) (n/a) Runs the DISTOD master and the workers on different dispatcher implementations to compare their impact.
exp10-network ✔️ (n/a) Measures the network utilization while DISTOD is running on the full cluster. Requires password-less sudo and iptraf installed (iptraf-ng in PATH).