DISTOD experiments

This folder contains the configuration, artefacts, and scripts used to perform the experiments in the paper.

Structure & Ansible

We use a combination of Ansible and bash scripts to automate the configuration and execution of the experiments on a twelve-node cluster.

The folder ansible contains the Ansible configuration files and playbooks. Each algorithm's experiment scripts are located in its corresponding folder. DISTOD and FASTOD-BID read the same data format; their datasets should be located in the experiments/data folder. Since DIST-FASTOD-BID reads JSON, it reads the transformed datasets from experiments/fastod-spark/data. The original datasets can be downloaded from the HPI repeatability website and should be preprocessed with the to-json.py script, which substitutes all values with an integer representation and transforms the datasets to header-less CSV and JSON files.
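
To illustrate what the preprocessing step produces, here is a minimal Python sketch. It is not the to-json.py script itself. It assumes that the input CSV has a header row, that integer codes are assigned per column as the rank of each distinct value in lexicographic order, and that the JSON output holds one object per line keyed by column name. All function names are hypothetical, and the actual script may encode values and lay out the JSON differently.

```python
import csv
import io
import json


def encode_columns(rows):
    """Replace every value with an integer code, column by column.

    Assumption: a code is the rank of the value among the column's
    distinct values in lexicographic (string) order.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    mappings = []
    for c in range(n_cols):
        distinct = sorted({row[c] for row in rows})
        mappings.append({value: code for code, value in enumerate(distinct)})
    return [[mappings[c][row[c]] for c in range(n_cols)] for row in rows]


def to_headerless_csv(encoded):
    """Serialize the encoded rows as CSV without a header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(encoded)
    return buf.getvalue()


def to_json_lines(header, encoded):
    """Serialize the encoded rows as one JSON object per line (assumed layout)."""
    return "\n".join(json.dumps(dict(zip(header, row))) for row in encoded)


def preprocess(csv_text):
    """Return (header-less CSV text, JSON-lines text) for a CSV with a header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip empty lines
    encoded = encode_columns(rows)
    return to_headerless_csv(encoded), to_json_lines(header, encoded)


if __name__ == "__main__":
    sample = "age,city\n30,Berlin\n25,Potsdam\n30,Potsdam\n"
    csv_out, json_out = preprocess(sample)
    print(csv_out)
    print(json_out)
```

The header is dropped from the CSV output but kept as the key set of the JSON objects, so both files describe the same integer-encoded table.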

How to run

Experiments are executed from the experiments folder using Ansible playbooks, for example:

cd experiments
ansible-playbook -i ansible/inventory.ini ansible/fastod.yml -e 'experiment=exp1-datasets'

Once the experiment has reached the Wait until experiment finished step, you can safely stop the Ansible driver process (on the local machine) using Ctrl-C. After the experiment has finished, you can obtain the results by changing the load-results.yml playbook to the executed experiment and running:

ansible-playbook -i ansible/inventory.ini ansible/load-results.yml
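
If you want to script these invocations, the two commands above can be assembled as in the following Python sketch. The helper playbook_command is hypothetical and not part of this repository. It only uses the -i and -e options shown above and prints the commands instead of running them.

```python
import shlex


def playbook_command(playbook, experiment=None, inventory="ansible/inventory.ini"):
    """Build the argument list for one ansible-playbook invocation.

    Hypothetical helper: it only reproduces the command shapes shown above.
    """
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if experiment is not None:
        # The experiment name is passed as an extra variable, as in the example above.
        argv += ["-e", f"experiment={experiment}"]
    return argv


if __name__ == "__main__":
    steps = [
        ("ansible/fastod.yml", "exp1-datasets"),  # start the experiment
        ("ansible/load-results.yml", None),       # fetch the results afterwards
    ]
    for playbook, experiment in steps:
        # Dry run: print the shell-quoted command line.
        print(shlex.join(playbook_command(playbook, experiment)))
```

To actually execute a command, the returned list could be passed to subprocess.run(argv, cwd="experiments", check=True). Remember that load-results.yml still has to be edited to point to the executed experiment first.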

Experiments

Experiment DISTOD FASTOD-BID DIST-FASTOD-BID Description
exp1-datasets ✔️ ✔️ ✔️ Tests each algorithm in its most powerful configuration on all datasets.
exp2-nodes ✔️ (n/a) ✔️ Scales the number of nodes on the adult dataset.
exp3-cost ✔️ Scales the number of cores on the hepatitis and adult datasets.
exp4-rows ✔️ Scales the number of rows on the adult, flight, and ncvoter datasets.
exp5-columns ✔️ Scales the number of columns on the plista dataset.
exp6-memory Performed manually! ✔️ Performed manually! Compares the runtime of all algorithms with different heap memory limits.
exp7-caching ✔️ (n/a) (n/a) Compares the runtimes of DISTOD with partition caching turned off or on.
exp8-jvms ✔️ (n/a) (n/a) Runs DISTOD on different JVMs and using different GCs and settings.
exp9-dispatchers ✔️ (n/a) (n/a) Runs the DISTOD master and the workers on different dispatcher implementations to compare their impact.
exp10-network ✔️ (n/a) Measures the network utilization while DISTOD is running on the full cluster. Requires password-less sudo and iptraf installed (iptraf-ng in PATH).