ARC bench

Benchmark suite for the QCBA framework.

Prerequisites include Java 8, R, and the qCBA package installed in R.

Preparing data

Prerequisites include Python 2 with scikit-learn and pandas.

generateFolds.sh

The benchmark uses standard open datasets from the UCI repository. To ensure that all algorithm implementations operate on exactly the same folds, the folds are materialized. Missing values are handled in both versions of the folds.

The output is saved into

data/folds

The process may also create a temporary folder

data/output

This folder is removed on script completion.
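
A typical invocation (a sketch, assuming the script is run from the repository root):

bash generateFolds.sh
ls data/folds   # the materialized folds appear here; data/output is cleaned up automatically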

Preparing data for IDS + CORELS

Create materialized discretized folds with discretize_for_ids_corels.R. This will create

data/folds_discr2/train/{dataset}{fold}.csv
data/folds_discr2/train/{dataset}{fold}.cutpoints
data/folds_discr2/test/{dataset}{fold}.csv

Note that the intervals are exactly the same as in data/folds_discr/; only the interval format differs.
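
The discretization step can be run as a plain R script (a sketch; it is assumed here that discretize_for_ids_corels.R takes no arguments):

Rscript discretize_for_ids_corels.R
ls data/folds_discr2/train data/folds_discr2/test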

Preparing data for scalability benchmarks

Scalability benchmarks are executed on subsets of the KDD'99 dataset, which are generated using:

lib/materialize_folds_KDD.py

The results are saved to:

data/datasets/kdd{subset}_.csv
data/folds_nodiscr/train/kdd{subset}_{splitID}.csv
data/folds_nodiscr/test/kdd{subset}_{splitID}.csv
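
A sketch of the generation step, assuming Python 2 (per the data-preparation prerequisites) and no required arguments:

python2 lib/materialize_folds_KDD.py
ls data/folds_nodiscr/train | grep '^kdd'   # one file per subset and split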

CBA + QCBA

The QCBA benchmark is run using functions defined in evalCBA_QCBA_all.R. If evalCBA_QCBA_all.R is run without parameters, it lists all predefined experiments.

To run the main experiments, execute evalCBA_QCBA_all.sh.

The experimentation framework (evalCBA_QCBA_all.R) is fail-safe: when interrupted and executed again, it checks the contents of CBA_results and skips re-executing all dataset-experiment combinations for which results are already logged.
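
For example:

Rscript evalCBA_QCBA_all.R   # without parameters: lists all predefined experiments
bash evalCBA_QCBA_all.sh     # runs the main experiments; safe to re-run after an interruption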

Discovered rules are stored into folders:

CBA_models
QCBA_models

For each configuration, the benchmark creates two files, one ending in -cba.csv and the other in -qcba.csv, in

CBA_results

The first file contains evaluation results for CBA, and the second for the postprocessing by QCBA. The difference between 117-noExtend-D-mci=-1-cba.csv and all other -cba.csv runs is that in run 117 the CBA execution includes default rule pruning (-D-). For all remaining runs, default rule pruning is not performed as part of CBA, but is performed as part of some QCBA runs, as indicated by -Pcba- in the filename.

SBRL + QCBA

The QCBA benchmark is run using functions defined in evalSBRL_QCBA.R.

Evaluation results on individual test datasets will be stored to:

SBRL_results

The filename indicates the method (one of {CBA, QCBA, SBRL, SBRLQCBA}) and maxlength, the maximum antecedent length of SBRL rules (one of {1, Long}). The Long option has a different meaning for individual datasets; refer to evalSBRL_QCBA.R for the specific values.
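
A minimal sketch of a run (it is assumed here that evalSBRL_QCBA.R can be executed directly; check the script for any required arguments):

Rscript evalSBRL_QCBA.R
ls SBRL_results   # filenames encode the method and the maxlength setting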

Discovered rules are stored into folders:

SBRL_Models/
SBRL_QCBA_Models

IDS + QCBA

This benchmark uses the IDS implementation at https://github.com/jirifilip/pyIDS.

As the QCBA implementation, two options are supported.

The model evaluation statistics against test data will be stored to:

IDS_results/IDS.csv

The IDS models will be stored to

IDS_Models/{dataset}{fold}.csv
  • Run evalIDS_QCBA.R. This runs the QCBA implementation in R, loading the IDS models from the files in IDS_Models. The IDS-QCBA models are stored to the folder:

    QCBA_IDS_Models

Evaluation results on individual test datasets are stored in folder:

IDS_results
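
A sketch of the R step, assuming the IDS models have already been generated into IDS_Models by the pyIDS run:

Rscript evalIDS_QCBA.R
ls QCBA_IDS_Models   # IDS-QCBA models
ls IDS_results       # per-dataset evaluation results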

PRM/FOIL2/CMAR/CPAR + QCBA

This benchmark uses the implementation of PRM, FOIL2, CMAR and CPAR from the arulesCBA R package (https://cran.r-project.org/web/packages/arulesCBA/index.html):

evalCPAR_CMAR_PRM_FOIL2.R

Models are stored to:

PRM_QCBA_Models
FOIL2_QCBA_Models
CPAR_QCBA_Models
CMAR_QCBA_Models

Evaluation results are stored to

PRM_results
FOIL2_results
CPAR_results
CMAR_results

These scripts save intermediate evaluation results to temporary files (such as temp_vowel_default_PRM_7_noPruning.csv) in the project directory; these can be safely deleted.
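
For example, a run followed by a cleanup of the leftover intermediate files (the glob is an assumption; rm -i asks before each removal so unrelated files are not deleted by accident):

Rscript evalCPAR_CMAR_PRM_FOIL2.R
rm -i temp_*.csv   # assumed pattern for the intermediate files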

Benchmarks against PART, FURIA, J48

These benchmarks are implemented in Java on top of the WEKA framework, in the WekaBench folder, and are executed with:

bash evalWEKA.sh 

Evaluation results are stored in folder

WEKA_results

Benchmark against CORELS

Requires https://github.com/corels/pycorels

python evalCORELS.py

This script also writes one-hot-encoded data into

data/folds/folds_discr_dumm

Evaluation results are saved to

CORELS_results

Models are saved to

CORELS_models

Scalability benchmarks (CBA+QCBA) - KDD dataset

These are executed for two different thresholds of the minCI extension step:

evalCBA_QCBA_KDDminc-1.sh
evalCBA_QCBA_KDDminc0.sh

These scripts execute evalCBA_QCBA_all.R.

This experiment was run on a different, more powerful machine than the other experiments, so the results have been placed into separate folders.

Discovered rules are stored in folders:

CBA_models/server
QCBA_models/server

For each configuration, the benchmark creates two files, one ending in -cba.csv and the other in -qcba.csv, in

CBA_results/server

Restricting experiments to one core

To eliminate the effect of multicore optimizations, computation can be restricted to a single core using taskset as follows (Linux only):

taskset -c 1 Rscript evalCPAR_CMAR_PRM_FOIL2.R --option
taskset -c 0 ./evalCBA_QCBA_all.sh --option

The corresponding scripts are available as:

single_core_evalCPAR_CMAR_PRM_FOIL2.sh
single_core_evalQCBA_all.sh

Experiments with QCBA, CPAR, CMAR, PRM, and FOIL2 were run using the single-core version.
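
To verify that a launched experiment is really pinned to a single core, the affinity of the running process can be queried (a sketch; the pgrep pattern is an assumption, adjust it to the script you started):

taskset -cp "$(pgrep -f evalCBA_QCBA_all | head -n 1)"   # the printed affinity list should be: 0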

Analysis of results

To obtain aggregate results for all datasets:

Rscript aggregateResults.R

This will create a stats.csv file in folders:

IDS_results
CBA_results
SBRL_results
WEKA_results
PRM_results
FOIL2_results
CPAR_results
CMAR_results
QCBA_eval_PRM_FOIL2_CMAR_CPAR   # summary of the effects of postprocessing by QCBA#5 on PRM, FOIL2, CMAR, CPAR
QCBA_eval_IDS_SBRL              # summary of the effects of postprocessing by QCBA#5 on IDS, SBRL and CBA

Additionally, this will create the following file in the project folder:

accdifferencebydataset.csv

This file indicates on which datasets a model postprocessed by QCBA#5 won compared to a baseline method.
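
The summaries can be inspected directly from the command line, for example (assuming aggregateResults.R has already been run so the files exist):

Rscript -e 'print(read.csv("CBA_results/stats.csv"))'
Rscript -e 'print(read.csv("accdifferencebydataset.csv"))'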

Tables

The results from the stats.csv files generated by aggregateResults.R are consolidated into tables in

resultsTables.xlsx

Figures

Figures (graphs, plots) are generated using

plots.ipynb
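
Assuming a standard Jupyter installation (not listed among the prerequisites above), the notebook can be opened with:

jupyter notebook plots.ipynb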

CBA+QCBA experiments - plots analysing results per dataset

The results of these experiments are read from CBA_results, SBRL_results, and IDS_results.

The resulting graphs are stored as pdf files to:

SBRL_results/SBRLQCBA-noPruning-Long.pdf
SBRL_results/SBRLQCBA-transactionBased-Long.pdf
CBA_results/CBA_QCBA_198-numericOnly-T-Pcba-A-mci=-1.pdf
CBA_results/CBA_QCBA_196-numericOnly-T-Pcba-A-transactionBased-mci=-1.pdf
IDS_results/IDSQCBA_R_noPruning_ATTPRUNING_TRUE.pdf
IDS_results/IDSQCBA_R_transactionBased_ATTPRUNING_TRUE.pdf

Scalability experiments - plots analysing results

The results of scalability experiments are read from CBA_results/server, SBRL_results/server, IDS_results/server.

The resulting graphs are stored as pdf files to:

CBA_results/server/KDDablation.pdf
CBA_results/server/KDD_CBA_QCBA_198-numericOnly-T-Pcba-A-mci=-1.pdf
CBA_results/server/KDD_CBA_QCBA_198-numericOnly-T-Pcba-A-mci=0.pdf
CBA_results/server/KDD_CBA_QCBA-196-numericOnly-T-Pcba-A-transactionBased-mci=0.pdf
SBRL_results/SBRLQCBA-noPruning-Long.pdf
IDS_results/server/KDD-IDSQCBA_R_noPruning_ATTPRUNING_TRUE.pdf