This repository contains the implementation code needed to reproduce the experiments in my publication:
Harmouch, H., Naumann, F.: Cardinality Estimation: An Experimental Survey. Proceedings of the VLDB Endowment (PVLDB). pp. 499 - 512 (2017).
It includes the following twelve algorithms, which are the most popular and well-known cardinality estimation algorithms:
- Flajolet and Martin (FM)
- Probabilistic Counting with Stochastic Averaging (PCSA)
- Linear Counting (LC)
- Alon, Matias and Szegedy (AMS)
- Bar-Yossef, Jayram, Kumar, Sivakumar and Trevisan (BJKST)
- LogLog
- SuperLogLog
- MinCount
- AKMV
- HyperLogLog
- Bloom Filters
- HyperLogLog++
In addition, the repository includes:
- GEE, a sampling-based algorithm.
- A hash table, which we used as a baseline.
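To give a feel for what these algorithms compute, here is a minimal, self-contained sketch of the Linear Counting idea next to a hash-based exact count. It is not the code from this repository and does not use the Metanome API: the class name `LinearCountingSketch`, its method names, the bitmap size, and the hash-mixing step are all assumptions chosen for illustration. Linear Counting hashes every value to one bit of an m-bit bitmap and estimates the number of distinct values as m * ln(m / z), where z is the number of bits still zero.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of this repository or of Metanome.
class LinearCountingSketch {

    // Exact distinct count with a hash set (the idea behind a hash table baseline).
    static long exactCount(Iterable<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            seen.add(v);
        }
        return seen.size();
    }

    // Linear Counting estimate using a bitmap of m bits.
    static double linearCountingEstimate(Iterable<String> values, int m) {
        BitSet bitmap = new BitSet(m);
        for (String v : values) {
            // Assumption: String.hashCode plus a bit mixer stands in for a real hash function.
            int h = mix(v.hashCode());
            bitmap.set(Math.floorMod(h, m));
        }
        int zeroBits = m - bitmap.cardinality();
        if (zeroBits == 0) {
            // The bitmap is saturated; m is too small for this input.
            return Double.POSITIVE_INFINITY;
        }
        return m * Math.log((double) m / zeroBits);
    }

    // Integer bit mixer (the 32-bit finalizer step used by MurmurHash3).
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // A toy column: 10,000 rows drawn from 1,000 distinct values.
        List<String> column = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            column.add("value-" + (i % 1000));
        }
        System.out.println("exact:    " + exactCount(column));
        System.out.println("estimate: " + linearCountingEstimate(column, 8192));
    }
}
```

The bitmap needs only m bits regardless of how many rows are scanned, whereas the hash set grows with the number of distinct values; that trade of memory for accuracy is what all the estimators listed above have in common.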
Metanome is a framework that handles both algorithms and datasets as external resources. All the algorithms above have been developed to work within Metanome.
- Download the latest release of Metanome from the Metanome releases page, and the algorithms from the Algorithm releases page.
- Unzip deployment/target/deployment-1.1-SNAPSHOT-package_with_tomcat.zip
- Go into the unzipped folder, and place the algorithm jar file into the folder /WEB-INF/classes/algorithms and the datasets into the folder /WEB-INF/classes/inputData
- Start the run script: run.sh, or run.bat on Windows systems
- Open a browser at http://localhost:8080/ and register both the algorithm and the dataset in the Metanome frontend
- Choose the algorithm and data source, set the parameters, and run!
MetanomeTestRunner is a project for running the algorithms during development. As it is a Maven project, all the required Metanome libraries are downloaded automatically. If you want to build your own algorithm, take a look here.
Metanome and all the algorithms developed by the developer group have the following license.