GitHub - sean-chester/SkyBench: Collection of algorithms in C++ for main-memory skyline computation

SkyBench

Version 1.1

Introduction

The SkyBench suite provides C++ software for efficient main-memory computation of skylines, including the state-of-the-art sequential (i.e., single-threaded) and multi-core (i.e., multi-threaded) algorithms.

The skyline operator [1] identifies so-called Pareto-optimal points in a multi-dimensional dataset. In two dimensions, the problem is often presented as finding the silhouette of Manhattan: if one knows the position of the corner points of every building, what parts of which buildings are visible from across the river? The two-dimensional case is trivial to solve and not the focus of SkyBench.
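The two-dimensional case is easy because, once the points are sorted on one coordinate, a single sweep only has to track the best value seen so far on the other. The following is an illustrative sketch, not code from SkyBench. It assumes both coordinates are "smaller is better" (negate a coordinate where larger is better), and the names `Point2` and `skyline2d` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical 2-D point type; smaller is assumed better in both coordinates.
typedef std::pair<double, double> Point2;

// Sort-and-sweep 2-D skyline in O(n log n) time.
std::vector<Point2> skyline2d(std::vector<Point2> pts) {
    // Lexicographic sort: by x first, ties broken by y.
    std::sort(pts.begin(), pts.end());
    std::vector<Point2> result;
    // Smallest y among all points seen so far.
    double bestY = std::numeric_limits<double>::infinity();
    for (const Point2& p : pts) {
        if (p.second < bestY) {
            // Every earlier point has x <= p.x but a larger y, so none dominates p.
            bestY = p.second;
            result.push_back(p);
        } else if (!result.empty() && p == result.back()) {
            // Exact duplicate of a skyline point: identical points do not
            // dominate each other, so the copy stays in the skyline too.
            result.push_back(p);
        }
        // Otherwise an earlier, distinct point with x <= p.x and y <= p.y dominates p.
    }
    return result;
}
```

Sorting dominates the cost; the sweep itself is linear.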

In higher dimensions, the problem is formalised with the concept of dominance: a point p is dominated by another point q if q has better or equal values for every attribute and the two points are distinct (so q is strictly better on at least one attribute). All points that are not dominated are part of the skyline. For example, if the points correspond to hotels, then any hotel that is more expensive, farther from anything of interest, and lower-rated than another choice would not be in the skyline. In the table below, Marge's Motel is dominated by Happy Hostel because it is more expensive, farther from Central Station, and lower rated, so it is not in the skyline. On the other hand, The Grand has the best rating and Happy Hostel has the best price (and the shortest distance), so neither is dominated. Lovely Lodge does not have the best value for any one attribute, but no other hotel outperforms it on every attribute, so it too is in the skyline and represents a good balance of the attributes.

Name	Price per Night	Rating	Distance to Central Station	In skyline?
The Grand	$325	⋆⋆⋆⋆⋆	1.2km	✓
Marge's Motel	$55	⋆⋆	3.6km	✗
Happy Hostel	$25	⋆⋆⋆	0.4km	✓
Lovely Lodge	$100	⋆⋆⋆⋆	8.2km	✓
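The dominance definition and the hotel example translate directly into a minimal C++ sketch. It assumes every attribute is "smaller is better", so the star rating is stored negated. The names `Point`, `dominates`, `naiveSkyline`, `kHotels`, and `kHotelNames` are illustrative and not SkyBench's API, and the quadratic all-pairs check only makes the definition concrete; it is not one of the algorithms in SkyBench.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: every attribute is "smaller is better"; attributes where larger
// is better (such as a star rating) are negated before comparison.
typedef std::vector<double> Point;

// True if q dominates p: q is no worse than p on every attribute and
// strictly better on at least one.
bool dominates(const Point& q, const Point& p) {
    bool strictlyBetter = false;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (q[i] > p[i]) return false;          // q is worse on attribute i
        if (q[i] < p[i]) strictlyBetter = true; // q is better on attribute i
    }
    return strictlyBetter;
}

// Naive O(n^2) skyline: indices of all points that no other point dominates.
std::vector<std::size_t> naiveSkyline(const std::vector<Point>& pts) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < pts.size() && !dominated; ++j)
            dominated = (j != i) && dominates(pts[j], pts[i]);
        if (!dominated) result.push_back(i);
    }
    return result;
}

// The hotel table encoded as {price, -stars, distance in km}.
const std::vector<std::string> kHotelNames = {
    "The Grand", "Marge's Motel", "Happy Hostel", "Lovely Lodge"};
const std::vector<Point> kHotels = {
    {325, -5, 1.2},
    {55, -2, 3.6},
    {25, -3, 0.4},
    {100, -4, 8.2},
};
```

Calling `naiveSkyline(kHotels)` returns the indices 0, 2, and 3 (The Grand, Happy Hostel, and Lovely Lodge), matching the last column of the table.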

As the number of dimensions (attributes) increases, both the size of the skyline and the difficulty of computing it grow, so parallel algorithms such as those implemented here quickly become necessary.
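One reason parallelism helps is that deciding whether a point is dominated does not depend on the answer for any other point. The sketch below exploits only that observation, using an OpenMP `parallel for` over the naive all-pairs test. It is not Hybrid, Q-Flow, or PSkyline, which additionally avoid most of the O(n²) dominance tests; the names are hypothetical, and "smaller is better" is assumed on every attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: every attribute is "smaller is better".
typedef std::vector<double> Point;

// True if q is no worse than p everywhere and strictly better somewhere.
bool dominates(const Point& q, const Point& p) {
    bool strictlyBetter = false;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictlyBetter = true;
    }
    return strictlyBetter;
}

// Parallel naive skyline: each iteration of the outer loop decides one
// point independently, so iterations are distributed across threads.
std::vector<Point> parallelNaiveSkyline(const std::vector<Point>& pts) {
    const long n = static_cast<long>(pts.size());
    // One flag per point; each flag is written by exactly one thread.
    std::vector<char> inSkyline(pts.size(), 1);
    #pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < n; ++j) {
            if (j != i && dominates(pts[j], pts[i])) {
                inSkyline[i] = 0;
                break;  // one dominator is enough
            }
        }
    }
    // Sequentially collect survivors, preserving input order.
    std::vector<Point> result;
    for (std::size_t i = 0; i < pts.size(); ++i)
        if (inSkyline[i]) result.push_back(pts[i]);
    return result;
}
```

Compile with `-fopenmp` to enable the threads; without that flag the pragma is ignored and the same code runs sequentially. `std::vector<char>` is used instead of `std::vector<bool>` so that threads writing different flags never touch the same byte.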

SkyBench is released in conjunction with our recent ICDE paper [2]. All of the code and scripts needed to repeat the experiments from that paper are included in this software suite. To the best of our knowledge, this is also the first publicly released C++ skyline software; we hope it will be a useful resource for the academic and industry research communities.

Algorithms

The following algorithms have been implemented in SkyBench:

Hybrid [2]: Located in src/hybrid. It is the state-of-the-art multi-core algorithm, based on two-level quad-tree partitioning of the data and memoisation of point-to-point relationships.
Q-Flow [2]: Located in src/qflow. It is a simplification of Hybrid to demonstrate control flow.
PSkyline [3]: Located in src/pskyline. It was the previous state-of-the-art multi-core algorithm, based on a divide-and-conquer paradigm.
BSkyTree [4]: Located in src/bskytree. It is the state-of-the-art sequential algorithm, based on a quad-tree partitioning of the data and memoisation of point-to-point relationships.
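Hybrid and BSkyTree are both described above as quad-tree partitioning the data. A common way to express such a partition, sketched here under stated assumptions, is to give each point a d-bit region mask relative to a pivot point; a bitwise subset test on two masks then rules out many pairs without a full dominance test. This illustrates the general idea only and is not SkyBench's code: pivot selection, Hybrid's second partitioning level, and memoisation are omitted, `regionMask` and `mayDominate` are hypothetical names, and the sketch assumes "smaller is better" with at most 32 attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: "smaller is better" on every attribute, and at most 32 attributes.
typedef std::vector<float> Point;

// Region mask of p relative to a pivot: bit i is set when p is no better
// than the pivot on attribute i (p[i] >= pivot[i]). With d attributes this
// splits the space into up to 2^d regions.
std::uint32_t regionMask(const Point& p, const Point& pivot) {
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < p.size(); ++i)
        if (p[i] >= pivot[i]) mask |= (std::uint32_t{1} << i);
    return mask;
}

// If q dominates p, then q[i] >= pivot[i] implies p[i] >= q[i] >= pivot[i],
// so every bit set in q's mask is also set in p's mask. When this subset
// test fails, q cannot dominate p and the full dominance test can be skipped.
bool mayDominate(std::uint32_t maskQ, std::uint32_t maskP) {
    return (maskQ & maskP) == maskQ;
}
```

The subset test is a necessary condition, not a sufficient one: when it passes, a full dominance test is still required.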

All four algorithms implement the common interface defined in common/skyline_i.h and use the shared dominance tests in common/common.h and common/dt_avx.h (the latter when vectorisation is enabled).
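Sharing one interface lets a benchmark driver run and time every algorithm the same way. The sketch below shows that pattern with a hypothetical abstract class and a naive implementation. The names `SkylineAlgorithm`, `init`, `execute`, and `NaiveSkyline` are invented for illustration, and the actual interface in common/skyline_i.h may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Assumption: "smaller is better" on every attribute.
typedef std::vector<float> Point;

// Hypothetical stand-in for a shared skyline interface; not the real
// declaration in common/skyline_i.h.
class SkylineAlgorithm {
public:
    virtual ~SkylineAlgorithm() = default;
    // Hands the algorithm its input data.
    virtual void init(const std::vector<Point>& data) = 0;
    // Computes the skyline and returns the indices of its points.
    virtual std::vector<std::size_t> execute() = 0;
};

// Shared dominance test that every implementation can reuse.
inline bool dominates(const Point& q, const Point& p) {
    bool strictlyBetter = false;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictlyBetter = true;
    }
    return strictlyBetter;
}

// Hypothetical naive implementation of the interface.
class NaiveSkyline : public SkylineAlgorithm {
public:
    void init(const std::vector<Point>& data) override { data_ = data; }

    std::vector<std::size_t> execute() override {
        std::vector<std::size_t> skyline;
        for (std::size_t i = 0; i < data_.size(); ++i) {
            bool dominated = false;
            // dominates(p, p) is false, so comparing a point with itself is harmless.
            for (std::size_t j = 0; j < data_.size() && !dominated; ++j)
                dominated = dominates(data_[j], data_[i]);
            if (!dominated) skyline.push_back(i);
        }
        return skyline;
    }

private:
    std::vector<Point> data_;
};
```

A driver can then hold each algorithm behind a `std::unique_ptr<SkylineAlgorithm>` and call `init` and `execute` without knowing which concrete class it is using.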

Datasets

For reproducibility of the experiments in [2], we include three datasets. The WEATHER dataset was originally obtained from The University of East Anglia Climatic Research Unit and preprocessed for skyline computation. We also include two classic skyline datasets, exactly as used in [2]: NBA and HOUSE.
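The NBA dataset, for example, is stored as workloads/nba-U-8-17264.csv. A minimal loader is sketched below under the assumption that each line holds one point with comma-separated numeric attributes and no header row; the exact file layout is not documented here, so check it against the files in workloads/ before relying on this. The function name `readPoints` is hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: one point per line, attributes separated by commas, no header.
std::vector<std::vector<float>> readPoints(std::istream& in) {
    std::vector<std::vector<float>> points;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // tolerate blank lines
        std::vector<float> point;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ','))
            point.push_back(std::stof(field));  // throws on non-numeric fields
        // Every point must have the same number of attributes.
        if (!points.empty() && point.size() != points.front().size())
            throw std::runtime_error("inconsistent number of attributes");
        points.push_back(point);
    }
    return points;
}
```

To read a file, open it with `std::ifstream` and pass the stream to `readPoints`.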

Synthetic workloads can be generated with the standard skyline benchmark data generator [1], hosted on pgfoundry.
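The generator of [1] produces independent, correlated, and anti-correlated data. As a rough stand-in for quick experiments, the sketch below produces only the independent distribution (each attribute drawn independently and uniformly from the unit interval) and writes it in the same assumed one-point-per-line, comma-separated layout. It is not a replacement for the standard generator, and `writeIndependent` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <random>
#include <sstream>
#include <string>

// Writes n points with d attributes each, every attribute drawn independently
// and uniformly from the unit interval, one comma-separated point per line.
// A fixed seed makes the output reproducible.
void writeIndependent(std::ostream& out, std::size_t n, std::size_t d, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < d; ++j) {
            if (j > 0) out << ',';
            out << unit(rng);
        }
        out << '\n';
    }
}
```

Writing to a `std::ofstream` instead of a string stream produces a file that the loader sketched for the included datasets could read back.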

Requirements

SkyBench has the following requirements:

A C++ compiler that supports C++11 and OpenMP (e.g., a recent version of the GNU compiler)
The GNU make program
A processor with AVX or AVX2 support, if the vectorised dominance tests are to be used

Usage

To run SkyBench, the code must first be compiled for the number of dimensions of the input data.^ For example, to compute the skyline of the 8-dimensional NBA dataset located in workloads/nba-U-8-17264.csv, run:

make all DIMS=8

./bin/SkyBench -f workloads/nba-U-8-17264.csv

By default, it will compute the skyline with all algorithms. Running ./bin/SkyBench without parameters will provide more details about the supported options.

You can also use the provided shell script (scripts/runExp.sh), which does all of the above automatically. For details, execute:

./scripts/runExp.sh

To reproduce the experiment with real datasets (Table II in [2]), do (assuming a 16-core machine):

./scripts/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid"

^For performance reasons, the skyline implementations that we obtained from other authors are compiled for a specific number of dimensions. For a fair comparison, we adopted the same approach.
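Compile-time dimensionality can be illustrated with a short sketch. It assumes that `make all DIMS=8` ends up passing something like `-DDIMS=8` to the compiler; that is a guess about the makefile, not something documented here. With the number of dimensions fixed at compile time, points fit in a `std::array` and the dominance loop has a constant trip count, which makes it easier for the compiler to unroll and vectorise. The names below are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: the build defines DIMS (e.g. -DDIMS=8); default to 8 so that
// this sketch also compiles on its own.
#ifndef DIMS
#define DIMS 8
#endif

constexpr std::size_t kDims = DIMS;
// Fixed-size point: no heap allocation, size known to the compiler.
typedef std::array<float, kDims> Point;

// Dominance test ("smaller is better") with a compile-time trip count,
// which gives the compiler more freedom to unroll and vectorise the loop.
inline bool dominates(const Point& q, const Point& p) {
    bool strictlyBetter = false;
    for (std::size_t i = 0; i < kDims; ++i) {
        if (q[i] > p[i]) return false;
        if (q[i] < p[i]) strictlyBetter = true;
    }
    return strictlyBetter;
}
```

The cost of this design is that changing the dimensionality of the input requires recompiling, which is why the usage instructions above pass DIMS to make.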

License

This software is subject to the terms of The MIT License, which has been included in this repository.

Contact

This software suite will soon be expanded with new algorithms, so please check that you are using the latest version. Please do not hesitate to contact the authors with comments, questions, or bug reports.

SkyBench on GitHub

References

[1] S. Börzsönyi, D. Kossmann, and K. Stocker. (2001) "The Skyline Operator." In Proceedings of the 17th International Conference on Data Engineering (ICDE 2001), 421-432. http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf

[2] S. Chester, D. Šidlauskas, I. Assent, and K. S. Bøgh. (2015) "Scalable parallelization of skyline computation for multi-core processors." In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015), 1083-1094. http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf

[3] H. Im, J. Park, and S. Park. (2011) "Parallel skyline computation on multicore architectures." Information Systems 36(4): 808-823. http://dx.doi.org/10.1016/j.is.2010.10.005

[4] J. Lee and S. Hwang. (2014) "Scalable skyline computation using a balanced pivot selection technique." Information Systems 39: 1-21. http://dx.doi.org/10.1016/j.is.2013.05.005
