- Paper: Mourad Khayati, Ines Arous, Zakhar Tymchenko and Philippe Cudré-Mauroux: ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams. PVLDB 2021.
- Algorithms: The benchmark evaluates all the algorithms mentioned in the paper: ORBITS, SPIRIT, SAGE, OGDImpute, pcaMME, TKCM and M-RNN*. To enable/disable any algorithm, please refer to the Algorithms customization section below.
- Datasets: The benchmark evaluates all the datasets used in the paper: gas (drift10), motion, bafu and soccer*. To enable/disable any dataset, please refer to the Datasets customization section below.
- Scenarios: The benchmark will execute the full set of 11 recovery scenarios and report the error using RMSE, MSE and MAE. A detailed description of the recovery scenarios can be found here.
- Reproducibility: We provide a dedicated repository for reproducing all the results reported in this paper.
*disabled by default as it takes a couple of days to run.
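The three error metrics named in the Scenarios item can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the function name `recovery_errors` is hypothetical, and it assumes the error is measured only on the positions that were missing, which may differ from how the Testing Framework computes it.

```python
import math

def recovery_errors(truth, recovered, missing_idx):
    """Return (rmse, mse, mae) over the positions listed in missing_idx.

    Assumption: only recovered (previously missing) positions are scored.
    """
    if not missing_idx:
        raise ValueError("no missing positions to evaluate")
    diffs = [recovered[i] - truth[i] for i in missing_idx]
    mse = sum(d * d for d in diffs) / len(diffs)    # mean squared error
    mae = sum(abs(d) for d in diffs) / len(diffs)   # mean absolute error
    return math.sqrt(mse), mse, mae                 # RMSE is the root of MSE

# Toy series: positions 1 and 2 were missing and then recovered.
truth = [1.0, 2.0, 3.0, 4.0]
recovered = [1.0, 2.5, 2.0, 4.0]
rmse, mse, mae = recovery_errors(truth, recovered, [1, 2])
print(rmse, mse, mae)
```

RMSE penalizes large individual deviations more strongly than MAE, which is one reason to report both.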
Prerequisites | Build | Execution | Benchmark Customization | Citation
- Ubuntu 18 or 20 (including Ubuntu derivatives, e.g., Xubuntu).
- Clone this repository.
- Mono: Install mono from https://www.mono-project.com/download/stable/ (takes a few minutes)
- Build the Testing Framework using the installation script located in the root folder (takes a few minutes):
$ sh install_linux.sh
$ cd TestingFramework/bin/Debug/
$ mono TestingFramework.exe
The test suite with the default setup will take ~20 hours to finish.
- Results: All results will be added to the Results folder. The accuracy results of all algorithms will be sequentially added for each scenario and dataset to Results/.../.../.../error/. The runtime results of all algorithms will be added to Results/.../.../.../runtime/. The plots of the recovered blocks will be added to the folder Results/.../.../.../plots/.
- Scenarios creation: To compare (externally) your technique against the benchmark results, we provide a command to export the missing scenarios/patterns for a given dataset:
$ cd TestingFramework/bin/Debug/
$ mono TestingFramework.exe export dataset_name
This command will produce contaminated data (where missing values are designated as NaN) in the Export/ folder for each streaming scenario in the benchmark.
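To feed the exported data into an external technique, the NaN positions first have to be located. The sketch below shows one way to do that. It assumes the exported files use the same space-separated, one-row-per-line layout as the input datasets, which this README does not state explicitly; the helper names `parse_matrix` and `missing_blocks` are hypothetical.

```python
import math

def parse_matrix(text):
    """Parse space-separated rows; the token 'NaN' becomes float('nan')."""
    return [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]

def missing_blocks(column):
    """Return (start, length) for each run of consecutive NaN values."""
    blocks, start = [], None
    for i, value in enumerate(column):
        if math.isnan(value):
            if start is None:
                start = i                      # a new missing block begins
        elif start is not None:
            blocks.append((start, i - start))  # the block ended at i - 1
            start = None
    if start is not None:                      # block runs to the end
        blocks.append((start, len(column) - start))
    return blocks

# Inline sample standing in for the contents of an exported file.
sample = "1.0 2.0\nNaN 2.1\nNaN 2.2\n1.3 NaN\n"
rows = parse_matrix(sample)
first_col = [r[0] for r in rows]
second_col = [r[1] for r in rows]
print(missing_blocks(first_col), missing_blocks(second_col))
```

For a real file, pass the result of reading it as text to `parse_matrix` instead of the inline sample.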
To enable an additional algorithm:
- open the file TestingFramework/config.cfg
- add the name of the algorithm to the line EnabledAlgorithms =
- All the datasets used in this paper can be found in TestingFramework/bin/Debug/data/
- To enable an additional dataset:
  - open the file TestingFramework/config.cfg
  - add the name of the dataset to the line Datasets =
- To add a new dataset to the benchmark:
  - import the file to TestingFramework/bin/Debug/data/{name}/{name}_normal.txt (name is the name of your data).
  - Requirements: rows >= 1'000; columns >= 10; column separator = space; row separator = newline
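Before importing a new dataset, it can help to check it against the requirements listed above. The following is a minimal sketch under stated assumptions: the function `check_dataset` is hypothetical and not part of the benchmark, and the checks for numeric values and a uniform column count are additions that the requirements imply but do not spell out.

```python
def check_dataset(text, min_rows=1000, min_cols=10):
    """Return a list of problems; an empty list means the text passes."""
    problems = []
    # Row separator = newline, column separator = a single space.
    rows = [line.split(" ") for line in text.split("\n") if line != ""]
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, need >= {min_rows}")
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        problems.append("rows have differing column counts")
    if rows and min(widths) < min_cols:
        problems.append(f"only {min(widths)} columns, need >= {min_cols}")
    # Assumption: every value must parse as a number.
    for n, row in enumerate(rows, 1):
        try:
            [float(tok) for tok in row]
        except ValueError:
            problems.append(f"row {n} contains a non-numeric value")
            break  # report only the first offending row
    return problems

# A synthetic 1000 x 10 matrix passes; a short comma-separated one does not.
good = "\n".join(" ".join("1.0" for _ in range(10)) for _ in range(1000))
bad = "1,2,3\n4,5,6"
print(check_dataset(good))
print(check_dataset(bad))
```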
To enable an additional recovery scenario:
- open the file TestingFramework/config.cfg
- add the name of the scenario to the line Scenarios =
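The same edit applies to the EnabledAlgorithms, Datasets and Scenarios lines, so it can be scripted. The sketch below is only an illustration: the helper `enable_in_config` is hypothetical, and it assumes each line holds a comma-separated list of names, which this README does not confirm. Check the actual layout and the accepted spellings in config.cfg before relying on it.

```python
def enable_in_config(cfg_text, key, name, sep=","):
    """Append name to the 'key = ...' line unless it is already listed.

    Assumption: values on the line are separated by `sep` (comma here).
    """
    out, found = [], False
    for line in cfg_text.splitlines():
        left, eq, right = line.partition("=")
        if eq and left.strip() == key:
            found = True
            values = [v.strip() for v in right.split(sep) if v.strip()]
            if name not in values:
                values.append(name)
            line = f"{key} = {(sep + ' ').join(values)}"
        out.append(line)
    if not found:
        raise KeyError(f"no '{key} =' line found")
    return "\n".join(out)

# Inline sample standing in for the contents of config.cfg.
cfg = "EnabledAlgorithms = ORBITS, SPIRIT\nDatasets = motion"
updated = enable_in_config(cfg, "EnabledAlgorithms", "TKCM")
print(updated)
```

Calling the function again with the same name leaves the text unchanged, so it is safe to rerun.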
@inproceedings{orbits2021vldb,
author = {Mourad Khayati and Ines Arous and Zakhar Tymchenko and Philippe Cudr{\'{e}}{-}Mauroux},
title = {ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams},
booktitle = {Proceedings of the VLDB Endowment},
volume = {14},
number = {3},
year = {2021}
}