Skip to content
/ lccs-lsh Public

Longest Common Sub-sequence Bucketing algorithm

License

Notifications You must be signed in to change notification settings

1flei/lccs-lsh

Repository files navigation

lccs-lsh

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

Paper accepted by SIGMOD 2020 Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring. SIGMOD Conference 2020, Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, Anthony K. H. Tung

To build the project, use the following instructions:

mkdir build
cd build
cmake ..
make -j

This will output the lccs binary. You can use

./lccs -h

for simple help information.

In order to make the program, you may need a compiler supporting c++17 and the program has the following additional dependencies:

Boost

GSL (required by lshkit)

Other dependencies should be already included in this repository.

In Ubuntu, one can simply use

sudo apt-get install libboost-dev
sudo apt-get install libgsl-dev

to install the Boost and GSL libraries.

We provide the Mnist dataset as a simple example. To run the LCCS-LSH algorithm on the Mnist data, you can simply run the following instructions in the build folder (python 3+ required):

python ../scripts/run_simple_experiment.py

This will work as a shell script and invoke the program using the command line.

To run more experiments, you may need to prepare the datasets in the data folder properly like the Mnist dataset. Once the datasets are properly prepared, you can simply run (in the build folder as well)

python ../scripts/run_ground_truth.py
python ../scripts/run_time_recall.py

The plot scripts can be found in ./scripts/lccs_plot. You can run the plot_sigmod.py to reproduce the figures shown in the paper.

In order to run the scripts, you may need numpy, scipy, and matplotlib. The scripts will automatically use the latest result indicated by the timestamp in the result file name.

The datasets can be found in https://1drv.ms/u/s!Ascf3jEjrVdxg6c6w7CutkF0TpXgpA?e=fjnR80.

About

Longest Common Sub-sequence Bucketing algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published