SSG : Satellite System Graph For Approximate Nearest Neighbor Search

SSG is a graph-based approximate nearest neighbor search (ANNS) algorithm. It provides a flexible and efficient solution for the metric-free large-scale ANNS on dense real vectors. It implements the algorithm of our paper: Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search

Benchmark datasets

Data set	Download	dimension	nb base vectors	nb query vectors	original website
SIFT1M	original website	128	1,000,000	10,000	original website
GIST1M	original website	128	1,000,000	1,000	original website
Crawl	crawl.tar.gz (1.7GB)	300	1,989,995	10,000	original website
GloVe-100	glove-100.tar.gz (424MB)	100	1,183,514	10,000	original website
Deep100M	deep100m.tar.gz* (34GB)	96	100,000,000	10,000	original website

For Deep100M we will provide the download link upon request

ANNS performance

The performance was tested without parallelism. Among all the graph-based algorithms, NSG and SSG has the smallest index size.

Compared Algorithms:

Please see our NSG paper for the performance of other graph-based algorithms - FANNG.

How to use

Compile

Prerequisite : openmp, cmake, boost
Compile:
1. Go to the root directory of faiss, it's under the directory of extern_libraries aside of ours.
2. Execute the following commands:

$ cd /path/to/project
$ mkdir -p build && cd build
$ cmake .. && make -j

Building SSG Index

The main interfaces and classes have its respective test codes under directory tests/.

Please follow the instructions below to build the SSG index.

a) Build a kNN graph

Firstly, we need to prepare a kNN graph.

We suggest you use our efanna_graph to build this kNN graph. But you can also use any alternatives you like, such as KGraph or faiss.

b) Convert the kNN graph to an SSG

For example:

$ cd /path/to/project/build/tests/
$ ./test_ssg_index data_path knn_graph_path L R Angle ssg_path

data_path is the path of the origin data.
knn_graph_path is the path of the pre-built kNN graph.
L controls the quality of the NSG, the larger the better, L > R.
R controls the index size of the graph, the best R is related to the intrinsic dimension of the dataset.
Angle controls the angle between two edges.
ssg_path is the path where the result SSG stored.

Approximate Nearest Neighbor Search using SSG

For example:

$ cd /path/to/project/build/tests/
$ ./test_ssg_optimized_search data_path query_path ssg_path search_L search_K result_path [random_seed]

data_path is the path of the origin data.
query_path is the path of the query data.
ssg_path is the path of the pre-built SSG.
search_L controls the quality of the search results, the larger the better but slower (must larger than search_K).
search_K controls the number of neighbors we want to find.
result_path is the path of the result neighbors.
random_seed (optional) is the random seed.

NOTE: For now, we only provide interface for search for only one query at a time, and test the performance with single thread.

NOTE: Data alignment is essential for the correctness of our procedure, because we use SIMD instructions for acceleration of numerical computing such as AVX and SSE2. You should use it to ensure your data elements (feature) is aligned with 8 or 16 int or float. For example, if your features are of dimension 70, then it should be extend to dimension 72. And the last 2 dimension should be filled with 0 to ensure the correctness of the distance computing. And this is what data_align() does.

NOTE: Only data-type int32 and float32 are supported for now.

Python API

Install

$ cd /path/to/project/
$ python setup.py install

Usage

NOTE: currently Python API only supports search method.

import numpy as np
import pyssg

data = np.fromfile("/path/to/sift_base.fvecs", dtype=np.float32)
dim = data[0].view(np.int32)
data = data.reshape(-1, dim + 1)
data = np.ascontiguousarray(data[:, 1:])
ndata, dim = data.shape

pyssg.set_seed(1234)
index = pyssg.IndexSSG(dim, ndata)
index.load("/path/to/ssg", data)

k, l = 100, 300
query = np.random.randn(dim).astype(np.float32)
knn = index.search(query, k, l)
print(knn)

Please refer to tests/test_python_query.py for a real-world example.

Parameters used in Our Paper

SSG Building

We use the following parameters to get the SSG index in Fig. 6 of our paper.

We use efanna_graph to build the kNN graph

Step 1. Build kNN Graph

Tool: efanna_graph
Parameters:

Dataset	K	L	iter	S	R
SIFT1M	200	200	12	10	100
GIST1M	400	400	12	15	100
Crawl	400	420	12	15	100
GloVe-100	400	420	12	20	200

Commands:

$ efanna_graph/tests/test_nndescent sift.fvecs sift_200nn.knng 200 200 12 10 100
$ efanna_graph/tests/test_nndescent gist.fvecs gist_400nn.knng 400 400 12 15 100
$ efanna_graph/tests/test_nndescent crawl.fvecs crawl_400nn.knng 400 420 12 15 100
$ efanna_graph/tests/test_nndescent glove-100.fvecs glove-100_400nn.knng 400 420 12 15 200

Step 2. Convert kNN Graph to SSG

Parameters:

Dataset	L	R	Angle
SIFT1M	100	50	60
GIST1M	500	70	60
Crawl	500	40	60
GloVe-100	500	50	60

Commands:

$ ./test_ssg_index sift.fvecs sift_200nn.knng 100 50 60 sift.ssg
$ ./test_ssg_index gist.fvecs gist_400nn.knng 500 70 60 gist.ssg
$ ./test_ssg_index crawl.fvecs crawl_400nn.knng 500 40 60 crawl.ssg
$ ./test_ssg_index glove-100.fvecs glove-100_400nn.knng 500 50 60 glove-100.ssg

SSG Search

search_L: range from search_K to 2000
random_seed: 161803398

Pre-built kNN Graph and NSG Index

Here we provide our pre-built kNN graphs and SSG index files used in our papar's experiments.

Dataset	kNN Graph	SSG Index
SIFT1M	sift_200nn.knng	sift.ssg
GIST1M	gist_400nn.knng	gist.ssg
Crawl	crawl_400nn.knng	crawl.ssg
GloVe-100	glove_400nn.knng	glove.ssg

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
include		include
pictures		pictures
src		src
tests		tests
third_party		third_party
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SSG : Satellite System Graph For Approximate Nearest Neighbor Search

Benchmark datasets

ANNS performance

Tree-based

Quantization-based

Graph-based

How to use

Compile

Building SSG Index

a) Build a kNN graph

b) Convert the kNN graph to an SSG

Approximate Nearest Neighbor Search using SSG

Python API

Install

Usage

Parameters used in Our Paper

SSG Building

Step 1. Build kNN Graph

Step 2. Convert kNN Graph to SSG

SSG Search

Pre-built kNN Graph and NSG Index

About

Releases

Packages

Languages

License

liql2007/SSG

Folders and files

Latest commit

History

Repository files navigation

SSG : Satellite System Graph For Approximate Nearest Neighbor Search

Benchmark datasets

ANNS performance

Tree-based

Quantization-based

Graph-based

How to use

Compile

Building SSG Index

a) Build a kNN graph

b) Convert the kNN graph to an SSG

Approximate Nearest Neighbor Search using SSG

Python API

Install

Usage

Parameters used in Our Paper

SSG Building

Step 1. Build kNN Graph

Step 2. Convert kNN Graph to SSG

SSG Search

Pre-built kNN Graph and NSG Index

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages