use-case: Add adaptive retrieval use case (#61)
* feat: Add adaptive retrieval

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

* chore: Resolve comments

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

---------

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
gaocegege authored Mar 7, 2024
1 parent 322e483 commit bdfe590
Showing 19 changed files with 291 additions and 30 deletions.
48 changes: 25 additions & 23 deletions .vitepress/config.mts
@@ -44,7 +44,7 @@ export default defineConfig({
nav: [
{ text: 'Home', link: '/' },
{ text: 'Docs', link: '/getting-started/overview' },
{text: 'Tutorial', link: '/tutorial/'},
{text: 'Reference', link: '/reference/'},
{ text: 'Blog', link: 'https://blog.pgvecto.rs' },
],

@@ -54,6 +54,17 @@ export default defineConfig({
},

sidebar: {
'/reference/': [
{
text: 'Reference',
collapsed: false,
items: [
{ text: 'Schema', link: '/reference/schema' },
{ text: 'Indexing Options', link: '/reference/indexing_options' },
{ text: 'Search Options', link: '/reference/search_options' },
],
},
],
'/': [
{
text: 'Getting Started',
@@ -75,6 +86,16 @@ export default defineConfig({
{ text: 'Compatibility', link: '/usage/compatibility' },
]
},
{
text: 'Use Cases',
collapsed: false,
items: [
{ text: 'Hybrid search', link: '/use-case/hybrid-search' },
{ text: 'Adaptive retrieval', link: '/use-case/adaptive-retrieval' },
{ text: 'Image search', link: '/use-case/image-search' },
{ text: 'Multi tenancy', link: '/use-case/multi-tenancy' },
],
},
{
text: 'Integration',
collapsed: false,
@@ -103,18 +124,9 @@ export default defineConfig({
{ text: `Benchmarks`, link: '/faqs/benchmark' },
]
},
{
text: 'Reference',
collapsed: false,
items: [
{ text: 'Schema', link: '/reference/schema' },
{ text: 'Indexing Options', link: '/reference/indexing_options' },
{ text: 'Search Options', link: '/reference/search_options' },
]
},
{
text: 'Community',
collapsed: false,
collapsed: true,
items: [
{ text: 'Contributing to pgvecto.rs', link: '/community/contributing' },
{ text: 'Roadmap', link: '/community/roadmap' },
@@ -123,23 +135,13 @@ export default defineConfig({
},
{
text: 'Developers',
collapsed: false,
collapsed: true,
items: [
{ text: 'Development tutorial', link: '/developers/development' },
],
},
],
'/tutorial/': [
{
text: 'Use Cases',
collapsed: false,
items: [
{ text: 'Hybrid Search', link: '/tutorial/hybrid-search' },
{ text: 'Image Search', link: '/tutorial/image-search' },
{ text: 'Multi Tenancy', link: '/tutorial/multi-tenancy' },
],
},
]

},

socialLinks: [
8 changes: 4 additions & 4 deletions src/faqs/benchmark.md
@@ -24,7 +24,7 @@ When the `vbase` mode is enabled, `pgvecto.rs` can achieve over **2x** more resp

## Shortening embeddings (a.k.a [Matryoshka Embedding](https://aniketrege.github.io/blog/2024/mrl/))

OpenAI [latest embedding models](https://openai.com/blog/new-embedding-models-and-api-updates) supports shortening embeddings, where developers could simply “remove some numbers from the end of a sequence” and still maintain a valid representation for text.
OpenAI's [latest embedding models](https://openai.com/blog/new-embedding-models-and-api-updates) support shortening embeddings, where developers can simply “remove some numbers from the end of a sequence” and still maintain a valid representation for text.

> This enables very flexible usage. For example, when using a vector data store that only supports embeddings up to 1024 dimensions long, developers can now still use our best embedding model text-embedding-3-large and specify a value of 1024 for the dimensions API parameter, which will shorten the embedding down from 3072 dimensions, trading off some accuracy in exchange for the smaller vector size.
@@ -34,16 +34,16 @@ These are the performance benchmarks for the shortened embeddings. The [dbpedia-

**First Pass** in the benchmark shows that the accuracy of the shortened embeddings is not as good as the full embeddings. 256D HNSW index only achieves ~65% accuracy, while 1024D HNSW index achieves ~85% accuracy.

However, the accuracy of the shortened embeddings surpasses 95% when adaptive retrieval is employed. Additionally, adopting this approach leads to significant improvements in terms of RPS (Requests Per Second) and latency.
However, the accuracy of the shortened embeddings surpasses 95% when [adaptive retrieval](/use-case/adaptive-retrieval) is employed. Additionally, adopting this approach leads to significant improvements in terms of RPS (Requests Per Second) and latency.

## Binary vectors

`pgvecto.rs` supports [binary vectors](/usage/vector-types.html#bvector-binary-vector). The `bvector` type is optimized for storage and performance. It uses a bit-packed representation to store the binary vector. The distance calculation is also optimized for binary vectors.

Here are some performance benchmarks for the `bvector` type. We use the [dbpedia-entities-openai3-text-embedding-3-large-3072-1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M) dataset for the benchmark. The VM is n2-standard-8 (8 vCPUs, 32 GB memory) on Google Cloud.

We upsert 1M binary vectors into the table and then run a KNN query for each embedding. It only takes about 600MB memory to index 1M binary vectors.
We upsert 1M binary vectors into the table and then run a KNN query for each embedding. It only takes about 600MB memory to index 1M binary vectors, while the `vector` type takes about 18GB memory to index the same number of vectors.

![bvector](./images/bvector.png)

We can see that the `bvector`'s accuracy is not as good as the `vector` type, but it exceeds 95% if we adopt adaptive retrieval.
We can see that the `bvector`'s accuracy is not as good as the `vector` type, but it exceeds 95% if we adopt [adaptive retrieval](/use-case/adaptive-retrieval).
7 changes: 7 additions & 0 deletions src/reference/index.md
@@ -0,0 +1,7 @@
# Reference

This page lists the reference documentation.

- [Indexing options](/reference/indexing_options)
- [Search options](/reference/search_options)
- [Schema](/reference/schema)
Binary file modified src/usage/images/bvector.png
2 changes: 1 addition & 1 deletion src/usage/indexing.md
@@ -126,4 +126,4 @@ Smaller [`m`](/reference/indexing_options.html#options-for-table-hnsw)) values a

The [`ef_construction`](/reference/indexing_options.html#options-for-table-hnsw) parameter determines the dynamic candidate list size when adding new nodes; increasing this value may enhance recall but could extend index construction time.

The HNSW index is resource-intensive, requiring additional RAM and an adjustment to the `maintenance_work_mem` setting for larger datasets. If you're seeking performance that's markedly faster than IVF, with a high recall rate and scalability that matches dataset size, HNSW is an excellent option.
The HNSW index is resource-intensive, requiring additional RAM and an adjustment to the `maintenance_work_mem` setting for larger datasets. If you're seeking performance that's markedly faster than IVF, with a high recall rate and scalability that matches dataset size, HNSW is an excellent option.
4 changes: 2 additions & 2 deletions src/usage/vector-types.md
@@ -38,8 +38,8 @@ The `bvector` type is optimized for storage and performance. It uses a bit-packe

Here are some performance benchmarks for the `bvector` type. We use the [dbpedia-entities-openai3-text-embedding-3-large-3072-1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M) dataset for the benchmark. The VM is n2-standard-8 (8 vCPUs, 32 GB memory) on Google Cloud.

We upsert 1M binary vectors into the table and then run a KNN query for each embedding. It only takes about 600MB memory to index 1M binary vectors.
We upsert 1M binary vectors into the table and then run a KNN query for each embedding. It only takes about 600MB memory to index 1M binary vectors, while the `vector` type takes about 18GB memory to index the same number of vectors.

![bvector](./images/bvector.png)

We can see that the `bvector`'s accuracy is not as good as the `vector` type, but it exceeds 95% if we adopt adaptive retrieval.
We can see that the `bvector`'s accuracy is not as good as the `vector` type, but it exceeds 95% if we adopt [adaptive retrieval](/use-case/adaptive-retrieval).
246 changes: 246 additions & 0 deletions src/use-case/adaptive-retrieval.md
@@ -0,0 +1,246 @@
# Adaptive retrieval and shortening embeddings

OpenAI's [latest embedding models](https://openai.com/blog/new-embedding-models-and-api-updates) support shortening embeddings, where developers can simply “remove some numbers from the end of a sequence” and still maintain a valid representation for text.

> This enables very flexible usage. For example, when using a vector data store that only supports embeddings up to 1024 dimensions long, developers can now still use our best embedding model text-embedding-3-large and specify a value of 1024 for the dimensions API parameter, which will shorten the embedding down from 3072 dimensions, trading off some accuracy in exchange for the smaller vector size.
## Shortening embeddings

OpenAI's new embedding model `text-embedding-3-large` produces embeddings with 3072 dimensions by default.

![](./adaptive-retrieval/shortening-embedding.svg)

But you could safely remove some numbers from the end of the sequence and still maintain a valid representation for text. For example, you could shorten the embeddings to 1024 dimensions.

::: details

You may need normalization to ensure that the shortened embeddings are compatible with some distance calculations, e.g. dot product. OpenAI's API handles this for you if you call `text-embedding-3-large` to generate a lower-dimensional embedding directly, instead of truncating the original embeddings on your own.

```python
embedding_3072_dim_from_openai = [-0.00611, -0.00269, ...]
embedding_1024_dim_from_openai = [-0.01520, -0.03839, ...]

embedding_1024_dim_truncated_from_3072_dim = [-0.00611, -0.00269, ...]
embedding_1024_dim_normalized = [-0.01520, -0.03839, ...]
```
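A minimal sketch of both options, assuming the official `openai` Python client and `numpy` (the input text and variable names are placeholders):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Option 1: let the API produce the shortened embedding directly via the
# `dimensions` parameter; the result comes back ready to use.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="pgvecto.rs is a Postgres extension for vector search",
    dimensions=1024,
)
embedding_1024_dim_from_openai = resp.data[0].embedding

# Option 2: truncate a full 3072-dim embedding yourself, then re-normalize so
# that dot product and cosine distances remain meaningful.
full = np.array(embedding_3072_dim_from_openai, dtype=np.float32)
embedding_1024_dim_truncated = full[:1024]
embedding_1024_dim_normalized = embedding_1024_dim_truncated / np.linalg.norm(embedding_1024_dim_truncated)
```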

:::

There are several benefits to using shortened embeddings:

- **Reduced memory usage**: Shortened embeddings take less memory to store.
- **Lower dimensions**: Some vector data stores only support embeddings up to a certain number of dimensions (e.g. 2000), and shortened embeddings fit within those limits.
- **Faster indexing and search**: Smaller embeddings can be indexed and searched faster than larger embeddings.
- **Trade-off between accuracy and cost**: Shortening embeddings trades some accuracy for a smaller vector size.

However, the accuracy of the shortened embeddings is not as good as that of the full embeddings.

## Trading off performance and cost

We performed an experiment to evaluate the effectiveness of shortened embeddings compared to the full embeddings. The benchmark was conducted using the [dbpedia-entities-openai3-text-embedding-3-large-3072-1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M) dataset. The experiment was carried out on a Google Cloud VM with specifications of `n2-standard-8`, which includes 8 vCPUs and 32 GB of memory.

We created a table and populated it with 1 million embeddings, each with 3072 dimensions.

```sql
CREATE TABLE openai3072 (
  id bigserial PRIMARY KEY,
  text_embedding_3_large_3072_embedding vector(3072),
  text_embedding_3_large_3072_bvector bvector(3072)
);
```

::: details

The code for inserting 1M embeddings into the table:

```python
import psycopg
from datasets import load_dataset
from tqdm import tqdm

conn = psycopg.connect(conninfo='postgresql://postgres:mysecretpassword@localhost:5433/postgres', autocommit=True)

MODEL_NAME, DIMENSIONS = "text-embedding-3-large", 3072
DATASET_NAME = f"Qdrant/dbpedia-entities-openai3-{MODEL_NAME}-{DIMENSIONS}-1M"
embedding_column_name = f"{MODEL_NAME}-{DIMENSIONS}-embedding"

dataset = load_dataset(
    DATASET_NAME,
    streaming=False,
    split="train",
)

# Insert the embeddings in batches of 1000 rows.
bs = 1000
points = []
for i, record in tqdm(enumerate(dataset)):
    if i % bs == 0:
        points = []
    points.append({"embedding": record[embedding_column_name]})
    if i % bs == bs - 1:
        batch_values = ", ".join([f"('{p['embedding']}')" for p in points])
        conn.execute(f"INSERT INTO openai3072 (text_embedding_3_large_3072_embedding) VALUES {batch_values}")
        print(f"Inserted {i} records")
```

:::

Furthermore, we created indexes for embeddings with 256, 1024, and 3072 dimensions, and performed an ANN (Approximate Nearest Neighbors) query against each index. There is no need to create separate lower-dimensional columns: we can index the first 256 (or 1024) dimensions of the 3072-dimensional embeddings directly.

```sql
-- Create the 256-dim index
CREATE INDEX openai_vector_index_256 ON openai3072 USING vectors (((text_embedding_3_large_3072_embedding[0:256])::vector(256)) vector_l2_ops);

-- Create the 1024-dim index
CREATE INDEX openai_vector_index_1024 ON openai3072 USING vectors (((text_embedding_3_large_3072_embedding[0:1024])::vector(1024)) vector_l2_ops);

-- Create the 3072-dim index
CREATE INDEX openai_vector_index ON openai3072 USING vectors (text_embedding_3_large_3072_embedding vector_l2_ops);
```
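Note that a query has to use the same slice-and-cast expression as the index definition for the planner to pick the expression index up. A minimal sketch with `psycopg` (the connection string and row id are placeholders):

```python
import psycopg

conn = psycopg.connect(conninfo='postgresql://postgres:mysecretpassword@localhost:5433/postgres')

# ANN query against the 256-dim expression index: slice and cast the column
# exactly as in the index definition.
rows = conn.execute(
    """
    SELECT id
    FROM openai3072
    ORDER BY (text_embedding_3_large_3072_embedding[0:256])::vector(256)
         <-> (SELECT (text_embedding_3_large_3072_embedding[0:256])::vector(256)
              FROM openai3072 WHERE id = %s)
    LIMIT 100
    """,
    (42,),
).fetchall()
print([r[0] for r in rows])
```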

Additionally, we have constructed a [binary vector index](/usage/vector-types.html#bvector-binary-vector) for the 3072-dimensional embeddings and conducted an ANN query using this index as well.

```sql
CREATE INDEX openai_vector_index_bvector ON public.openai3072 USING vectors (text_embedding_3_large_3072_bvector bvector_l2_ops);
```
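The insert script above only fills the `vector` column. One way to populate the `bvector` column is to binarize the stored embeddings client-side; the sketch below thresholds each value at zero. Both the thresholding rule and the bracketed 0/1 literal for `bvector` are assumptions here, not necessarily the exact procedure used in the benchmark:

```python
import psycopg

conn = psycopg.connect(conninfo='postgresql://postgres:mysecretpassword@localhost:5433/postgres', autocommit=True)

# Walk the table in id order, binarize each float embedding at zero, and write
# the result into the bvector column.
bs = 1000
last_id = 0
while True:
    rows = conn.execute(
        "SELECT id, text_embedding_3_large_3072_embedding::text "
        "FROM openai3072 WHERE id > %s ORDER BY id LIMIT %s",
        (last_id, bs),
    ).fetchall()
    if not rows:
        break
    for row_id, embedding_text in rows:
        values = [float(x) for x in embedding_text.strip("[]").split(",")]
        bits = "[" + ", ".join("1" if v > 0 else "0" for v in values) + "]"
        conn.execute(
            "UPDATE openai3072 SET text_embedding_3_large_3072_bvector = %s::bvector(3072) WHERE id = %s",
            (bits, row_id),
        )
    last_id = rows[-1][0]
```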

We ran the KNN query with different limits (5, 10, 50, and 100) on the number of neighbors to retrieve.

::: details

The QPS (Queries Per Second) and the average query time were measured for each query with the help of `pgbench`.

```bash
export PGHOST="localhost"
export PGUSER="postgres"
export PGPASSWORD="mysecretpassword"
export PGDATABASE="postgres"
export PGPORT=5433
export VECTOR_DIM=3072
TEST_TIME=300
for clients in 2 4 8 16 32; do
  pgbench -n -T "${TEST_TIME}" \
    --file=query.sql \
    -c "${clients}" -j "${clients}"
done
```

The `query.sql` file contains the following SQL:

```sql
\set random_id random(1, 1000000)

select id from openai3072 order by text_embedding_3_large_3072_embedding <-> (select text_embedding_3_large_3072_embedding from openai3072 where id = :random_id) limit 100;
```

:::

The results of the experiment are shown here:

![](./adaptive-retrieval/first-pass.png)

As anticipated, the 3072-dimensional embeddings exhibit the highest accuracy, while the 1024-dimensional embeddings demonstrate a slightly lower accuracy at around 85%. The 256-dimensional embeddings yield the lowest accuracy, approximately 65%. On the other hand, the binary vector index achieves an accuracy of about 80%.

Regarding the Requests Per Second (RPS) metric, the binary vector index showcases the most efficient performance, followed by the 256-dimensional embeddings, the 1024-dimensional embeddings, and finally, the 3072-dimensional embeddings.

![](./adaptive-retrieval/memusage.png)

::: details

![table](./adaptive-retrieval/first-pass-tab.png)

:::

The memory usage of the indexes is an important aspect to take into account. Indexing 1 million 3072-dimensional binary vectors requires only around 600MB of memory. In contrast, the 3072-dimensional vector index consumes approximately 18GB of memory. This represents a significant difference in memory usage, as the 3072-dimensional vector index utilizes approximately **30x more memory than the binary vector index**.

## Improving accuracy via adaptive retrieval

Lower-dimensional and binary vector indexes trade accuracy for advantages such as higher RPS and lower memory usage compared to full-dimensional indexes. Adaptive retrieval combines the strengths of both approaches, yielding a solution that balances accuracy, memory usage, and query speed.

The logic behind adaptive retrieval is really simple. Let's take `get top 100 candidates` as an example. We can perform the following steps:

1. **Query the lower dimensional or binary vector index** first to retrieve 200 candidates from the 1 million embeddings. This is a fast operation.
2. **Rerank the candidates using a KNN query** to retrieve the top 100 candidates. K-NN is well-suited for situations where smaller sets and precise similarity search are necessary, making it an excellent choice for reranking in this context.

![](./adaptive-retrieval/adaptive-retrieval.svg)

The reranking step is a bit slower than the initial query, but it is still much faster than querying the full-dimension index. It can be done entirely inside pgvecto.rs:

```sql
CREATE OR REPLACE FUNCTION match_documents_adaptive(
  query_embedding vector(3072),
  match_count int
)
RETURNS SETOF openai3072
LANGUAGE SQL
AS $$
  WITH shortlist AS (
    SELECT *
    FROM openai3072
    ORDER BY (text_embedding_3_large_3072_embedding[0:256])::vector(256) <-> (query_embedding[0:256])::vector(256)
    LIMIT match_count * 2
  )
  SELECT *
  FROM shortlist
  ORDER BY text_embedding_3_large_3072_embedding <-> query_embedding
  LIMIT match_count;
$$;
```

The `match_documents_adaptive` function accepts a query embedding and a match count as input parameters. It first retrieves `match_count * 2` candidates from the 1 million embeddings using the 256-dimensional index. Then, it reranks the candidates using a KNN query to retrieve the top `match_count` candidates.
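A quick usage sketch with `psycopg`, reusing a stored embedding as the query vector (the connection string and row id are placeholders):

```python
import psycopg

conn = psycopg.connect(conninfo='postgresql://postgres:mysecretpassword@localhost:5433/postgres')

# Use an existing row's embedding as the query vector.
query_embedding = conn.execute(
    "SELECT text_embedding_3_large_3072_embedding::text FROM openai3072 WHERE id = %s",
    (42,),
).fetchone()[0]

# Shortlist with the 256-dim index, then rerank to get the top 100 ids.
rows = conn.execute(
    "SELECT id FROM match_documents_adaptive(%s::vector(3072), %s)",
    (query_embedding, 100),
).fetchall()
print([r[0] for r in rows])
```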

The function for binary vector and 1024-dimensional embeddings can be implemented in a similar manner.

We ran the benchmark again with the adaptive retrieval technique; it is tagged as `Adaptive Retrieval` in the following figure.

![adaptive retrieval benchmark](./adaptive-retrieval/adaptive-retrieval-bench.png)

Adding a reranking step significantly improves the accuracy of the lower dimensional and binary vector indexes.

The binary vector index with adaptive retrieval achieves an accuracy of around 95%, with only a slight decrease in RPS compared to the binary vector index without adaptive retrieval. Remarkably, the memory usage of the binary vector index with adaptive retrieval is only around 600MB, which is the same as the binary vector index without adaptive retrieval.

![](../usage/images/bvector.png)

::: details

![table](./adaptive-retrieval/adaptive-retrieval-tab.png)

:::

## Conclusion

Shortening embeddings reduces the memory required to store them and speeds up indexing and search. However, the accuracy of the shortened embeddings is not as good as that of the full embeddings, which is why adaptive retrieval is needed to recover accuracy for the lower-dimensional indexes.

It seems that binary vector indexes with adaptive retrieval are the best choice for the `text-embedding-3-large` model. You could also consider 1024-dimensional indexes with adaptive retrieval if you need a higher accuracy (e.g. 99%).

PS: All the code and data used in this experiment are available in the [pgvecto.rs-matryoshka-embeddings-benchmark](https://github.com/tensorchord/pgvecto.rs-matryoshka-embeddings-benchmark).

<style>
code {
white-space: pre-wrap !important;
counter-reset: step;
counter-increment: step 0;
min-width: calc(100% - 40px) !important;
}

.line::before {
content: counter(step);
counter-increment: step;
width: 2ch;
margin-right: 36px;
margin-left: calc(-36px - 2ch);
display: inline-block;
text-align: right;
color: var(--vp-code-line-number-color);
}
</style>
3 changes: 3 additions & 0 deletions src/use-case/adaptive-retrieval/adaptive-retrieval.svg
Binary file added src/use-case/adaptive-retrieval/first-pass.png
Binary file added src/use-case/adaptive-retrieval/memusage.png