
Commit

WIP
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
gaocegege committed Mar 7, 2024
1 parent ba8a781 commit 52cc6e2
Showing 4 changed files with 57 additions and 3 deletions.
57 changes: 54 additions & 3 deletions src/use-case/adaptive-retrieval.md
@@ -12,7 +12,7 @@ OpenAI's new embedding model `text-embedding-3-large` produces embeddings with 3

But you could safely remove some numbers from the end of the sequence and still maintain a valid representation for text. For example, you could shorten the embeddings to 1024 dimensions.

-::: info
+::: details

You may need normalization to ensure that the shortened embeddings stay compatible with some distance calculations, e.g. dot product. OpenAI's API handles this for you if you call `text-embedding-3-large` with a lower dimension directly, instead of truncating the original embeddings on your own.
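
Below is a minimal sketch of truncation in SQL, reusing the slicing-and-cast syntax that appears later in this guide (the 1024-dimensional cut is an arbitrary choice for illustration). With L2 distance the shortened vector can be used as-is; for dot product or cosine similarity you would re-normalize it, for example in application code, before querying.

```sql
-- Minimal sketch: keep only the first 1024 dimensions of a stored
-- 3072-dimensional embedding. For dot-product or cosine search,
-- re-normalize the shortened vector before using it.
SELECT (text_embedding_3_large_3072_embedding[0:1024])::vector(1024)
FROM openai3072
LIMIT 1;
```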

@@ -108,7 +108,7 @@ CREATE INDEX openai_vector_index_1024 on openai3072 using vectors((text_embeddin
CREATE INDEX openai_vector_index on openai3072 using vectors(text_embedding_3_large_3072_embedding vector_l2_ops);
```

-Additionally, we have constructed a binary vector index for the 3072-dimensional embeddings and conducted an ANN query using this index as well.
+Additionally, we have constructed a [binary vector index](/usage/vector-types.html#bvector-binary-vector) for the 3072-dimensional embeddings and conducted an ANN query using this index as well.

```sql
CREATE INDEX openai_vector_index_bvector ON public.openai3072 USING vectors (text_embedding_3_large_3072_bvector bvector_l2_ops);
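
-- Hypothetical ANN query against the binary index (the exact query used in
-- the benchmark is elided above); $1 stands for the binarized query vector
-- supplied by the application.
SELECT *
FROM openai3072
ORDER BY text_embedding_3_large_3072_bvector <-> $1
LIMIT 100;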
@@ -149,14 +149,65 @@ The results of the experiment are shown here:

![](./adaptive-retrieval/first-pass.png)

-The 3072-dimensional embeddings have the best accuracy as we would expect. The 1024-dimensional embeddings have a slightly lower accuracy ~85%, and the 256-dimensional embeddings have the lowest accuracy ~65%. The binary vector index has ~80%.
+As anticipated, the 3072-dimensional embeddings exhibit the highest accuracy, while the 1024-dimensional embeddings demonstrate a slightly lower accuracy at around 85%. The 256-dimensional embeddings yield the lowest accuracy, approximately 65%. On the other hand, the binary vector index achieves an accuracy of about 80%.

In terms of requests per second (RPS), the binary vector index delivers the highest throughput, followed by the 256-dimensional embeddings, the 1024-dimensional embeddings, and finally the 3072-dimensional embeddings.

![](./adaptive-retrieval/memusage.png)

::: details

![table](./adaptive-retrieval/first-pass-tab.png)

:::

The memory usage of the indexes is an important aspect to take into account. Indexing 1 million 3072-dimensional binary vectors requires only around 600MB of memory. In contrast, the 3072-dimensional vector index consumes approximately 18GB of memory. This represents a significant difference in memory usage, as the 3072-dimensional vector index utilizes approximately **30x more memory than the binary vector index**.

## Improve the accuracy via adaptive retrieval

Lower-dimensional and binary vector indexes trade accuracy for higher RPS and lower memory usage compared to full-dimensional indexes. Adaptive retrieval combines the strengths of both approaches, yielding a solution that balances accuracy, memory usage, and query speed.

The logic behind adaptive retrieval is really simple. Let's take `get top 100 candidates` as an example. We can perform the following steps:

1. **Query the lower dimensional or binary vector index** first to retrieve 200 candidates from the 1 million embeddings. This is a fast operation.
2. **Rerank the candidates using a KNN query** to retrieve the top 100 candidates. K-NN is well-suited for situations where smaller sets and precise similarity search are necessary, making it an excellent choice for reranking in this context.

![](./adaptive-retrieval/adaptive-retrieval.svg)

The reranking step is a bit slower than the initial query, but it is still much faster than querying the full-dimensional index.

::: details

This can be done in pgvecto.rs:

```sql
CREATE OR REPLACE FUNCTION match_documents_adaptive(
  query_embedding vector(3072),
  match_count int
)
RETURNS SETOF openai3072
LANGUAGE SQL
AS $$
  WITH shortlist AS (
    SELECT *
    FROM openai3072
    ORDER BY (text_embedding_3_large_3072_embedding[0:256])::vector(256) <-> (query_embedding[0:256])::vector(256)
    LIMIT match_count * 2
  )
  SELECT *
  FROM shortlist
  ORDER BY text_embedding_3_large_3072_embedding <-> query_embedding
  LIMIT match_count;
$$;
```

The `match_documents_adaptive` function accepts a query embedding and a match count as input parameters. It first retrieves `match_count * 2` candidates from the 1 million embeddings using the 256-dimensional index. Then, it reranks the candidates using a KNN query to retrieve the top `match_count` candidates.

The function for binary vector and 1024-dimensional embeddings can be implemented in a similar manner.
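
As a usage sketch, assuming the application obtains the 3072-dimensional query embedding (e.g. from the OpenAI embeddings API) and passes it as a bind parameter:

```sql
-- Hypothetical invocation: $1 is the query embedding supplied by the application.
SELECT *
FROM match_documents_adaptive($1::vector(3072), 100);
```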

:::

We run the benchmark again with the adaptive retrieval technique, tagged as `Adaptive Retrieval` in the following figure. The results without adaptive retrieval are also included for comparison.

<style>
code {
3 changes: 3 additions & 0 deletions src/use-case/adaptive-retrieval/adaptive-retrieval.svg
Binary file modified src/use-case/adaptive-retrieval/first-pass.png
Binary file added src/use-case/adaptive-retrieval/memusage.png

0 comments on commit 52cc6e2
