From 52cc6e290f9e8fe24140f828957c69a47dee1aba Mon Sep 17 00:00:00 2001
From: Ce Gao
Date: Thu, 7 Mar 2024 12:50:54 +0800
Subject: [PATCH] WIP

Signed-off-by: Ce Gao
---
 src/use-case/adaptive-retrieval.md            | 57 +++++++++++++++++-
 .../adaptive-retrieval/adaptive-retrieval.svg |  3 +
 .../adaptive-retrieval/first-pass.png         | Bin 89511 -> 84009 bytes
 src/use-case/adaptive-retrieval/memusage.png  | Bin 0 -> 29370 bytes
 4 files changed, 57 insertions(+), 3 deletions(-)
 create mode 100644 src/use-case/adaptive-retrieval/adaptive-retrieval.svg
 create mode 100644 src/use-case/adaptive-retrieval/memusage.png

diff --git a/src/use-case/adaptive-retrieval.md b/src/use-case/adaptive-retrieval.md
index 60ea8d5..dbd048d 100644
--- a/src/use-case/adaptive-retrieval.md
+++ b/src/use-case/adaptive-retrieval.md
@@ -12,7 +12,7 @@ OpenAI's new embedding model `text-embedding-3-large` produces embeddings with 3
 
 But you could safely remove some numbers from the end of the sequence and still maintain a valid representation for text. For example, you could shorten the embeddings to 1024 dimensions.
 
-::: info
+::: details
 
 You may need normalization to ensure that the shortened embeddings are compatible with some distance calculations, e.g. dot product. OpenAI's API will handle this for you if you call `text-embedding-3-large` to generate a lower-dimension embedding directly, instead of truncating the original embeddings on your own.
 
@@ -108,7 +108,7 @@ CREATE INDEX openai_vector_index_1024 on openai3072 using vectors((text_embeddin
 CREATE INDEX openai_vector_index on openai3072 using vectors(text_embedding_3_large_3072_embedding vector_l2_ops);
 ```
 
-Additionally, we have constructed a binary vector index for the 3072-dimensional embeddings and conducted an ANN query using this index as well.
+Additionally, we have constructed a [binary vector index](/usage/vector-types.html#bvector-binary-vector) for the 3072-dimensional embeddings and conducted an ANN query using this index as well.
 
 ```sql
 CREATE INDEX openai_vector_index_bvector ON public.openai3072 USING vectors (text_embedding_3_large_3072_bvector bvector_l2_ops);
@@ -149,7 +149,11 @@
 The results of the experiment are shown here:
 
 ![](./adaptive-retrieval/first-pass.png)
 
-The 3072-dimensional embeddings have the best accuracy as we would expect. The 1024-dimensional embeddings have a slightly lower accuracy ~85%, and the 256-dimensional embeddings have the lowest accuracy ~65%. The binary vector index has ~80%.
+As anticipated, the 3072-dimensional embeddings exhibit the highest accuracy, while the 1024-dimensional embeddings come in slightly lower at around 85%. The 256-dimensional embeddings yield the lowest accuracy, approximately 65%, and the binary vector index achieves about 80%.
+
+Regarding the Requests Per Second (RPS) metric, the binary vector index delivers the highest throughput, followed by the 256-dimensional embeddings, the 1024-dimensional embeddings, and finally the 3072-dimensional embeddings.
+
+![](./adaptive-retrieval/memusage.png)
 
 ::: details
@@ -157,6 +161,53 @@ The 3072-dimensional embeddings have the best accuracy as we would expect. The 1
 
 :::
 
+The memory usage of the indexes is an important aspect to consider. Indexing 1 million 3072-dimensional binary vectors requires only around 600MB of memory, whereas the 3072-dimensional vector index consumes approximately 18GB. This is a significant difference: the full-dimension vector index uses roughly **30x more memory than the binary vector index**.
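+
+For reference, each first-pass query in the benchmark is a plain ANN query against one of these indexes. A minimal sketch against the 1024-dimensional expression index (hypothetical; `$1` is assumed to be bound to a `vector(3072)` query embedding by the client):
+
+```sql
+-- Sketch: top-100 ANN query served by the 1024-dimensional expression index.
+-- The ORDER BY expression matches the index definition, so the query can be
+-- served by the index; $1 is assumed to be a vector(3072) query embedding.
+SELECT *
+FROM openai3072
+ORDER BY (text_embedding_3_large_3072_embedding[0:1024])::vector(1024)
+     <-> ($1[0:1024])::vector(1024)
+LIMIT 100;
+```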
+
+## Improve the accuracy via adaptive retrieval
+
+Lower-dimension and binary vector indexes trade accuracy for higher RPS and lower memory usage compared to full-dimension indexes. Adaptive retrieval combines the strengths of both approaches, striking a balance between accuracy, memory usage, and query speed.
+
+The logic behind adaptive retrieval is simple. Let's take `get top 100 candidates` as an example. We can perform the following steps:
+
+1. **Query the lower-dimensional or binary vector index** first to retrieve 200 candidates from the 1 million embeddings. This is a fast operation.
+2. **Rerank the candidates using a KNN query** to retrieve the top 100. KNN is well-suited to small candidate sets where exact similarity search is required, which makes it an excellent choice for reranking.
+
+![](./adaptive-retrieval/adaptive-retrieval.svg)
+
+The reranking step is a bit slower than the initial query, but it is still much faster than querying the full-dimension index, since the exact KNN scan only touches the 200 shortlisted candidates rather than all 1 million embeddings.
+
+::: details
+
+This can be done in pgvecto.rs:
+
+```sql
+CREATE OR REPLACE FUNCTION match_documents_adaptive(
+  query_embedding vector(3072),
+  match_count int
+)
+RETURNS SETOF openai3072
+LANGUAGE SQL
+AS $$
+-- First pass: fast ANN search on the 256-dimensional expression index.
+WITH shortlist AS (
+  SELECT *
+  FROM openai3072
+  ORDER BY (text_embedding_3_large_3072_embedding[0:256])::vector(256) <-> (query_embedding[0:256])::vector(256)
+  LIMIT match_count * 2
+)
+-- Second pass: exact KNN rerank of the shortlist on all 3072 dimensions.
+SELECT *
+FROM shortlist
+ORDER BY text_embedding_3_large_3072_embedding <-> query_embedding
+LIMIT match_count;
+$$;
+```
+
+The `match_documents_adaptive` function accepts a query embedding and a match count as input parameters. It first retrieves `match_count * 2` candidates from the 1 million embeddings using the 256-dimensional index, then reranks those candidates with a KNN query to return the top `match_count`.
+
+The functions for the binary vector and 1024-dimensional embeddings can be implemented in a similar manner.
+
+:::
+
+We run the benchmark again with the adaptive retrieval technique, tagged as `Adaptive Retrieval` in the following figure. The results without adaptive retrieval are included for comparison.
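+
+In this second run, each adaptive-retrieval query can be issued as a single function call. A hypothetical invocation (again assuming `$1` is bound to a `vector(3072)` query embedding):
+
+```sql
+-- Sketch: fetch the top 100 matches via the two-pass adaptive function
+-- defined above; $1 is assumed to be a vector(3072) query embedding.
+SELECT *
+FROM match_documents_adaptive($1, 100);
+```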