I am running Azure hybrid search over 12 PDFs (423 chunks) that I embed into my vector store, and I retrieve the top 12 chunks for each query:
```python
results = await self.search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    top=12,
    filter=filter_expression,
)
```
The problem is that the top 12 results are not consistent: they change from one iteration to the next. To fix this I enabled exhaustive KNN (`exhaustive=True`), but it didn't help. From reading Azure blog posts, I found that some of the variation can come from BM25 scoring, so I set `scoring_statistics='global'` and also added a `session_id`.
Code snippet:
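```python
from azure.search.documents.models import VectorizedQuery

# exhaustive=True scans every vector instead of the approximate HNSW graph.
vector_query = VectorizedQuery(
    vector=query_embeddings,
    k_nearest_neighbors=60,
    fields="contentVector",
    exhaustive=True,
)

results = await self.search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    top=12,
    filter=filter_expression,
    scoring_statistics="global",  # compute BM25 statistics across all shards
    session_id="abcd1234xyz",  # best effort: keep hitting the same replica set
)
results = await self._format_metadata(results)
```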
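To make "not consistent" concrete, it can help to record the chunk IDs returned on each iteration and compare them: are the same 12 chunks coming back in a different order, or are different chunks entering the top 12? Below is a minimal stdlib-only sketch; `compare_runs` is a hypothetical helper (not part of the Azure SDK), and it assumes you have already extracted one ranked list of chunk IDs per run.

```python
from itertools import combinations


def compare_runs(runs):
    """Compare ranked lists of chunk IDs collected from repeated identical queries.

    Returns a dict with:
      - "same_order": True if every run returned exactly the same ordered list
      - "min_overlap": smallest Jaccard overlap of the ID sets between any two runs
      - "unstable_ids": IDs that appear in some runs but not in all of them
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    sets = [set(run) for run in runs]
    same_order = all(run == runs[0] for run in runs[1:])
    min_overlap = min(
        len(a & b) / len(a | b) if (a | b) else 1.0
        for a, b in combinations(sets, 2)
    )
    unstable_ids = set.union(*sets) - set.intersection(*sets)
    return {
        "same_order": same_order,
        "min_overlap": min_overlap,
        "unstable_ids": sorted(unstable_ids),
    }


# Example: two chunks swap places in run 2, and c4 is replaced by c5 in run 3.
runs = [
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c3", "c4"],
    ["c1", "c2", "c3", "c5"],
]
print(compare_runs(runs))
# {'same_order': False, 'min_overlap': 0.6, 'unstable_ids': ['c4', 'c5']}
```

If `same_order` is False but `min_overlap` is 1.0, only the ordering is unstable (typically ties or near-ties in score); a lower overlap means different chunks are entering the top 12.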
Could the team please advise how to get consistent output from hybrid search, given that I want to optimize for search accuracy (how relevant the returned chunks are) as well as the time to embed and retrieve?
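If repeated runs return the same chunks but tied or near-tied chunks swap places, one option is to impose a deterministic order on the client after retrieval. This is a sketch under assumptions: each result is a dict carrying the `@search.score` field that Azure AI Search returns, the key field name `"id"` is a hypothetical placeholder for your index's key, and the 6-decimal rounding is an arbitrary choice that treats tiny score jitter as a tie. It only stabilizes presentation order; it cannot bring back a chunk the service did not return.

```python
def stable_order(results, key_field="id", precision=6):
    """Sort search results by score (descending), breaking ties by document key.

    `results` is a list of dicts with an "@search.score" entry and a key field.
    `key_field` ("id") and `precision` are hypothetical defaults; adjust them
    to your index schema and to how much score jitter you want to ignore.
    """
    return sorted(
        results,
        key=lambda doc: (-round(doc["@search.score"], precision), str(doc[key_field])),
    )


# Usage: chunk-7 and chunk-3 differ only in the 10th decimal place, so after
# rounding they tie and the key decides their order.
hits = [
    {"id": "chunk-7", "@search.score": 0.0327868852},
    {"id": "chunk-3", "@search.score": 0.0327868851},
    {"id": "chunk-9", "@search.score": 0.0161290323},
]
print([doc["id"] for doc in stable_order(hits)])  # ['chunk-3', 'chunk-7', 'chunk-9']
```

A coarser `precision` hides more jitter but can also merge scores that genuinely differ, so it is worth checking against the scores you actually observe.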
@AVIN8233 Can you run the two queries independently (BM25 only and vector only) and check whether each ordering is consistent? Also, how many replicas does your search service have?
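Splitting the query helps because hybrid search does not compare BM25 and vector scores directly: Azure AI Search merges the two ranked lists with Reciprocal Rank Fusion (RRF), where a document earns 1/(k + rank) from each list it appears in. Any reordering in either input list can therefore shift the fused top 12, and documents with equal fused scores have no inherent order. The sketch below is a simplified illustration of RRF, not the service's implementation; k = 60 is the commonly used RRF constant and is an assumption about the service's setting, and the ID-based tie-break is added here only to make the output deterministic.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, top=12):
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in
    (ranks start at 1). Ties on the fused score are broken by document ID
    so that the output order is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top]]


bm25_ranking = ["a", "b", "c", "d"]
vector_ranking = ["b", "a", "d", "c"]
# "a"/"b" and "c"/"d" get identical fused scores; only the ID tie-break
# decides their order.
print(rrf_fuse([bm25_ranking, vector_ranking], top=4))  # ['a', 'b', 'c', 'd']

# A single swap at the top of the BM25 list changes the fused order.
print(rrf_fuse([["b", "a", "c", "d"], vector_ranking], top=4))  # ['b', 'a', 'c', 'd']
```

If the BM25-only and vector-only rankings are each stable across runs, the fused order should be stable too; if the hybrid output still varies, the difference is more likely coming from which replica serves each request.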