Azure Hybrid Search results are not consistent #209

Open
AVIN8233 opened this issue Mar 13, 2024 · 1 comment

@AVIN8233

Hi team,

I am running Azure hybrid search on my data, which consists of 12 PDFs (423 chunks) that I embed into my vector store, and I retrieve the top 12 chunks for each query.

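Code snippet:
```python
from azure.search.documents.models import VectorizedQuery

# Vector leg: 60 nearest neighbours from the "contentVector" field.
vector_query = VectorizedQuery(
    vector=query_embeddings,
    k_nearest_neighbors=60,
    fields="contentVector",
)

# Hybrid query: full-text (BM25) and vector results combined, top 12 returned.
results = await self.search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    top=12,
    filter=filter_expression,
)
```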
    
The problem I am facing is that the top 12 results are not consistent: they change between iterations of the same query. To solve this I enabled exhaustive KNN, but it didn't help. From reading the Azure blogs, I found that some of the variability may come from BM25, so I set `scoring_statistics='global'` and also added a `session_id`.
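
Background on the `scoring_statistics` setting: BM25's inverse document frequency (IDF) depends on how many documents contain a query term. With the default `local` statistics these counts come from the shard that holds each chunk, while `global` aggregates them across the whole index, so under `local` the same term can carry a different weight on each shard. A minimal sketch, assuming Lucene's BM25 IDF formula and made-up per-shard counts (the `bm25_idf` helper and the `shards` values are hypothetical, not Azure's internals):

```python
import math


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """Lucene-style BM25 IDF (assumed here): log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-shard counts for one query term:
# (chunks on the shard containing the term, total chunks on the shard).
shards = [(40, 140), (5, 141), (20, 142)]

# "local": each shard weights the term using only its own counts.
local_idfs = [bm25_idf(n, total) for n, total in shards]

# "global": counts are summed over all shards before computing IDF.
total_df = sum(n for n, _ in shards)
total_docs = sum(total for _, total in shards)
global_idf = bm25_idf(total_df, total_docs)

for i, idf in enumerate(local_idfs):
    print(f"shard {i}: local IDF = {idf:.3f}")
print(f"global IDF = {global_idf:.3f}")
```

This only makes BM25 scores comparable across shards. Differences between replicas are a separate source of variation, which is what `session_id` targets by routing repeated requests to the same replica set on a best-effort basis.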
  
Code snippet:
```python
vector_query = VectorizedQuery(
    vector=query_embeddings,
    k_nearest_neighbors=60,
    fields="contentVector",
    exhaustive=True,  # brute-force KNN over all vectors instead of approximate HNSW
)

results = await self.search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    top=12,
    filter=filter_expression,
    scoring_statistics="global",  # compute BM25 statistics across the whole index
    # session_id="abcd1234xyz",  # also tried with a fixed session ID
)
results = await self._format_metadata(results)
```
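To quantify how much the top 12 actually changes between runs (and to repeat the same check on the text-only and vector-only legs), a small harness like the following may help. It is a sketch under assumptions: `collect_runs`, `jaccard` and `stability_report` are hypothetical helpers, and `run_query` is assumed to be a zero-argument coroutine function returning an async iterable of result documents keyed by a field named `id` (substitute the index's actual key field).

```python
import asyncio
from itertools import combinations
from typing import Any, AsyncIterable, Awaitable, Callable


async def collect_runs(
    run_query: Callable[[], Awaitable[AsyncIterable[dict]]],
    runs: int = 5,
    key: str = "id",  # assumption: name of the index's key field
) -> list[list[Any]]:
    """Execute the same query `runs` times and record the ordered result keys."""
    collected = []
    for _ in range(runs):
        results = await run_query()
        collected.append([doc[key] async for doc in results])
    return collected


def jaccard(a: list[Any], b: list[Any]) -> float:
    """Overlap of two result sets, ignoring order (1.0 = same chunks)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def stability_report(runs: list[list[Any]]) -> dict:
    """Summarise how much the ranked result lists differ across runs."""
    return {
        # True only if every run returned the same IDs in the same order.
        "identical_order": all(r == runs[0] for r in runs),
        # Worst pairwise set overlap: below 1.0 means different chunks came back.
        "min_set_overlap": min(
            (jaccard(a, b) for a, b in combinations(runs, 2)), default=1.0
        ),
    }


if __name__ == "__main__":
    # Offline demo: a fake search whose order flips on the second call.
    fake_orders = iter([["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]])

    async def fake_results(ids):
        for doc_id in ids:
            yield {"id": doc_id}

    async def fake_query():
        return fake_results(next(fake_orders))

    print(stability_report(asyncio.run(collect_runs(fake_query, runs=3))))
```

With the async `SearchClient`, `run_query` could be a lambda wrapping `self.search_client.search(...)` with the same arguments as above. Omitting `vector_queries` gives the BM25-only leg, and passing `search_text=None` gives the vector-only leg, so each can be checked on its own.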

Could the team please advise how to get consistent output from hybrid search? I want to optimize for search accuracy (how relevant the returned chunks are) and for the time it takes to embed and retrieve.

@farzad528
Collaborator

farzad528 commented Mar 21, 2024

@AVIN8233 Can you execute the two queries (BM25 and vector search) independently and check whether the order of each is consistent? Also, how many replicas does your search service have?
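Context for why checking each leg separately helps: Azure AI Search merges the full-text and vector result lists of a hybrid query with Reciprocal Rank Fusion (RRF), where each document scores the sum of 1 / (k + rank) over the lists it appears in. Because only ranks matter, a swap in either leg (for example BM25 results that differ between replicas) can reorder the fused list. A minimal sketch, assuming k = 60 (the constant commonly used for RRF, unrelated to `k_nearest_neighbors=60` in the query above) and made-up chunk IDs; `rrf_fuse` is a hypothetical helper, not Azure's implementation:

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, top: int = 12) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)), ranks from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so this sketch is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top]


# Hypothetical rankings for one hybrid query.
vector_leg = ["c1", "c2", "c3", "c4"]
bm25_run_a = ["c4", "c3"]
bm25_run_b = ["c3", "c4"]  # same BM25 hits, order swapped

print(rrf_fuse([bm25_run_a, vector_leg]))  # ['c4', 'c3', 'c1', 'c2']
print(rrf_fuse([bm25_run_b, vector_leg]))  # ['c3', 'c4', 'c1', 'c2']
```

The vector leg is identical in both runs, yet the fused top result changes because the BM25 leg swapped two chunks, which is the kind of propagation that running each query on its own would expose.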
