fix: get relevance scores with milvus #5062

Open: wants to merge 1 commit into base: master

@cheriL commented Nov 8, 2024

Fixes #5061.

@dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Nov 8, 2024
@cheriL (Author) commented Nov 12, 2024

@liunux4odoo PTAL

@ZamboLin commented

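This is strange. If you trace similarity_search_with_relevance_scores down, it ends up calling similarity_search_with_score, and similarity_search_with_score just raises NotImplementedError, yet the code still runs?
Also, the computed vectors don't seem to be normalized...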

@cheriL (Author) commented Dec 12, 2024

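> This is strange. If you trace similarity_search_with_relevance_scores down, it ends up calling similarity_search_with_score, and similarity_search_with_score just raises NotImplementedError, yet the code still runs? Also, the computed vectors don't seem to be normalized...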

similarity_search_with_score() is implemented in each vector store subclass; the normalization is done by calling relevance_score_fn.
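The dispatch described above can be sketched in plain Python. The class names `VectorStoreBase` and `ToyL2Store` are hypothetical stand-ins, not the real `langchain_core` or `langchain_milvus` classes; the sketch only shows why a `NotImplementedError` in a base class never fires once a subclass overrides the method, and where the relevance function is applied to the raw distances.

```python
from typing import Callable, List, Tuple

Vector = List[float]


class VectorStoreBase:
    """Hypothetical base class; not the real langchain_core VectorStore."""

    def similarity_search_with_score(self, query: Vector, k: int = 4) -> List[Tuple[str, float]]:
        # Only declared here: every backend is expected to override it.
        raise NotImplementedError

    def _select_relevance_score_fn(self) -> Callable[[float], float]:
        raise NotImplementedError

    def similarity_search_with_relevance_scores(self, query: Vector, k: int = 4) -> List[Tuple[str, float]]:
        # self.similarity_search_with_score resolves to the subclass override at runtime,
        # so the base-class NotImplementedError is never reached for a real backend.
        score_fn = self._select_relevance_score_fn()
        return [(doc, score_fn(dist)) for doc, dist in self.similarity_search_with_score(query, k)]


class ToyL2Store(VectorStoreBase):
    """Hypothetical backend that ranks documents by squared L2 distance."""

    def __init__(self, docs: List[Tuple[str, Vector]]):
        self.docs = docs

    def similarity_search_with_score(self, query: Vector, k: int = 4) -> List[Tuple[str, float]]:
        scored = [(text, sum((a - b) ** 2 for a, b in zip(query, vec))) for text, vec in self.docs]
        scored.sort(key=lambda pair: pair[1])  # smallest distance first
        return scored[:k]

    def _select_relevance_score_fn(self) -> Callable[[float], float]:
        # Same shape as the formula quoted in this thread: 1 - distance / 4.0
        return lambda distance: 1.0 - distance / 4.0


store = ToyL2Store([("a", [1.0, 0.0]), ("b", [0.0, 1.0])])
print(store.similarity_search_with_relevance_scores([1.0, 0.0], k=2))  # [('a', 1.0), ('b', 0.5)]
```

Calling `VectorStoreBase().similarity_search_with_score(...)` directly would still raise, which is what you find when reading only the base class.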

@ZamboLin commented

> similarity_search_with_score() is implemented in each vector store subclass; the normalization is done by calling relevance_score_fn.

Right, I see that _select_relevance_score_fn handles normalization with 1 - l2_distance / 4.0, so that part should be fine.
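Why 1 - d / 4.0 works as a relevance score: for unit-length vectors the squared L2 distance satisfies ||a - b||^2 = 2 - 2 cos(theta), so it lies in [0, 4] and 1 - d / 4.0 lies in [0, 1]. A small pure-Python check, assuming (as the /4.0 divisor implies) that the distance returned for the L2 metric is the squared Euclidean distance and that the vectors are unit length; the helper names are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    # Equals cos(theta) when both inputs are unit length.
    return sum(x * y for x, y in zip(a, b))


def relevance(distance: float) -> float:
    return 1.0 - distance / 4.0


a = normalize([3.0, 4.0])
for label, other in [("same", normalize([6.0, 8.0])),
                     ("orthogonal", normalize([4.0, -3.0])),
                     ("opposite", normalize([-3.0, -4.0]))]:
    d = squared_l2(a, other)
    # d matches 2 - 2*cos(theta) for unit vectors, so relevance(d) stays within [0, 1]
    print(label, round(d, 6), round(2 - 2 * dot(a, other), 6), round(relevance(d), 6))
```

The three cases hit the ends and middle of the range: identical direction gives a score of 1, orthogonal gives 0.5, opposite gives 0.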
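Then I printed l2_distance and the results looked odd, so I suspected that some step was skipping normalization, but after a long search all I found was the NotImplementedError. Is my langchain version wrong?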
```text
{
"l2_distance": 566.509521484375
}
{
"l2_distance": 617.7225341796875
}
{
"l2_distance": 642.596923828125
}
{
"l2_distance": 566.509521484375
}
{
"l2_distance": 617.7225341796875
}
{
"l2_distance": 642.596923828125
}
```

```text
langchain                   0.1.17
langchain-chatchat          0.3.1    /home/admin123/LLM4/chatchat
langchain-community         0.0.36
langchain-core              0.1.53
langchain-experimental      0.0.58
langchain-milvus            0.1.7
langchain-openai            0.0.6
langchain-text-splitters    0.0.2
langchainhub                0.1.14
```

@cheriL (Author) commented Dec 13, 2024

@ZamboLin Your langchain version is fairly old. Mine is:

```text
langchain                0.2.16
langchain-community      0.2.16
langchain-core           0.2.39
```

@ZamboLin commented

> langchain                0.2.16
> langchain-community      0.2.16
> langchain-core           0.2.39

Thanks. I upgraded, but back-calculating l2_distance from the formula still gives about 600:
```text
护眼罩/戴防护面具。'), -140.55015563964844)]
docs_and_similarities = self.vectorstore.similarity_search_with_relevance_scores(query, **self.search_kwargs)
2024-12-13 15:33:50,801 langchain_core.vectorstores.base 18401 WARNING No relevant docs were retrieved using the relevance score threshold 0.5
```
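The back-calculation mentioned above is the relevance formula inverted: if score = 1 - d / 4.0, then d = 4 * (1 - score). A minimal sketch (the helper name is hypothetical) applied to the score in the log:

```python
def distance_from_relevance(score: float) -> float:
    """Invert score = 1 - d / 4.0 to recover the raw distance d (hypothetical helper)."""
    return 4.0 * (1.0 - score)


observed_score = -140.55015563964844  # score from the log output above
print(distance_from_relevance(observed_score))  # roughly 566.2, far outside [0, 4]

# The 0.5 threshold in the warning corresponds to a maximum distance of 2.0.
print(distance_from_relevance(0.5))  # 2.0
```

Unit-length vectors can never produce a squared L2 distance above 4.0, so a value around 566 means the vectors being compared are not normalized, and no document can clear a 0.5 threshold.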

Following the vector store subclass, I did find where the distance is actually computed:
```python
res = self.col.search(
    data=[embedding],
    anns_field=self._vector_field,
    param=param,
    limit=k,
    expr=expr,
    output_fields=output_fields,
    timeout=timeout,
    **kwargs,
)
```
I'll start investigating from here.
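For intuition about what comes back from that call, here is a brute-force stand-in for an L2 search over a couple of in-memory vectors. It is not the pymilvus API; it assumes the L2 metric ranks by squared Euclidean distance, and the toy vectors are made up to show how unnormalized inputs push the distance, and therefore 1 - d / 4.0, far outside the expected range.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_top_k(query: Vector, vectors: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Brute-force stand-in for an L2 index search: (row index, squared distance), nearest first."""
    candidates = ((i, squared_l2(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, candidates, key=lambda pair: pair[1])


# Toy data with norm 20.0 each, standing in for unnormalized embeddings.
query = [12.0, 16.0]
stored = [[16.0, 12.0], [-12.0, 16.0]]

raw = l2_top_k(query, stored, k=2)
unit = l2_top_k(normalize(query), [normalize(v) for v in stored], k=2)

print("raw:", [(i, d, 1 - d / 4.0) for i, d in raw])
print("normalized:", [(i, round(d, 4), round(1 - d / 4.0, 4)) for i, d in unit])
```

With the raw vectors the second hit scores 1 - 576 / 4 = -143, the same kind of large negative number seen in the log; after normalization the distances become 0.08 and 1.44, so both scores fall in [0, 1].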

@ZamboLin commented

Found the problem: when Milvus runs col.search, it normalizes neither the query embedding nor the vectors stored in the collection, unless you normalize them yourself. 😑
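One way to normalize the vectors yourself is to unit-normalize every vector before it reaches Milvus, on both the ingestion side and the query side. The sketch below is a hypothetical wrapper (`NormalizedEmbeddings` and `fake_embed` are illustrative names, not part of any library); its method names mirror the `embed_documents` / `embed_query` pair on langchain_core's `Embeddings` interface, and in a real setup you would subclass that interface and pass the wrapper to the vector store. Vectors already stored unnormalized are not changed by this, so an existing collection would need to be re-ingested.

```python
import math
from typing import Callable, List

Vector = List[float]


def l2_normalize(vector: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector as-is instead of dividing by zero
    return [x / norm for x in vector]


class NormalizedEmbeddings:
    """Hypothetical wrapper that unit-normalizes the output of an underlying embedding function."""

    def __init__(self, embed_fn: Callable[[str], Vector]):
        self.embed_fn = embed_fn

    def embed_documents(self, texts: List[str]) -> List[Vector]:
        # Applied at ingestion time, so stored vectors are unit length.
        return [l2_normalize(self.embed_fn(text)) for text in texts]

    def embed_query(self, text: str) -> Vector:
        # Applied at query time, so query and stored vectors share the same scale.
        return l2_normalize(self.embed_fn(text))


def fake_embed(text: str) -> Vector:
    """Illustrative stand-in for a real embedding model."""
    return [float(len(text)), float(text.count("a")) + 1.0, 3.0]


embeddings = NormalizedEmbeddings(fake_embed)
doc_vectors = embeddings.embed_documents(["milvus", "langchain"])
query_vector = embeddings.embed_query("banana")
print([round(math.sqrt(sum(x * x for x in v)), 6) for v in doc_vectors + [query_vector]])  # [1.0, 1.0, 1.0]
```

Normalizing on both paths is what keeps the squared L2 distance inside [0, 4], which is the range the 1 - d / 4.0 score assumes.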

Labels: size:M (This PR changes 30-99 lines, ignoring generated files.)

Successfully merging this pull request may close these issues:
[BUG] Logic error when querying with the Milvus vector store

2 participants