[BUG] Incorrect query logic when using the Milvus vector store #5061

Open
cheriL opened this issue Nov 8, 2024 · 1 comment · May be fixed by #5062
Labels
bug Something isn't working

Comments

cheriL commented Nov 8, 2024

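Problem Description

The query logic for the Milvus vector store is wrong, which makes the results inaccurate:

  1. Docs are currently fetched with similarity_search_with_score(), but this method does not actually return a similarity. With the L2 metric it returns a distance, so similarity_search_with_relevance_scores should be called instead;
  2. chatchat currently uses the Milvus class from langchain_community, which is deprecated and does not implement the logic behind the method above; the fix follows the langchain_milvus implementation.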
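To illustrate point 1, here is a minimal sketch in plain Python (no LangChain or Milvus calls) of what goes wrong when a "higher is better" threshold is applied to raw L2 distances. The helper names, the sample hits, and the `max_distance=2.0` scale are all hypothetical and only meant to show the direction of the comparison, not the actual chatchat or langchain_milvus code.

```python
def keep_if_score_above(hits, threshold):
    """Filter that assumes 'higher score = more similar'."""
    return [doc for doc, score in hits if score >= threshold]


def to_relevance(distance, max_distance=2.0):
    """Map a distance to a relevance score where higher is better.

    max_distance is an assumed upper bound (hypothetical); the correct
    scale depends on the metric and on how the embeddings are normalized.
    """
    return 1.0 - distance / max_distance


# Hypothetical (doc, L2 distance) pairs: a smaller distance is a closer match.
hits = [("close_doc", 0.2), ("far_doc", 1.6)]

# Treating the raw distance as a similarity keeps the worse document.
wrong = keep_if_score_above(hits, 0.5)

# Converting to a relevance score first keeps the closer document.
right = keep_if_score_above(
    [(doc, to_relevance(dist)) for doc, dist in hits], 0.5
)
```

With these sample values, `wrong` contains only `far_doc` and `right` contains only `close_doc`, which is the kind of inversion the issue describes.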

Steps to Reproduce

Call search_doc() to test.

Expected Result

Accurate docs are returned.

Actual Result

Environment Information

  • Langchain-Chatchat version / commit: master branch
  • Deployment method (pypi install / source deployment / docker deployment): source deployment
  • Model inference framework (Xinference / Ollama / OpenAI API, etc.):
  • LLM model (GLM-4-9B / Qwen2-7B-Instruct, etc.):
  • Embedding model (bge-large-zh-v1.5 / m3e-base, etc.): bge-large-zh-v1.5
  • Vector store type (faiss / milvus / pg_vector, etc.): milvus
  • Operating system and version: Linux, Ubuntu 22.04
  • Python version: 3.10
  • Inference hardware (GPU / CPU / MPS / NPU, etc.): GPU
  • Other relevant environment information:

Additional Information
Add any other information related to the issue.

@cheriL cheriL added the bug Something isn't working label Nov 8, 2024
@cheriL cheriL linked a pull request Nov 8, 2024 that will close this issue

Yanhuanjin commented Nov 20, 2024

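Saw the commit 👍

Has the OP solved this? I ran into the same problem a while ago. At the time I worked around it by changing the check from similarity > threshold to distance < threshold, but it is hard to decide what that distance threshold should be. Today I looked at how Faiss normalizes scores: for Euclidean distance it uses 1 - distance/√2, while the langchain side has 1 - distance/4. When I ported that into my code, though, I got an error saying it does not apply to multi vectors. Could that be because I am using an L2 + HNSW index?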
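The two normalizations mentioned in this comment can be compared with a small sketch. This is plain Python that only restates the formulas as quoted; the function names are hypothetical and it does not reproduce any library's code. It assumes unit-normalized embeddings: in that case a plain Euclidean distance of √2 corresponds to orthogonal vectors, and a squared Euclidean distance lies in [0, 4]. Which scale applies to a given setup depends on what the vector store actually returns, so check that before picking one.

```python
import math


def relevance_sqrt2(distance):
    """1 - d / sqrt(2): the Euclidean mapping quoted in the comment."""
    return 1.0 - distance / math.sqrt(2)


def relevance_div4(distance):
    """1 - d / 4: the mapping quoted from the langchain side."""
    return 1.0 - distance / 4.0


def distance_threshold_for(min_relevance, scale):
    """Invert relevance = 1 - d / scale to get the equivalent distance cutoff."""
    return scale * (1.0 - min_relevance)


# A relevance threshold of 0.5 corresponds to different distance cutoffs
# depending on which normalization is assumed.
cutoff_div4 = distance_threshold_for(0.5, 4.0)
cutoff_sqrt2 = distance_threshold_for(0.5, math.sqrt(2))
```

Inverting the mapping this way is one option for the "which distance threshold?" question: choose the relevance threshold you already use elsewhere and derive the distance cutoff from it, instead of tuning the distance directly.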
