Add support for creating index by passing embeddings explicitly #1597

Closed
abhinav-upadhyay opened this issue Mar 11, 2023 · 1 comment

Comments

@abhinav-upadhyay
Contributor

I have a use case where I want to create multiple indices over the same set of documents; each index is built according to some criterion so that I can query against the right subset of documents. (I am using FAISS at the moment, which does not have great options for filtering within one giant index, so the recommendation is to create multiple indices instead.)

It would be expensive to generate embeddings by calling the OpenAI API for each document multiple times just to populate each of the indices. An interface similar to add_texts and add_documents that lets the user pass the embeddings explicitly might be one way to achieve this.
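To make the request concrete, here is a rough sketch of the kind of call I have in mind. The add_embeddings method below is hypothetical; its name and signature are only illustrative of the proposed shape, while embed_documents and from_texts are existing APIs:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embedding_model = OpenAIEmbeddings()

# Existing API: the store embeds the texts itself.
store = FAISS.from_texts(["first document", "second document"], embedding_model)

# Proposed (hypothetical) API: pass pre-computed (text, vector) pairs directly,
# analogous to add_texts / add_documents but without triggering new embedding calls.
reused_texts = ["third document", "fourth document"]
reused_vectors = embedding_model.embed_documents(reused_texts)  # computed once, reusable across indices
store.add_embeddings(list(zip(reused_texts, reused_vectors)))
```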

As I write this, I realize I might be able to work around it by passing FAISS a wrapper as the embedding function, one that internally caches the embedding for each document and avoids duplicate calls to the embeddings API.
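A minimal sketch of that workaround, assuming FAISS.from_texts only relies on the standard embed_documents / embed_query methods of the Embeddings interface (the CachedEmbeddings class below is my own helper, not part of the library):

```python
from typing import Dict, List

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS


class CachedEmbeddings:
    """Wraps an embeddings model and caches per-text vectors to avoid duplicate API calls."""

    def __init__(self, base) -> None:
        self.base = base
        self._cache: Dict[str, List[float]] = {}

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Only call the underlying API for texts we have not embedded before.
        missing = [t for t in texts if t not in self._cache]
        if missing:
            for text, vector in zip(missing, self.base.embed_documents(missing)):
                self._cache[text] = vector
        return [self._cache[t] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self.base.embed_query(text)


embeddings = CachedEmbeddings(OpenAIEmbeddings())
docs_a = ["shared doc 1", "shared doc 2"]
docs_b = ["shared doc 1", "shared doc 3"]

# The second index only pays for "shared doc 3"; the overlapping document
# is served from the cache instead of another API call.
index_a = FAISS.from_texts(docs_a, embeddings)
index_b = FAISS.from_texts(docs_b, embeddings)
```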

However, I am creating this issue in case others also think that an add_embeddings API, or something similar, would be a good idea.

abhinav-upadhyay added a commit to abhinav-upadhyay/langchain that referenced this issue Mar 13, 2023
This allows users to pass pre-created embeddings explicitly to be
indexed by the vector store. If users wish to create the embeddings
for their documents themselves and reuse them, this can save extra
calls to the embeddings API endpoints.

Fixes langchain-ai#1597
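A sketch of the usage this kind of change enables, assuming a from_embeddings-style constructor that accepts (text, vector) pairs; the exact name and signature in the referenced commit may differ:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

documents = {
    "report-2022.txt": "annual report for 2022 ...",
    "report-2023.txt": "annual report for 2023 ...",
    "faq.txt": "frequently asked questions ...",
}

embedding_model = OpenAIEmbeddings()
texts = list(documents.values())

# A single embeddings API pass over the whole corpus.
vectors = embedding_model.embed_documents(texts)
pairs = list(zip(texts, vectors))

# Reuse the same pre-computed vectors to build several criterion-specific
# indices (here: only the reports vs. everything) without re-embedding anything.
report_pairs = [pair for name, pair in zip(documents, pairs) if name.startswith("report")]
reports_index = FAISS.from_embeddings(report_pairs, embedding_model)
full_index = FAISS.from_embeddings(pairs, embedding_model)
```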
@dosubot

dosubot bot commented Aug 17, 2023

Hi, @abhinav-upadhyay! I'm here to help the LangChain team manage their backlog, and I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you are requesting support for creating multiple indices over the same documents, based on different criteria, by passing embeddings explicitly. You suggested an add_embeddings API to avoid generating embeddings multiple times for each document. However, there hasn't been any recent activity on this issue.

Could you please let us know if this issue is still relevant to the latest version of the LangChain repository? If it is, please comment on the issue to let the LangChain team know. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

@dosubot added the stale label on Aug 17, 2023
@dosubot closed this as not planned on Aug 24, 2023
@dosubot removed the stale label on Aug 24, 2023