Add support for different knowledge retrieval methods #2

transitive-bullshit · 2023-11-15T20:24:52Z

This is for the built-in retrieval tool.

The current knowledge retrieval implementation is very naive: it simply returns the full contents of every attached file (source).

It also only supports text file types like text/plain and markdown, as no preprocessing or conversions are done at the moment.

It shouldn't be too hard to add support for more legit knowledge retrieval approaches, which would require:

- processForFileAssistant - File ingestion pre-processing for files marked with purpose: 'assistants'
  - converting non-text files to a common format like markdown (this is probably the hardest step to do well across all of the most common file types)
  - chunking files
  - embedding chunks
  - storing embeddings to an external vector store; make sure to store the file_id each chunk comes from for filtering purposes
- retrievalTool - Performs knowledge retrieval for a given query on a set of file_ids for RAG.
  - embed query
  - semantic search over vector store filtering by the given file_ids
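
To make the two steps above concrete, here is a minimal sketch of how they could fit together. It is not the project's actual code: the function signatures, the `Chunk` type, `chunkText`, the in-memory `vectorStore`, and the toy hashed bag-of-words `embed` are all assumptions made for illustration. A real version would call an embedding model, write to an external vector store, and almost certainly be async.

```ts
// Hypothetical sketch: every name and signature here is an assumption.
interface Chunk {
  fileId: string
  text: string
  embedding: number[]
}

const DIMS = 64

// Toy stand-in for a real embedding model: hashed bag-of-words, L2-normalized.
function embed(text: string): number[] {
  const vec = new Array<number>(DIMS).fill(0)
  for (const token of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let hash = 0
    for (let i = 0; i < token.length; i++) {
      hash = (hash * 31 + token.charCodeAt(i)) >>> 0
    }
    vec[hash % DIMS] += 1
  }
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0)) || 1
  return vec.map((v) => v / norm)
}

// Fixed-size word windows with a small overlap between neighbouring chunks.
function chunkText(text: string, maxWords = 50, overlap = 10): string[] {
  const words = text.split(/\s+/).filter(Boolean)
  const step = Math.max(1, maxWords - overlap)
  const chunks: string[] = []
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '))
    if (start + maxWords >= words.length) break
  }
  return chunks
}

// In-memory stand-in for an external vector store.
const vectorStore: Chunk[] = []

// Ingestion: chunk the (already converted) markdown, embed each chunk, and
// store it together with the file_id it came from.
function processForFileAssistant(fileId: string, markdown: string): void {
  for (const text of chunkText(markdown)) {
    vectorStore.push({ fileId, text, embedding: embed(text) })
  }
}

// Retrieval: embed the query, keep only chunks from the given file_ids, and
// rank them by dot product (cosine similarity, since vectors are normalized).
function retrievalTool(query: string, fileIds: string[], topK = 3): Chunk[] {
  const queryEmbedding = embed(query)
  const allowed = new Set(fileIds)
  return vectorStore
    .filter((chunk) => allowed.has(chunk.fileId))
    .map((chunk) => ({
      chunk,
      score: chunk.embedding.reduce((sum, v, i) => sum + v * queryEmbedding[i], 0)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((result) => result.chunk)
}
```

Storing the file_id alongside each chunk is what lets the search be restricted to the files attached to a given assistant or message. Only the `embed` and `vectorStore` pieces would need to be swapped out for real services.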

Integrations here with LangChain and/or LlamaIndex would be great for their flexibility, but we could also KISS and roll our own using https://github.com/dexaai/dexter

transitive-bullshit added the enhancement (New feature or request), help wanted (Extra attention is needed), and good first issue (Good for newcomers) labels on Nov 15, 2023
