A repository of awesome open source projects to set up your own AI powered knowledge base

jasonwcfan/awesome-rag

🧺 Awesome RAG

A curated list of open source tools and projects to help with retrieval augmented generation.

RAG stands for Retrieval Augmented Generation, a technique where the capabilities of a large language model (LLM) are augmented by retrieving information from other computer systems and providing it as context for the LLM through the prompt. This gives LLMs information beyond what was in their training data, which is critical for LLM applications that need to give personalized responses. Example use cases include scraping data from current web pages, parsing data from PDFs and documents, and answering questions about data from Confluence, Salesforce or other SaaS apps.
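The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents, the `retrieve` and `build_prompt` helpers, and the bag-of-words scoring are all hypothetical stand-ins, and a real setup would typically use embeddings and one of the vector databases listed below.

```python
import math
from collections import Counter

# Hypothetical mini knowledge base; in practice a data connector would supply these.
DOCUMENTS = [
    {"source": "handbook.pdf", "text": "Employees accrue 20 vacation days per year."},
    {"source": "wiki/onboarding", "text": "New hires receive a laptop on their first day."},
    {"source": "wiki/expenses", "text": "Expense reports are due by the fifth of each month."},
]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,?!").lower() for word in text.split()]


def score(query, text):
    """Cosine similarity over word counts (a simple stand-in for embeddings)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(text))
    dot = sum(q[word] * d[word] for word in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    ranked = sorted(DOCUMENTS, key=lambda doc: score(query, doc["text"]), reverse=True)
    return ranked[:k]


def build_prompt(query, k=1):
    """Put the retrieved text, tagged with its source, into the prompt as context."""
    context = "\n".join(f"[{doc['source']}] {doc['text']}" for doc in retrieve(query, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("How many vacation days do employees get?"))
```

The resulting prompt would then be sent to an LLM. Tagging each context line with its source is one simple way to keep track of where an answer came from.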

For most knowledge base use cases, RAG works better than fine-tuning models: it is cheaper, it is faster to set up and update, and it is more reliable, since metadata about the sources of information can be attached to each response.

We also have an open source project that makes setting up RAG on your own infrastructure super easy. Check it out here

Contributions welcome. Add links through pull requests or create an issue to start a discussion. Please read the contribution guidelines before contributing.

Data Connectors

Tools for connecting to data sources

  • Psychic: Data integrations platform for LLMs with turnkey auth, syncs and a universal API.

Storage

Tools and databases to store knowledge for retrieval.

Vector Databases

  • Chroma: The AI-native open-source embedding database.
  • Qdrant: Vector database for the next generation of AI applications. Also available in the cloud.
  • Weaviate: Open source vector database that stores both objects and vectors, combining vector search with structured filtering and the fault tolerance and scalability of a cloud-native database. Accessible through GraphQL, REST, and various language clients.
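To illustrate what "combining vector search with structured filtering" means, here is a minimal in-memory sketch. It does not use the API of any database listed above. The records, the `search` function, and the `where` parameter are hypothetical names chosen for the example.

```python
import math

# Hypothetical records: each one stores an object (its metadata) next to a vector.
RECORDS = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def search(query_vector, where, k=1):
    """Apply the structured filter first, then rank the survivors by similarity."""
    candidates = [
        record for record in RECORDS
        if all(record["meta"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda record: cosine(query_vector, record["vector"]), reverse=True)
    return [record["id"] for record in candidates[:k]]


if __name__ == "__main__":
    # Record "b" is the closest vector overall, but the filter restricts results to English.
    print(search([0.9, 0.1], where={"lang": "en"}))
```

Real vector databases use approximate nearest-neighbor indexes rather than a full scan, so this sketch shows only the idea, not how any of them implement it.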

Document Databases

Relational Databases

Graph Databases

  • Neo4j: Graphs for everyone.

Retrieval

  • LlamaIndex: LlamaIndex (GPT Index) is a data framework for your LLM applications.

LLMs

  • Llama 2: The next generation of Meta's open source large language model.
  • GPT4All: An ecosystem of open-source chatbots trained on a massive collection of clean assistant data including code, stories and dialogue.

Deployment

  • RAGStack: Chat with your data privately using a self-hosted retrieval augmented generation (RAG) stack built on top of open-source LLMs like Falcon, Llama and GPT4All.

Articles
