The first goal of this project is to build a robust RAG system that encapsulates the newest methods and models in an intuitive Python interface. The second goal is to put all this logic on a server and expose REST APIs to a Next.js frontend that will include a Generative User Interface (TBD).
This is a work in progress. Right now I'm finishing up the core abstractions. Everything is broken down into a few core services:
`data.py`
- this includes the `Index` class (embedding database) and the `Reranker` class

`generators.py`
- this contains the generators that use Instructor to output schema-validated data

`models.py`
- contains the data models

`chat.py`
- this class wraps around popular LLM APIs and exposes a common set of methods as an interface

`rag.py`
- TODO: this class will offer an end-to-end configurable RAG solution, leveraging each of the aforementioned abstractions

`utils.py`
- pretty self-explanatory, just utils and configuration
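To show how these pieces are meant to compose, here is a rough end-to-end sketch. Every method name and signature in it (`add`, `search`, `rerank`, `ask`) is an illustrative assumption, not the finalized interface:

```python
# Hypothetical composition of the core services -- names and signatures
# are assumptions for illustration, not the finalized API.
from data import Index, Reranker
from chat import Chat

index = Index()
index.add(["first document ...", "second document ..."])  # embed + store

question = "What does the second document say?"
candidates = index.search(question, k=20)       # vector search
context = Reranker().rerank(question, candidates, k=5)

chat = Chat(model="gpt-4")                      # wraps a popular LLM API
answer = chat.ask(f"Context: {context}\n\nQuestion: {question}")
```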
- use `pytest` to run all local tests
- use `pytest --external` to run all tests, including those that make external API calls (which can take a while)
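`--external` is a custom pytest flag; below is a minimal sketch of the standard way to wire one up in `conftest.py` (the repo's actual implementation may differ):

```python
# conftest.py -- common pattern for a custom --external flag; a sketch
# of the usual pytest wiring, not necessarily what this repo does.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--external", action="store_true", default=False,
        help="also run tests that make external API calls",
    )

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "external: test makes external API calls"
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--external"):
        return  # run everything, including external tests
    skip = pytest.mark.skip(reason="needs --external to run")
    for item in items:
        if "external" in item.keywords:
            item.add_marker(skip)
```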
- Update the generator code to use Anthropic
- add reranking to the `Index` class
- implement multi-query search inside the `Index`: search all queries, remove duplicates across the result sets, then rerank the merged pool (see the multi-query sketch after this list)
- finish the Cohere class in `chat.py`
- finish the `Chat` factory class
- add tests for the `Chat` factory class
- add an `ask_stream` function to the abstract chat class
- sanity check all the new abstractions (see `notebooks/tests/sanity-check-chat.ipynb`)
- finish the Cohere stream; the others are working
- make a full RAG abstraction
- get all `chat.py` tests passing
- updated `chat_stream` and `print_stream` to yield/return multiple response objects
- got a FastAPI server running; configured globals, middleware, routes, and models
- got the `/chat` endpoint working with OAI
- figure out how to make the `temp`, `model`, and `max_tokens` params optional in `server/models/chat` (see the Pydantic sketch after this list)
- sanity check the `/chat` endpoint in a notebook with OAI, Anthropic, and Cohere
- implement tests in `tests/server/routes/chat`
- move server prints to logging?
- get the streaming endpoint working
- expose an all-purpose chat endpoint that takes in params and returns a stream (FastAPI); see the streaming sketch after this list
- rebuild logging to use one folder in the root dir
- implement the message logic in the `chat` functions
- get all tests passing
- merge into main
- added more robust error handling to the chat routes
- added message validation rules for order
- add usage/cost monitoring per request
- add logging config for lib
- check multiple message support for all models (Claude is not working right now)
- integrate Llama 3 70B and 400B when it drops!! (with Groq)
- write in the docs the process for integrating a new model (add the specific class, add it to the `Chat` factory class, add its message conversion, add it to the model config); see the factory sketch after this list
- change the default RAG to `persist=False`
- add Rerank 3
- build and test the ability to generate queries that include the best location/db to search
- add tool calls for OpenAI and Anthropic
- add Instructor support for the `Chat` models
- add rate limits to model config
- make the model config into a dataclass, and add embedding models and rerank models (see the config sketch after this list)
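A minimal sketch of the multi-query retrieval flow from the list above: search every query, merge and deduplicate the hits, then rerank the merged pool once. The `Index` and `Reranker` interfaces here are assumptions:

```python
# Hypothetical multi-query search: the index/reranker interfaces are
# assumptions; only the search -> dedupe -> rerank flow is the point.
def multi_query_search(index, reranker, queries, k=5, k_per_query=20):
    seen, pool = set(), []
    for query in queries:
        for doc in index.search(query, k=k_per_query):
            if doc not in seen:     # drop duplicates across queries
                seen.add(doc)
                pool.append(doc)
    # one final rerank of the merged pool against the original query
    return reranker.rerank(queries[0], pool, k=k)
```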
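For the optional-params item, the usual Pydantic approach is to default the fields to `None` and fall back to server-side defaults downstream; the field names follow the TODO item, the defaults and the `messages` shape are assumptions:

```python
# Sketch of an optional-params request model; field names follow the
# TODO item, defaults and the messages shape are assumptions.
from typing import Optional
from pydantic import BaseModel

class ChatRequest(BaseModel):
    messages: list[dict]
    model: Optional[str] = None          # fall back to a server default
    temperature: Optional[float] = None  # "temp" in the TODO item
    max_tokens: Optional[int] = None
```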
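For the all-purpose streaming endpoint, a minimal FastAPI sketch built on `StreamingResponse`; the token generator is a stub standing in for the library's real `chat_stream`/`ask_stream`:

```python
# Sketch of a streaming chat endpoint; the stub generator stands in for
# the library's real chat_stream / ask_stream.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class StreamRequest(BaseModel):
    model: str
    prompt: str

async def token_stream(prompt: str):
    # stub: the real generator would yield tokens from the LLM API
    for token in ("streamed ", "response ", "tokens"):
        yield token

@app.post("/chat/stream")
async def stream_chat(req: StreamRequest):
    return StreamingResponse(token_stream(req.prompt),
                             media_type="text/plain")
```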
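The documented process for integrating a new model maps naturally onto a registry-based factory; here is a hypothetical sketch (class names and registry layout are assumptions) that also shows where the abstract `ask_stream` would live:

```python
# Hypothetical Chat factory dispatch; names and layout are assumptions.
from abc import ABC, abstractmethod
from typing import Iterator

class BaseChat(ABC):
    @abstractmethod
    def ask(self, prompt: str) -> str: ...

    @abstractmethod
    def ask_stream(self, prompt: str) -> Iterator[str]: ...

class OpenAIChat(BaseChat):
    def ask(self, prompt: str) -> str:
        return "..."   # call the OpenAI API here

    def ask_stream(self, prompt: str) -> Iterator[str]:
        yield "..."    # stream tokens from the API here

# integrating a new model = write its class, register it here, add its
# message conversion, and add an entry to the model config
MODEL_REGISTRY: dict[str, type[BaseChat]] = {"gpt-4": OpenAIChat}

def make_chat(model: str) -> BaseChat:
    return MODEL_REGISTRY[model]()
```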
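And for the model-config item, a sketch of what the dataclass could look like; every field here (rate limit, embedding and rerank model lists) is an assumption about what the config would hold:

```python
# Sketch of the model config as a dataclass; all fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    name: str
    provider: str
    max_tokens: int = 4096
    requests_per_minute: int | None = None    # hypothetical rate limit
    embedding_models: list[str] = field(default_factory=list)
    rerank_models: list[str] = field(default_factory=list)
```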