PoC of an assistant for a company-internal knowledge base.
-
Follows the Retrieval-Augmented Generation (RAG) pattern.
-
AWS-ready:
- Deployed on EC2
- Uses SageMaker Endpoints for:
- Text embeddings model
- Generative LLM model
- Architecture open to integrating a production-grade vector DB (AWS RDS with pgvector, AWS OpenSearch k-NN, etc.)
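A minimal sketch of how the SageMaker path could be called from Python with boto3; the endpoint names and payload schema below are placeholders for illustration, not something this repo ships:

    import json
    import boto3

    # Placeholder endpoint names -- substitute the names of your deployed SageMaker endpoints.
    EMBEDDING_ENDPOINT = "kb-embedding-endpoint"
    LLM_ENDPOINT = "kb-llm-endpoint"

    runtime = boto3.client("sagemaker-runtime")

    def invoke(endpoint_name, payload):
        # Generic JSON-in / JSON-out call; the exact payload schema depends on
        # the container serving the model.
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        return json.loads(response["Body"].read())

    embedding = invoke(EMBEDDING_ENDPOINT, {"inputs": "How do I request VPN access?"})
    answer = invoke(LLM_ENDPOINT, {"inputs": "Summarise the VPN policy.",
                                   "parameters": {"max_new_tokens": 256}})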
-
...yet rapid, free localhost development is still possible:
- Integrated local embeddings model
- Integrated local vector database (ChromaDB)
- Integrated free LLM API from replicate.com (use with non-proprietary data only)
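For local development the retrieval side can run entirely against ChromaDB. A rough sketch (the collection name and sample chunks are made up for illustration):

    import chromadb

    # Ephemeral client for illustration; the project itself persists its store
    # under db/chroma (chromadb.PersistentClient(path="db/chroma")).
    client = chromadb.Client()
    collection = client.get_or_create_collection("demo")

    # Index a couple of chunks; ChromaDB falls back to its default embedding
    # function when none is supplied explicitly.
    collection.add(
        ids=["doc1-0", "doc1-1"],
        documents=["VPN access is requested via the IT portal.",
                   "Laptops are refreshed every three years."],
    )

    # Retrieve the chunks most similar to a question.
    hits = collection.query(query_texts=["How do I get VPN access?"], n_results=2)
    print(hits["documents"][0])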
-
Loosely coupled and easily swappable components:
- Chunker
- Embedder
- VectorDB
- LLM
- Assistant
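The idea is that each of these sits behind a small interface so local and AWS implementations can be swapped freely. A sketch of what such interfaces might look like (names are illustrative, not the repo's actual API):

    from typing import Protocol

    class Chunker(Protocol):
        def chunk(self, text: str) -> list[str]: ...

    class Embedder(Protocol):
        def embed(self, texts: list[str]) -> list[list[float]]: ...

    class VectorDB(Protocol):
        def add(self, ids: list[str], texts: list[str], vectors: list[list[float]]) -> None: ...
        def search(self, query_vector: list[float], top_k: int) -> list[str]: ...

    class LLM(Protocol):
        def generate(self, prompt: str) -> str: ...

    class Assistant(Protocol):
        def answer(self, question: str) -> str: ...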
-
Tentative PoC choices:
- LLAMA-2 7B as generative LLM
- all-mpnet-base-v2 for semantic text embeddings (paragraph- / section-sized chunks)
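For reference, embedding text with all-mpnet-base-v2 via sentence_transformers looks roughly like this:

    from sentence_transformers import SentenceTransformer

    # all-mpnet-base-v2 produces 768-dimensional embeddings and handles
    # paragraph-length inputs well.
    model = SentenceTransformer("all-mpnet-base-v2")
    embeddings = model.encode(["How do I request a new laptop?"])
    print(embeddings.shape)  # (1, 768)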
-
Multiple assistants for easier testing:
- LLM+RAG assistant
- RAG-only assistant (to assess retrieval quality)
- LLM-only assistant (to provide a baseline for LLM+RAG assessment)
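Roughly, the three variants differ only in how the answer is assembled; a sketch with hypothetical retrieve/generate callables (not the repo's actual classes):

    # Hypothetical helpers: retrieve(question) -> list of relevant chunks,
    # generate(prompt) -> LLM completion.
    def rag_only_answer(question, retrieve):
        # Show the retrieved chunks verbatim to judge retrieval quality on its own.
        return "\n\n".join(retrieve(question))

    def llm_only_answer(question, generate):
        # No retrieval: a baseline for judging what RAG adds.
        return generate(question)

    def llm_rag_answer(question, retrieve, generate):
        context = "\n\n".join(retrieve(question))
        prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
        return generate(prompt)
-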
Tested on Python 3.10.12.
After cloning the repository, run:
cd llmrag
bin/reset-venv # creates .venv and installs dependencies
This will take a while because we pull in the heavyweight sentence_transformers dependency :{
Vectorized public documents are conveniently embedded in the project (db/chroma), so you can go straight to running the webapp.
To run the webserver locally:
bin/webserver
Then go to: http://127.0.0.1:8080/
You will also need to create: config/replicate_api_key.txt # paste API key from https://replicate.com/
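If you want to sanity-check the key outside the webapp, a quick sketch (the model reference and parameters are assumptions, not necessarily what the app uses):

    import replicate

    # Read the key from the file created above and call Replicate directly.
    api_key = open("config/replicate_api_key.txt").read().strip()
    client = replicate.Client(api_token=api_key)

    output = client.run(
        "meta/llama-2-7b-chat",   # assumed model reference
        input={"prompt": "What is Retrieval Augmented Generation?", "max_new_tokens": 200},
    )
    print("".join(output))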