This is a chatbot that can answer questions about the Duke MEng AIPI program. It is built with Streamlit as the frontend and a Mistral-7B model fine-tuned on instruction data.
- Scraped the internal and external program websites for the Duke MEng AIPI program.
- Iterated over each link present in the `sitemap.xml` file, extracted the text from each page, and saved it in a JSON file.
- Also copied over the FAQ doc from the internal program website.
- Once I had a list of all these files, iterated over each file and passed the scraped data to Gemini for further cleaning and better formatting (see the sketch below).
- Saved all the cleaned data in a single text file: `data/processed/context.txt`
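A minimal sketch of that pipeline, assuming `requests`, `beautifulsoup4` (with `lxml`), and the `google-generativeai` client are installed. The sitemap URL, file paths, and Gemini model name below are illustrative, not the exact ones used:

```python
import json
import requests
from bs4 import BeautifulSoup
import google.generativeai as genai

SITEMAP_URL = "https://example.edu/sitemap.xml"  # illustrative URL

# Collect every page URL listed in the sitemap.
sitemap = BeautifulSoup(requests.get(SITEMAP_URL, timeout=30).text, "xml")
urls = [loc.text for loc in sitemap.find_all("loc")]

# Extract the visible text from each page and save it all as JSON.
pages = {}
for url in urls:
    page = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    pages[url] = page.get_text(separator="\n", strip=True)
with open("data/raw/pages.json", "w") as f:
    json.dump(pages, f, indent=2)

# Pass each scraped page to Gemini for cleaning and reformatting,
# then append everything into a single context file.
genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
with open("data/processed/context.txt", "w") as out:
    for url, text in pages.items():
        cleaned = model.generate_content(
            "Clean and reformat the following scraped page text:\n\n" + text
        )
        out.write(cleaned.text + "\n\n")
```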
There are three primary components to the system:
- Vector Database
- HuggingFace Inference Endpoint
- Streamlit Application
Workflow:
Once the `context.txt` file was created, the document was chunked by paragraphs. Each paragraph was then converted to vector embeddings using the `all-MiniLM-L6-v2` model, and these embeddings were stored in ChromaDB. ChromaDB was hosted on Azure on a compute-optimized instance (a `Standard_F4s_v2`; see the cost breakdown below).
The script for ingestion can be found in `scripts/ingest_data_into_vector_db.py`.
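A minimal sketch of the same flow, assuming `sentence-transformers` and `chromadb` are installed (the host, port, and collection name are illustrative):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Chunk the cleaned context file by paragraphs (blank-line separated).
with open("data/processed/context.txt") as f:
    paragraphs = [p.strip() for p in f.read().split("\n\n") if p.strip()]

# Embed each paragraph with all-MiniLM-L6-v2.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(paragraphs).tolist()

# Store the embeddings in the remote ChromaDB instance on Azure.
client = chromadb.HttpClient(host="your-azure-host", port=8000)  # illustrative
collection = client.get_or_create_collection(name="aipi_context")
collection.add(
    ids=[str(i) for i in range(len(paragraphs))],
    documents=paragraphs,
    embeddings=embeddings,
)
```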
The model is deployed on a dedicated, protected HuggingFace inference endpoint that can only be accessed with an `HF_TOKEN`, which is injected into the Streamlit app as an environment variable.
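Inside the app, the token is read at runtime roughly like this (a sketch; the secret itself lives only in the deployment environment, never in code):

```python
import os

# HF_TOKEN is injected by the deployment environment;
# the app only reads it at runtime.
HF_TOKEN = os.environ["HF_TOKEN"]
```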
Finally, this model API endpoint is called from the Streamlit interface.
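A minimal sketch of that call using `huggingface_hub.InferenceClient` (the endpoint URL and generation parameters are illustrative):

```python
import os

import streamlit as st
from huggingface_hub import InferenceClient

# Point the client at the protected endpoint, authenticating with HF_TOKEN.
client = InferenceClient(
    model="https://your-endpoint.endpoints.huggingface.cloud",  # illustrative URL
    token=os.environ["HF_TOKEN"],
)

question = st.text_input("Ask a question about the AIPI program")
if question:
    # In the real app, relevant chunks retrieved from ChromaDB would be
    # prepended to the prompt before calling the fine-tuned Mistral-7B model.
    answer = client.text_generation(question, max_new_tokens=256)
    st.write(answer)
```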
Using Human-as-a-Judge for the performance metric: three testers, including myself, evaluated the model's responses to the same 20 questions. On average, the model answered 16/20 questions correctly. A sample of the questions used:
1. What courses does Professor Brinnae Bent teach?
2. Who is the director of the AIPI program?
3. What are some housing options nearby?
4. How many credits do I need to graduate?
5. What courses can I take in the fall?
...
| Component | Hardware | Cost |
| --- | --- | --- |
| Training (RunPod) | 2x NVIDIA H100 (80 GB VRAM each) + 125 GB RAM | $4.59/hr |
| Inference (HuggingFace Inference Endpoint, dedicated) | 1x NVIDIA A100 (80 GB VRAM) + 145 GB RAM | $4.00/hr |
| Hosting ChromaDB (Azure) | 1x Standard_F4s_v2 | $0.0169/hr |
- Training: Quantize the model and use techniques like QLoRA for fine-tuning; this way it could be trained on much cheaper hardware instead of one big GPU, but training would be slower (see the sketch after this list).
- Inference: This was the cheapest option available. Both RunPod serverless and HuggingFace were tried; even the smaller T4 GPUs don't work because it's a 7B model.
- Hosting ChromaDB (Azure): A smaller instance with fewer vCPUs and less RAM could be used here.
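For reference, a minimal sketch of a QLoRA setup with `transformers`, `peft`, and `bitsandbytes` (the hyperparameters here are illustrative, not values used in this project):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load Mistral-7B in 4-bit NF4 quantization to cut memory requirements.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters; only these are updated during
# fine-tuning, so the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,  # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```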
Use this link